Second Order Collaborative Filtering: Playing with latent feature dimension

08 Sep 2015

So, after playing around with things, I find -- unsurprisingly, since the winner of the Netflix Prize used a very high-dimensional representation of the feature vectors, $x \in \mathbb{R}^D$ with $D \sim 1000$ -- that increasing the dimension of the feature vectors substantially improves the training fit. Even with the fairly high regularization parameter of $\lambda = 1$ from the last post, I get the following results for $D = 200$:
![Training fit for D = 200](http://2.bp.blogspot.com/-F8wJIz0Ay_s/Ve79h0oHcFI/AAAAAAAACP8/7FuWoJ1-890/s1600/Screenshot%2B2015-09-07%2B19.17.47.png)

As you can see, we get a much tighter regression fit on the given ratings matrix $Y_{ij}$, at the cost of extra computation. Inverting the Hessian of the cost function -- which, thank goodness, turns out to be only $D \times D$, since the Hessian is block-diagonal in the remaining degrees of freedom -- takes a great deal of time for high dimension $D$, so we are left with a trade-off between goodness of fit and computation time.

This algorithm has been a second-order "batch" gradient descent, taking in all the data at once. It will be interesting to see how things can be made incremental, or "online", so that data is taken in bit by bit and our matrices $X_{il}$, $\theta_{jl}$ are updated as it arrives.
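To make the "only $D \times D$" point concrete: assuming the cost from the last post is the usual regularized squared error over the observed entries, fixing $\theta$ makes the cost quadratic in each user's feature vector $x_i$, and a single Newton step lands exactly on the minimizer of that quadratic,

$$x_i \leftarrow \Big(\sum_{j :\, r(i,j)=1} \theta_j \theta_j^{\top} + \lambda I_D\Big)^{-1} \sum_{j :\, r(i,j)=1} Y_{ij}\, \theta_j.$$

Here is a minimal NumPy sketch of that step; the function name and the 0/1 observation mask `R` are my own naming, not code from the earlier posts:

```python
import numpy as np

def newton_step_users(X, Theta, Y, R, lam=1.0):
    """One Newton step on the user factors X (n_users x D), with Theta fixed.

    For the regularized squared-error cost, the Hessian over all of X is
    block-diagonal: one D x D block per user. Since the cost is quadratic
    in X, a single Newton step jumps straight to the minimizer for X.
    """
    n_users, D = X.shape
    X_new = np.empty_like(X)
    for i in range(n_users):
        rated = R[i] > 0                  # items user i has rated
        T = Theta[rated]                  # (n_i, D) factors of those items
        H = T.T @ T + lam * np.eye(D)     # the D x D Hessian block for user i
        b = T.T @ Y[i, rated]             # gradient is H @ x - b
        X_new[i] = np.linalg.solve(H, b)  # Newton: x - H^{-1}(H x - b) = H^{-1} b
    return X_new
```

The item factors $\theta_j$ get the symmetric update with $X$ held fixed, and the two updates alternate; each pass costs one $D \times D$ solve per user and per item, which is exactly where the computation time goes as $D$ grows.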
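As for the online direction, one standard recipe -- not what this post implements, just the obvious candidate -- is plain stochastic gradient descent on one rating at a time: it gives up the second-order step, but each incoming rating touches only the one user row and one item row it involves. A sketch, with an illustrative learning rate:

```python
def online_update(X, Theta, i, j, y_ij, lr=0.005, lam=1.0):
    """Fold in a single observed rating (i, j, y_ij) as it arrives.

    X and Theta are NumPy arrays of user and item factors. A first-order
    stochastic step: only X[i] and Theta[j] change. The learning rate lr
    is illustrative, not tuned.
    """
    e = X[i] @ Theta[j] - y_ij                  # prediction error on this rating
    x_old = X[i].copy()                         # update both rows from old values
    X[i] -= lr * (e * Theta[j] + lam * X[i])
    Theta[j] -= lr * (e * x_old + lam * Theta[j])
```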