Second Order Collaborative Filtering: Playing with latent feature dimension
So, after playing around with things, I find -- unsurprisingly, since the winner of the Netflix contest
used a very high-dimensional representation of the feature vectors, $\vec{x} \in \mathbb{R}^D$ with $D \approx 1000$ -- that increasing the dimension of the
feature vectors improves the training fit substantially. Even with the fairly high
regularization parameter $\lambda = 1$ from the last post, I get the
following results for $D = 200$:
![Results for D=200](http://2.bp.blogspot.com/-F8wJIz0Ay_s/Ve79h0oHcFI/AAAAAAAACP8/7FuWoJ1-890/s1600/Screenshot%2B2015-09-07%2B19.17.47.png)
As you can see, we get a much tighter regression fit on the given ratings matrix
$Y_{ij}$, at the cost of extra computation. Inverting the Hessian of the cost
function -- which, thankfully, turns out to be only $D \times D$, because the
cost decouples (the Hessian is block-diagonal) in the other degrees of freedom -- takes a great deal of time for high
dimension $D$, so we are left with a trade-off between goodness of fit and
computation time.
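
As a rough illustration (this is a sketch, not the exact code behind the plot above), a per-user Newton step might look like the following, assuming the regularized squared-error cost from the last post; the function and variable names here, including the 0/1 mask `R` marking which entries of $Y_{ij}$ are observed, are illustrative only:

```python
import numpy as np

def newton_update_X(X, Theta, Y, R, lam):
    """One second-order update of the user-feature matrix X (n_users x D),
    holding the item-feature matrix Theta (n_items x D) fixed.
    Y is the ratings matrix and R is a 0/1 mask of observed entries.
    The cost decouples across users, so the Hessian is block-diagonal
    with one D x D block per user -- we only ever invert D x D matrices."""
    n_users, D = X.shape
    X_new = X.copy()
    for i in range(n_users):
        rated = R[i].astype(bool)            # items user i has actually rated
        Theta_i = Theta[rated]               # (n_rated, D)
        resid = Theta_i @ X[i] - Y[i, rated] # prediction errors for user i
        grad = Theta_i.T @ resid + lam * X[i]          # D-dimensional gradient
        hess = Theta_i.T @ Theta_i + lam * np.eye(D)   # D x D Hessian block
        X_new[i] = X[i] - np.linalg.solve(hess, grad)  # Newton step
    return X_new
```

The analogous update for $\theta$ swaps the roles of users and items; the $D \times D$ solve in the inner loop is exactly the piece that gets expensive as $D$ grows.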
This algorithm has been a second-order "batch" gradient descent, taking in all
the data at once. It will be interesting to see how things can be made
incremental, or "online," so that ratings are consumed bit by bit and our
matrices $X_{il}$ and $\theta_{jl}$ are updated as each new rating arrives.
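
For instance, a bare-bones first-order online update from a single observed rating $y = Y_{ij}$ might look like the sketch below (the learning rate `lr` is an illustrative parameter, and a properly second-order online scheme is left for a later post):

```python
def online_sgd_step(X, Theta, i, j, y, lam, lr=0.01):
    """Update only row i of X and row j of Theta from one observed rating y,
    so the data can be consumed one rating at a time instead of all at once."""
    err = X[i] @ Theta[j] - y
    x_i, th_j = X[i].copy(), Theta[j].copy()   # use the old values in both updates
    X[i]     -= lr * (err * th_j + lam * x_i)
    Theta[j] -= lr * (err * x_i + lam * th_j)
    return X, Theta
```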
