Homework II

Due Thursday, November 1.

Problem A

In this problem, you will use matrix factorization methods. Here rectools provides wrappers to the fast and powerful recosystem package. We'll use the MovieLens data (original 100K version), without covariates.

Details:

  1. Use Nonnegative Matrix Factorization (NMF), fitting with trainReco() and predicting with the method that rectools provides for the generic predict().
  2. Split your data into training and test sets, as in Hwk I. Follow those instructions exactly, except with MovieLens instead of InstEval.
  3. For a range of values of the rank r, fit NMF on the training set and predict on the test set, calculating MAPE. Graph MAPE against r.
  4. Repeat item 3, but this time predict on the training set. Plot this curve on the same graph as the one from item 3, i.e. two curves on one graph.
  5. Write about your results -- prose and graphs -- in ProblemA.tex.
  6. Place your code in a file ProblemA.R that you will include in your submission. Your .tex file, image files and the resulting .pdf file must also be in your submission.
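
The evaluation loop in items 3 and 4 can be sketched roughly as follows. This is only a sketch under stated assumptions: MAPE is taken to mean mean absolute prediction error, the helper names mape() and mapeCurves() are hypothetical, and the fitPredict argument is a stand-in for whatever code wraps trainReco() and predict(). The toy stand-in meanPredict() ignores the rank and predicts the training-set mean, purely so that the sketch runs without rectools.

```r
# Mean absolute prediction error (assumed meaning of MAPE here).
mape <- function(actual, predicted) {
  mean(abs(actual - predicted), na.rm = TRUE)
}

# For each rank, fit on the training set and record MAPE on both sets.
# fitPredict(trn, newData, r) is a hypothetical wrapper that must return
# one predicted rating per row of newData.
mapeCurves <- function(trn, tst, ranks, fitPredict) {
  res <- vapply(ranks, function(r) {
    c(test  = mape(tst$rating, fitPredict(trn, tst, r)),
      train = mape(trn$rating, fitPredict(trn, trn, r)))
  }, c(test = 0, train = 0))
  data.frame(rank = ranks, test = res["test", ], train = res["train", ])
}

# Toy stand-in so the sketch runs without rectools: ignores r and
# predicts the training-set mean rating for every row.
meanPredict <- function(trn, newData, r) {
  rep(mean(trn$rating), nrow(newData))
}

# Tiny made-up data, for illustration only (not MovieLens).
trn <- data.frame(user = c(1, 1, 2, 3), item = c(10, 20, 10, 30),
                  rating = c(4, 2, 5, 3))
tst <- data.frame(user = c(2, 3), item = c(20, 10), rating = c(3, 4))
curves <- mapeCurves(trn, tst, ranks = c(2, 5, 10), meanPredict)

# Two curves on one graph, as item 4 asks.
matplot(curves$rank, curves[, c("test", "train")], type = "b", pch = 1:2,
        lty = 1:2, col = 1:2, xlab = "rank r", ylab = "MAPE")
legend("topright", legend = c("test", "train"), pch = 1:2, lty = 1:2,
       col = 1:2)
```

With the toy stand-in both curves are flat, since the rank is ignored; with a real NMF fit in place of meanPredict() the two curves would generally differ as r changes.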

Problem B

When working with RS data, one must be very careful with user and item IDs, as they may not be consecutive. This can be disastrous if not accounted for.
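
One common safeguard is to remap the raw IDs to consecutive integers before fitting, keeping a lookup so that results can be translated back to the original IDs. A minimal sketch in base R, assuming a ratings data frame with user and item ID columns; the helper name remapIDs() is hypothetical and is not part of rectools, and the data are made up for illustration.

```r
# Map arbitrary IDs to 1..k, keeping the original IDs for back-translation.
remapIDs <- function(ids) {
  orig <- sort(unique(ids))       # distinct original IDs, ascending
  list(new = match(ids, orig),    # consecutive code for each input ID
       lookup = orig)             # lookup[code] gives the original ID
}

# Made-up ratings with gaps in both the user and item IDs.
ratings <- data.frame(user = c(3, 3, 17, 42), item = c(100, 250, 100, 7),
                      rating = c(4, 5, 2, 3))
u <- remapIDs(ratings$user)
i <- remapIDs(ratings$item)
ratings$user <- u$new    # 1 1 2 3
ratings$item <- i$new    # 2 3 2 1

# Translate a consecutive user code back to the raw ID.
u$lookup[2]              # 17
```

Keeping the lookup vectors matters: without them there is no way to tell which original user or item a prediction refers to.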

Problem C

Here you will write a function, intended for use with the small MovieLens data, that determines, for each user, her favorite genre. Here are the details: