Homework III
Due Thursday, November 15
Note that the questions are semi-open-ended. In that sense, this
assignment is a warmup for the Term Project.
Please make sure you are using Version 1.0.4 of rectools.
Problem A
This problem involves the Czech dating agency data:
- Whose ratings are more predictable, men's or women's? Do an
empirical investigation.
- Separate the data into male and female user subsets.
- Place your code in a file ProbA.R and your analysis in
ProbA.tex. As usual, submit those files plus your .pdf,
as well as figure files if any.
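One possible shape for the Problem A comparison is sketched below, in base R only. It is a minimal sketch under stated assumptions, not the required method: the toy data frame, its column names (userID, itemID, rating, gender) and the user-mean baseline predictor are all hypothetical stand-ins, and MAPE is taken here to be the mean absolute prediction error. With the real data you would substitute your own predictor and a more careful holdout scheme.

```r
# Toy data, made up purely to exercise the code; the real data set
# has its own layout and column names.
ratings <- data.frame(
  userID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  itemID = c(10, 11, 12, 10, 11, 12, 10, 11, 12, 10, 11, 12),
  rating = c(5, 5, 4, 1, 5, 3, 2, 2, 3, 4, 1, 5),
  gender = rep(c("F", "M", "F", "M"), each = 3)
)

# Hold out each user's last rating as the test set
isTest <- !duplicated(ratings$userID, fromLast = TRUE)
train <- ratings[!isTest, ]
test <- ratings[isTest, ]

# Baseline predictor (an assumption, not kNN): the user's mean
# rating over his or her training ratings
userMeans <- tapply(train$rating, train$userID, mean)
test$pred <- as.numeric(userMeans[as.character(test$userID)])

# Mean absolute prediction error, computed within each gender subset
mapeByGender <- tapply(abs(test$pred - test$rating), test$gender, mean)
print(mapeByGender)
```

The numbers printed for the toy data carry no meaning. The point is only that the same accuracy measure is computed separately on the two subsets, so that the two values can be compared.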
Problem B
Let Ni denote the number of ratings by User i.
Typically in RS applications, covariates may not help prediction if most
Ni are reasonably large; in such a situation, the user's
ratings will tell us all we need to know about the person, so that the
covariates become superfluous.
However, the covariates may help in predicting the cases with small
Ni. In this problem you will investigate this.
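As a small illustration of the quantity Ni, the following base R sketch counts ratings per user and reports what fraction of users fall at or below a threshold. The toy data frame, its column names and the threshold value are hypothetical, chosen only for illustration.

```r
# Toy ratings data in (user, item, rating) form; all values are made up.
ratings <- data.frame(
  userID = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 4),
  itemID = c(10, 11, 12, 10, 11, 13, 10, 11, 12, 13),
  rating = c(4, 5, 3, 2, 5, 4, 1, 2, 3, 2)
)

# Ni: the number of ratings submitted by each user
Ni <- table(ratings$userID)

# Distribution of the Ni: how many users have 1, 2, 3, ... ratings
NiDist <- table(as.vector(Ni))

# Fraction of users with Ni at or below a hypothetical threshold m
m <- 2
fracSmall <- mean(as.vector(Ni) <= m)
print(fracSmall)
```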
- Use the InstEval data. This is a good testbed, as it has many users
with small Ni. The covariates are the material in columns 4
through 19.
- Use kNN, with k set to a value you found to work well in Homework I.
- Separate all users having Ni less than or equal to some
threshold m. Train the model (i.e. run formUserData()) on the
rest of the data, then predict this small-Ni set, using MAPE
as the accuracy criterion. Do this both with and without covariates.
- As a check: I found that covariates gave a substantial improvement
in the case m = 5, using k = 5.
- Place your code in a file ProbB.R and your analysis in
ProbB.tex. As usual, submit those files plus your .pdf,
as well as figure files if any.
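The split-and-evaluate step in Problem B can be sketched in base R as follows. This is a sketch under stated assumptions: the function names (splitBySmallNi, mape, itemMeanPredict), the toy data and its column names are all hypothetical, and the item-mean predictor is only a placeholder standing in for kNN. No rectools calls appear here, since their exact signatures should be taken from the package documentation for the version you are using.

```r
# Split ratings into a training set (users with Ni > m) and a
# test set (users with Ni <= m).
splitBySmallNi <- function(ratings, m) {
  Ni <- table(ratings$userID)
  smallUsers <- names(Ni)[as.vector(Ni) <= m]
  isSmall <- as.character(ratings$userID) %in% smallUsers
  list(train = ratings[!isSmall, ], test = ratings[isSmall, ])
}

# Mean absolute prediction error, skipping cases with no prediction
mape <- function(pred, actual) mean(abs(pred - actual), na.rm = TRUE)

# Placeholder predictor (NOT kNN): the item's mean training rating.
# Items unseen in training get NA.
itemMeanPredict <- function(train, test) {
  itemMeans <- tapply(train$rating, train$itemID, mean)
  as.numeric(itemMeans[as.character(test$itemID)])
}

# Toy data, made up purely to exercise the code
ratings <- data.frame(
  userID = c(1, 1, 1, 2, 3, 3, 3, 4),
  itemID = c(10, 11, 12, 10, 10, 11, 12, 11),
  rating = c(4, 2, 5, 3, 2, 4, 3, 5)
)

parts <- splitBySmallNi(ratings, m = 1)
pred <- itemMeanPredict(parts$train, parts$test)
smallNiMAPE <- mape(pred, parts$test$rating)
print(smallNiMAPE)
```

For the actual comparison, the same split would be evaluated twice, once with a predictor that ignores the covariates and once with one that uses them, and the two MAPE values compared.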