DESCRIPTION

Students: Please keep in mind the OMSI rules. Save your files often, make sure OMSI fills your entire screen at all times, etc.

Remember that clicking CopyQtoA will copy the entire question box to the answer box.

In questions involving code which will PARTIALLY be given to you in the question specs, you may need to add new lines. There may not be information given as to where the lines should be inserted.

MAKE SURE TO RUN THE CODE IN PROBLEMS THAT INVOLVE CODE!

QUESTION

(Text answer, 20 points.)

It was noted that NA values are so important that it may be desirable to have more than one kind of NA. What has been done toward this goal?

QUESTION -ext .R -run 'Rscript ./omsi_answer2.R'

(R code answer, 20 points.)

Suppose the matrix A is partitioned into two sets of rows. Call the upper rows B and the lower rows C. It will necessarily be that rk(A) <= rk(B) + rk(C). Give an example for B and C such that the inequality is strict, i.e. <, and one in which it is an equality.

# fill in code here
print(B)
print(C)

# fill in code here
print(B)
print(C)

QUESTION -ext .R -run 'Rscript ./omsi_answer3.R'

(R code answer, 20 points.)

Write a function to find the proportion of users in MovieLens who rated at least k movies. Print the result for k = 88.

propRated <- function(k)
{
   load('Hwk1.RData')

}

print(propRated(88))

QUESTION

(Text answer, 20 points.)

The ratings of movies will often be correlated; people who like movie X may tend to like movie Y. Say we compute the correlations between all pairs of movies, i.e. call cor() on all pairs of columns of the ratings matrix A. (We first would need to find all users who have rated both X and Y. Don't worry about this point.) We could then do some analysis on the "winner," i.e. the pair of columns that are most highly correlated. What would be a concern about doing this, from our book?

QUESTION -ext .R -run 'Rscript ./omsi_answer5.R'

(R code answer, 20 points.)

Write a function that will find the proportion of movies that are in all genres specified in the argument. For instance, say we are interested in G5 and G8. If a movie is in both of these genres, it counts, but not if it is in only one or none of them. The argument is a vector of column names.

inAllGenres <- function(whichGenres)
{
   load('Hwk1.RData')

}

print(inAllGenres(c('G2','G5','G18')))
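
The "in all genres" condition above can be illustrated on a small made-up table. This is only a sketch: the data frame toyMovies, its 0/1 coding, and the helper name inAllGenresToy are assumptions made for illustration, and the actual contents of Hwk1.RData are not shown here and may be laid out differently.

```r
# Toy stand-in for a movie table with 0/1 genre indicator columns.
# (Hypothetical data; not the contents of Hwk1.RData.)
toyMovies <- data.frame(
   G2  = c(1, 1, 0, 1),
   G5  = c(1, 0, 0, 1),
   G18 = c(1, 1, 1, 0)
)

# Proportion of rows that have a 1 in every column named in whichGenres.
inAllGenresToy <- function(movies, whichGenres)
{
   # drop = FALSE keeps a data frame even if only one genre is requested
   sub <- movies[, whichGenres, drop = FALSE]
   # a movie counts only if the number of matching genres equals the number requested
   inAll <- rowSums(sub == 1) == length(whichGenres)
   mean(inAll)
}

print(inAllGenresToy(toyMovies, c('G2', 'G5')))          # 2 of 4 rows qualify: 0.5
print(inAllGenresToy(toyMovies, c('G2', 'G5', 'G18')))   # 1 of 4 rows qualifies: 0.25
```

Counting matches per row and comparing against the number of requested genres is one way to express "all of them"; a movie in only some of the genres has a smaller count and is excluded.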