xq: (Text answer, 20 points) Say we are converting a categorical variable with k categories to dummies, for use as predictors/features in predicting some variable. We will create k-1 dummies. We must not create k of them. Why not? (Note: "Must not" is stronger than "need not.")

answer: If we use all k dummies, they sum to 1 in every row and thus duplicate the column of 1s in the A matrix. A would then be of less than full rank, and A'A would be singular, so it could not be inverted to obtain the coefficient estimates.

xq: (R code answer, points) Consider the mlb dataset from regtools. We will predict Weight from Position, Height and Age. Assume we have already run factorsToDummies to convert the Position column. Suppose we were to run a quadratic model, adding columns "by hand" instead of using qePolyLin(). How many columns would we add? Your answer must be an executable R expression, but you may wish to write your reasoning in # comment lines.

answer:
# 7 dummies; 2 squared terms; 7x2 interaction terms
2 + 7*2

xq: (Text answer, 20 points) The qeLogit() function uses the One vs. All (OvA) approach, but All vs. All (AvA) may be faster on the vertebrae dataset. Explain.

answer: The vertebrae data has 3 classes, so the glm() function will be run 3 times in either case (once per class for OvA, once per pair of classes for AvA). But each AvA run uses only the rows belonging to its two classes, so AvA will work on smaller datasets.

xq: (R code answer, 20 points) The qe*() functions have a uniform call interface and a uniform structure in the return object, though additional model-specific entities may be present in both cases. In predicting some variable Y from a set of features X, the return object will include the accuracy measures testAcc and baseAcc. These are the mean absolute prediction error in the case of predicting continuous Y, and the overall probability of misclassification for classification problems. One might also calculate trainAcc, the accuracy obtained in predicting the training, i.e. non-holdout, data. Using the classif and holdIdxs components of the return object, write a function that calculates this latter accuracy. The call form will be getTrainAcc(qeObj,xdata,ydata), where qeObj is the return object, ydata is the Y column in the original data and xdata contains the feature columns.
getTrainAcc <- function(qeObj,xdata,ydata) {
}

set.seed(9999)
data(mlb)
data <- mlb[,c(3:6)]
z <- qePolyLog(data,'Position',deg=4)
print(z$testAcc)
print(getTrainAcc(z, data[,-1], data[,1]))

answer:
getTrainAcc <- function(qeObj,xdata,ydata) {
   # keep only the training (non-holdout) rows
   xdata <- xdata[-(qeObj$holdIdxs),,drop=FALSE]
   ydata <- ydata[-(qeObj$holdIdxs)]
   preds <- predict(qeObj,xdata)
   if (!qeObj$classif) {
      # continuous Y: mean absolute prediction error
      return(mean(abs(ydata - preds)))
   }
   # classification: proportion misclassified, to match testAcc
   return(mean(preds$predClasses != ydata))
}

xq: (R code answer, 20 points) Write a function with call form lass(xdata,ydata,b,lam) that will evaluate Equation (5.26) at the beginning of Sec. 5.10.2.

lass <- function(xdata,ydata,b,lam) {
   xdata <- as.matrix(xdata)
   b <- matrix(b,ncol=1)
}

data(mlb)
mlb <- mlb[,4:6]  # Height, Weight and Age only, i.e. no Position
lass(mlb[,-2],mlb[,2],c(-100,5.0,1.1),1.5)

answer:
lass <- function(xdata,ydata,b,lam) {
   xdata <- as.matrix(xdata)
   b <- matrix(b,ncol=1)
   A <- cbind(1,xdata)  # prepend the column of 1s for the intercept
   D <- ydata
   tmp <- D - A %*% b   # residuals
   # sum of squared residuals plus the L1 penalty on b
   t(tmp) %*% tmp + lam * sum(abs(b))
}
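
The lass() answer can be sanity-checked by hand on a tiny example. The following is only a sketch: the three-row dataset, the coefficient vector and the value of lam are made up for illustration and are not from the mlb data. Like the answer above, it penalizes every element of b, including the intercept. It uses sum(resid^2) in place of the matrix product, so it returns a plain number rather than a 1x1 matrix.

```r
# Sketch: squared-error-plus-L1-penalty objective, checked on toy data.
# All data values below are hypothetical, chosen so the arithmetic is easy.
lass <- function(xdata, ydata, b, lam) {
  A <- cbind(1, as.matrix(xdata))    # prepend the column of 1s
  b <- matrix(b, ncol = 1)
  resid <- ydata - A %*% b           # residuals D - Ab
  sum(resid^2) + lam * sum(abs(b))   # squared error plus L1 penalty
}

x <- data.frame(u = c(1, 2, 3), v = c(0, 1, 0))
y <- c(2, 3, 5)
b <- c(1, 1, -1)   # intercept, coefficient of u, coefficient of v
lam <- 0.5

# By hand: fitted values are 1 + u - v = 2, 2, 4,
# so the residuals are 0, 1, 1 and the squared error is 2.
# The L1 norm of b is 1 + 1 + 1 = 3, so the penalty is 0.5 * 3 = 1.5.
val <- lass(x, y, b, lam)
print(val)   # 3.5
```

Working one case by hand like this catches the most common slips, such as forgetting the column of 1s or squaring the penalty term.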