# Blog, ECS 132, Winter 2021

Saturday, March 27, 1:55 pm

I've been asked what courses I recommend for students who wish to go further into the subject matter of our course. I have suggestions below.

My top recommendations, in priority order, would be: ECS 171 or STA 142AB; ECS 172; STA 137; MAT 135B and 167.

• CS Dept.

ECS 171: Machine Learning. Mildly mathematical.

ECS 172: Recommender Systems. My new course, to be offered Winter 2022.

• Stat Dept. (not "stats")

STA 108: Regression Analysis. Largely our Chapter 14. No matrices or calculus, but more detail for the linear case. Watch out for hypothesis testing!

STA 130B: Math Stat, Brief Course. Theory. Slightly lower math level than our course.

STA 131BC: Math Stat. Theory. More math than our course.

STA 135, 137, 138, 144: Various stat methods, rather specialized. I'd recommend 137.

STA 141A: NOT RECOMMENDED. You already have this background.

STA 141BC: Worthy, though rather unchallenging for CS majors.

STA 142AB: Strongly recommended.

• Math Dept.

MAT 135B: Stochastic Processes. Largely on Markov chains, which unfortunately we had to skip in our course. Expect some math here, limits, simple proofs.

MAT 167: Not a stat/ML course, but essential if you want to pursue the stat/ML field.

Saturday, March 27, 9:55 am

I just turned in the grades for our class! Sorry for the delay, mainly due to time needed to grade the Term Projects, and an emergency need to modify my grading scripts.

I don't know how long it will take for the grades to show up on your own records. You are welcome to e-mail me, asking for your grade.

At the start of the quarter, I stressed two points:

• I recommended that you NOT take the course P/NP. If you take the class seriously and put in the work, you will almost certainly get an A or B grade. If you don't put in the work, you'll likely get a D or F.
• I emphasized that the Term Project IS the class. That's where you will develop the lasting learning you get from the class. Your course grade will reflect this.

In addition to its formal role in the course grade, the Project gave a bump to your grade as follows: Your course grade from the formula (70% Quizzes, 30% Homework/Project) was raised 1, 2 or 3 notches, for a Term Project grade of B+, A- and A, respectively.

For example, one student had the quiz grades

M M C- C- B- F B+ C+ B-

(M means Missing) but ended up with an A grade for the course! After dropping the lowest 3 Quizzes but adding Homework/Project, this student had an overall average of 2.88. That was just barely a B, but with an A project, the student's grade was raised 3 notches, resulting in an A for the course. This is similar to the sample from Fall 2019 that I gave in our Syllabus.
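The notch bump described above can be sketched in a few lines of R. This is only an illustration, not the actual grading script: the function name `bumpGrade`, the vector names, and the exact grade ladder (F up through A, capped at A) are assumptions made for the example.

```r
# Hypothetical sketch of the Term Project bump; NOT the real grading script.
# Assumed ladder of letter grades, lowest to highest, capped at A.
ladder <- c('F','D','C-','C','C+','B-','B','B+','A-','A')

# notches earned for each Term Project grade, per the rule stated above
projBump <- c('B+' = 1, 'A-' = 2, 'A' = 3)

bumpGrade <- function(formulaGrade, projGrade) {
   pos <- match(formulaGrade, ladder)
   if (is.na(pos)) stop('unknown letter grade')
   # project grades below B+ earn no bump
   bump <- if (projGrade %in% names(projBump)) projBump[[projGrade]] else 0
   # move up the ladder, but never past the top rung
   ladder[min(pos + bump, length(ladder))]
}

print(bumpGrade('B', 'A'))   # the student described above: B from the formula, A project
```

With a formula grade of B and an A project, the sketch moves three rungs up the assumed ladder (B, B+, A-, A), matching the example in the text.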

The distribution of course grades was

    > grds <- ucd$GradeCode
    > table(grds)
    grds
     A A+ A-  B B+ B-  C C+ C-  D  F  I
    29  4  5 17 12  4  3  7  1  1  4  3

Please note this official UCD rule: A grade can be changed only if a "clerical or procedural error" can be documented. No change of grade may be made on the basis of reassessment of the quality of a student's work.

I hope you're having a nice break.

Wednesday, March 17, 4:20 pm

I am still answering questions by e-mail today, though I am available by e-mail only intermittently.

Here is an example of using tar:

    fandrhome/matloff/public_html/matloff/public_html/132/OldExams/tmp/z
    laura:tmp/z% ls
    a b
    laura:tmp/z% tar cf newtar.tar a b
    laura:tmp/z% ls
    a b newtar.tar
    laura:tmp/z% tar tf newtar.tar
    a
    b

Here I was in a directory z, with files a and b. I used tar from within z to produce newtar.tar, consisting of just a and b. To check, I ran tar tf.

Tuesday, March 16, 3:25 pm

I've received a number of requests to be allowed to use the geosphere package in calculating distances. It's fine. But I just wanted to make sure everyone realizes that it's a trivial geometric calculation that you can do yourself.

Monday, March 15, 5:15 pm

In Part B, in the section on linear models, I want you to try an ordinary linear model first. Then you can try polynomial models. You are required only to form a confidence interval for your ordinary linear model, not polynomial ones. Note, though, that you COULD form CIs for the polynomial models too, because they are still linear models, i.e. linear in β, on which the CI is based.

Sunday, March 14, 7:15 pm

Happy Pi Day. :-)

Someone asked me about including code listings in your report. Should you include by copy-and-paste? What LaTeX macros could you use for making it look nice?

Definitely don't do copy-and-paste! We're CS people, after all. :-) Your text editor should have a feature to import files.
In vi/vim, for instance, type

    :r your_file_name

The contents of the specified file will then be read into your current buffer, at the point at which you invoked the command.

There are many ways to display code in LaTeX, some quite fancy. I used the listings package.

Saturday, March 13, 10:45 am

Later today I will be e-mailing you summaries of your quiz grades. Make sure to double-check their accuracy.

Recall that for Section A02, Quiz 2 counts double, as Quiz 1 had to be canceled at the last minute. So you will see a "Quiz 1" score which is identical to that of Quiz 2. Similarly, Quiz 3 counts double for Section A01, due to lack of Quiz 2, etc. Finally, Quiz 4, the oral quiz, also counts double, so you will see two identical grades.

Also:

• Recall that your lowest THREE quiz letter grades will be dropped. If you did poorly on one of the doubled quizzes, that will be two of the three.
• You should review the grading policy in our course syllabus, especially Sec. 20.4.3. There the student got a 2-notch bonus (course grade would have been B+, but it became an A), due to having an A- project. I gave 1 notch for a B+ project and 3 notches for an A project.

Once again, to me the project IS the course.

Wednesday, March 10, 9:55 pm

A followup to the blog posts of March 1, 7:55 pm and March 7, 10:20 am:

In school, most if not all of your assignments are "sanitized." Even if some courses may have students do some data cleaning, in many respects this project goes considerably further, where even the measurements themselves (e.g. driver idle time) may not be so obvious.

Some of you may have heard the term proxy. In economics or other statistical analysis, it means that even though we don't have data on our preferred variable U, we do have V, which to some extent serves as a substitute for U, i.e. a proxy. (The root is of course the same as in "approximate," and the word is also used in voting by stockholders.) You may find it best to use one or more proxies in your project.
So, I don't have certain output numbers that your analysis must match. There is no "answer key." Instead, I want your group to discuss various alternatives for each problem that arises, and choose one. Then in your report, present the alternatives and explain your choice.

Tuesday, March 9, 1:40 pm

I had been missing a couple of the solutions files for Quiz 6, fixed now.

Tuesday, March 9, 11:25 am

The regtools package also includes a dataset peFactors. Make sure you are familiar with it for this week's quiz. Run data(peFactors) to load it. Learn about it by typing

    > ?peFactors

BEFORE you take the quiz, as ? may not work for you during the quiz.

Monday, March 8, 2:25 pm

Continuing with the "Candidate X" example from our 3/8 lecture, say you survey 500 people, and 288 say they will vote for X. Let's test the hypothesis

H0: p = 0.50000000...,

where p is the population proportion of people who will vote for X. We will use α = 0.05.

Here T is p_est, the proportion of people in our sample who will vote for X (288/500 = 0.576 with the above data), and θ is p, with θ0 = 0.5.

What about s.e.(T)? As noted in the March 5 blog post, p_est, coming from indicator variables, is a special case of "X bar," so s.e.(T) = s / n^0.5. With s boiling down to [p_est (1 - p_est)]^0.5, that gives s.e.(T) = [p_est (1 - p_est) / n]^0.5, about 0.0221.

So,

Z = (0.576 - 0.5) / 0.0221 = 3.4389

Is |3.4389| > 1.96? Yes! So we reject H0.

We had the policy of believing H0 unless and until we get strong evidence to the contrary. If H0 were true, there would only be a 5% chance of |Z| > 1.96. The latter event occurred, and rather than dismissing it as a rare event, we choose to abandon our belief in H0.

And now we get greedy: Since 3.4389 is quite a bit above 1.96, we suspect we would have rejected H0 even if we had chosen a more stringent value of α than 0.05. How about, say, 0.01, i.e. 0.005 area in each of the left and right tails?

    > qnorm(0.005)
    [1] -2.575829

So the rejection cutoff with α = 0.01 is about 2.58. Is 3.4389 larger than that? Yes!
So we would have rejected H0 even under the more cautious level α = 0.01. Just how far could we take this reasoning? Well, we can work backwards:

    > 2 * pnorm(-3.4389)
    [1] 0.0005840829

Wow! Not just 0.05, not just 0.01, but 0.0005840829! Super significant! This is the p-value.

Candidate X enjoys the support of a majority of voters in the population, at a super-highly significant level!

Well...don't celebrate yet. X is in the lead with 57.6% favoring him in our survey, but that is not really far above 50%; things could change later in the election campaign.

And, consider another example: Say we survey 10,000 people, and 5188 of them favor X. It turns out that Z = 3.7626, with a p-value of 0.000168156, even tinier, i.e. even more super-"significant." Yet here X would have a lead of only 51.88%, pretty close to 50%.

The key problem is that the significance test is not telling us HOW MUCH Candidate X is in the lead. It's only telling us whether he is in the lead at all.

A 95% CI for p in that example with n = 10000 is (0.5090069,0.5285931). It's telling us 2 things:

• We estimate X's support to be about 52%, only slightly above 50%.
• Our estimate is pretty accurate, judging from the narrowness of the interval.

This is much more informative.

READING: Pages 243-247 and 249-254.

Monday, March 8, 2:15 pm

I've been getting a number of queries re Problem A and posters. I've been assuming most people would choose the pandemic option, i.e. papers rather than posters, as most people are not on campus and the online posters don't offer much choice. But a poster is fine if you want that. No matter what, though, keep in mind the blog post of March 2, which said,

In Part A, [any author] with a Professor title is eligible, including Assistant and Associate Professors.

Monday, March 8, 9:00 am

The following problem was in one of the versions of Quiz 6:

Say we have a random sample, X_i, i = 1,2,...,n from some population in which X has unknown density f_X.
We are interested in the population mean μ, and will use W = the sample mean ("X-bar") as our estimator. Then |W - μ| is our estimation error, and we may be interested in E(|W - μ|). Use simulation to evaluate that in the case of f_X being an exponential distribution with mean 2.0.

    sim <- function(n,nreps) {

    }

    print(sim(1000,10000))

Can you work problems like this?

And what about this one?

Say we have a random sample, X_1, X_2, ..., X_n, from some population in which X has unknown density f_X. We will form a histogram, but with freq = TRUE so we get actual bin counts. Say our bins are (i-1,i] for i = 1,2,...,25, with an extra bin (25,∞). Suppose, unknown to us, f_X is exponential with mean 2.0. Let N_i be the count for bin i, i = 1,2,...,25. Find Var(N_i).

    varHist <- function(n,i) {

    }

    print(varHist(100,2))

Monday, March 8, 8:55 am

Our lecture today will cover problems with hypothesis testing, using this document in regtools. Of course, it is directly related to Problem A of the Term Project.

Sunday, March 7, 3:10 pm

Your group submits just ONE copy of your progress report. Note, though, that it must list what each person has done so far.

Sunday, March 7, 10:20 am

Reminder concerning Problem B: This is real data, full of imperfections and ambiguities. There are various issues on which you will need to make decisions, e.g. whether to include certain trips in your analysis. You will need to discuss these issues in your group, and finally decide how to handle the issues. It is imperative that you explain your decisions in your report. And clearly, there is no single "correct" way to handle things.

Sunday, March 7, 9:05 am

• Reminder: Term Project progress reports due tomorrow.
• In this week's quiz, I will include at least one question that is a slight variant of one in Quiz 5 or 6.

Saturday, March 6, 10:45 pm

Please make sure to have both regtools and mlbench installed on your machines for this coming week's quiz.
Again, this means you must be able to run library(regtools) and library(mlbench) from within OMSI. Note too that the version of regtools must be the one from my GitHub site, not CRAN.

Friday, March 5, 9:00 pm

(As with any blog post, this is considered part of our course. It is related to NON-required material in the book, which you may wish to read, but that material is optional.)

Many statistical quantities are approximately normally distributed if the sample size n is large, just as is the case with the sample mean. If T is such a quantity, estimating some population value θ, we refer to the estimated standard deviation of T as its standard error.

Here is how that fits in to the example we've had so far, with T being the sample mean and θ being the population mean. We know that Var(T) = σ^2 / n, where σ^2 is the population variance. The latter is unknown, but is estimated by s^2, the sample variance. We say that s / n^0.5 is the standard error of T; let's denote it by s.e.(T).

And that means we can follow the same pattern we used to derive an approximate 95% confidence interval for the population mean. For any asymptotically normal T, we have that

(T - 1.96 s.e.(T), T + 1.96 s.e.(T))

is an approximate 95% confidence interval for θ.

For the same reason, we can do testing, say for H0: θ = θ0: Reject H0 at the 5% level if |Z| > 1.96, where Z = (T - θ0) / s.e.(T); under H0, the probability of rejection will be 0.05.

Similarly, we can compute the corresponding p-value, which is double the area to the right of Z if Z > 0 (or to the left of Z if Z < 0).

Example: Estimation in linear regression models involves a lot of sums, suggesting that the Central Limit Theorem can be used. In fact, one can show that the estimates of the β_i are approximately normally distributed. Let's use that for the mlb data.

    > data(mlb)
    > mlb <- mlb[,4:6]
    > linout <- qeLin(mlb,'Weight',holdout=NULL)
    > summary(linout)
    ...
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -187.6382    17.9447  -10.46  < 2e-16 ***
    Height         4.9236     0.2344   21.00  < 2e-16 ***
    Age            0.9115     0.1257    7.25 8.25e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    ...

The estimated β_height is 4.92, so one extra inch of height means, on average, about 5 extra pounds of weight, with a standard error of 0.23. Adding and subtracting 1.96 times the standard error gives us an approximate 95% CI of (4.46,5.38).

We can test the hypothesis that β_height = 0. (An absurdly false hypothesis.) We compute (4.9236 - 0)/0.2344 = 21.00. The area to the right of that under the N(0,1) density is tiny, in fact under 2 × 10^-16, as it says in the last column.

By the way, what about the case of a proportion? Since it's an average of 1s and 0s, it's just a special case of means. But it turns out that s^2 reduces to something simple: Let T be the sample proportion, i.e. "X bar" in Eqn. (11.24). But since each X_i is either 1 or 0, and since 1^2 = 1 and 0^2 = 0, that first term in (11.24) is ALSO T! In other words, s^2 = T - T^2 = T(1-T).

Thursday, March 4, 9:55 pm

In yesterday's/last night's lecture, I noted that the neural networks picture was missing. It's up there now.

Thursday, March 4, 5:40 pm

Two items:

• For next week's quiz, you'll need regtools on your computers. Make sure to test

    > library(regtools)

from within OMSI.
• The following problem was on one of last week's quizzes:

(Text answer - 20 pts) A distribution for random variable X is said to have an increasing failure rate if f_X(t) / [1 - F_X(t)] is increasing for t in the support of X, with similar definitions for decreasing and constant failure rates. (Actually for most distributions the failure rate is decreasing for some values of t and increasing for others.) Consider the U(0,1) distribution and the exponential distribution with mean 1. Which of the following is true?

(i) The exponential is increasing and the uniform is increasing.
(ii) The exponential is decreasing and the uniform is increasing.
(iii) The exponential is decreasing and the uniform is decreasing.
(iv) The exponential is constant and the uniform is increasing.
(v) The exponential is constant and the uniform is decreasing.
(vi) The exponential is increasing and the uniform is constant.
(vii) The exponential is decreasing and the uniform is constant.
(viii) None of the above is true.

If you had taken this quiz, would you have gotten it right?

Wednesday, March 3, 11:05 pm

Two items:

• The book at http://heather.cs.ucdavis.edu/~matloff/132/PLN/probstatbook/ProbStatBookW21.pdf is THE official textbook for our course. The book at http://heather.cs.ucdavis.edu/~matloff/132/PLN/probstatbook/ProbStatBook.pdf is the unabridged version of the book, possibly helpful but only as an unofficial supplement. Page numbers cited in quizzes refer to the official textbook.
• The version of R on CSIF is only 3.4. I have a personal R 4.0 on my account on the CSIF machines, at ~matloff/R400/bin/R which you can use for regtools.

Wednesday, March 3, 5:50 pm

The dataset for our Term Project is quite large. Many real-world datasets are even larger, but this one is already big enough to be of concern in various ways.

Please note that one of the reasons I chose this data for the project IS its large size. In other words, dealing with the size issues is an important part of the educational value of the project. Note the following:

• Keep in mind that I said the project is harder than it looks, because various issues will arise to be solved that are not apparent at the outset; this is one of them. Note too that your report is supposed to discuss major issues that arise, what solutions you considered, how you made your decision, etc. This includes not only statistical issues but also computational ones.
• Thus I want you to do as much of your Project as possible using the full data set. That means everything but your two machine learning methods.
• Arnav told me that he and some students found that they couldn't even load the train.csv data into their machines, due to insufficient memory. But it runs fine on the CSIF machines; I found for instance that it not only did load successfully but also it took under 3 minutes to read in the entire file. So, you might do everything but the ML methods on CSIF.
• You may wish to use the data.table package, by Matt Dowle, whom I and many others consider one of the very top people in R. It can do many data manipulation tasks very quickly, both because it is threaded and also because it exploits R internal structure (e.g. column-major memory storage) very finely. The function for file read is fread(); I found it to be only slightly faster than read.csv() on this data, but it may speed up other tasks that you have.
• This still leaves the question of the computational load of your statistical analysis. Some machine learning algorithms are quite computationally intensive. Some suggestions:
• You may consider doing your PRELIMINARY analyses on a subset of the data, say 50,000 randomly selected rows. Then, when you start to write up your report, use the full data.
• You can use a method I developed, which I call Software Alchemy. The idea is very simple. Say you are predicting widget sales for 2022, based on various features, using ML method Z. Break the data frame into k chunks of rows (again, randomly chosen); run Z on each chunk, producing k predicted values; then take as your final predicted value the average of those k numbers.
• This can speed things up for two reasons: (a) You can run the k invocations of Z in parallel, say as k processes on the same machine or on k different machines on CSIF. You could use the partools package to coordinate all this, or simply have the k processes write their answers to disk. (b) Say Z has a time complexity of O(n^d) for an n-row dataset.
Then each process has complexity O((n/k)^d) = O(n^d / k^d), and since they run in parallel, the latter will be the running time -- a speedup of about k^d (if d > 1)! In fact, even running serially gives a speedup of about k^(d-1).

Wednesday, March 3, 12:10 pm

A student asked whether the special lectures on distribution fitting and machine learning methods will be covered on our remaining quizzes, i.e. this week and next week. The answer is Yes! Note that the material in these special lectures definitely relates to our earlier course material. The same will be true for the quizzes.

Remember, the policy is that any quiz covers material through the most recent lecture, i.e. a Wednesday, though in practice it usually will be only through the most recent Monday. However, today's lecture (postponed until this evening) will help solidify what I did on Monday.

This evening's makeup lecture should appear on your Canvas page.

Tuesday, March 2, 9:25 pm

I've added several pictures to the document I started discussing in lecture yesterday, and elaborated some more in the text.

Tuesday, March 2, 8:55 pm

Project news:

• In Part A, anyone with a Professor title is eligible, including Assistant and Associate Professors. Note that if you choose the pandemic option, you probably will use a full paper, not a poster, but you need not discuss the entire paper.
• In installing regtools, make sure to follow instructions carefully, both in the specs and in information posted in the blog. As always, search paths are key.

Monday, March 1, 7:55 pm

Remember, this project is largely open-ended. There are no unique answers, and there are a number of places in which you will need to exercise your own judgment. This is real-world, not the crisply-defined, cloistered, sanitized, insulated project that you may be used to. If you encounter an issue that has no unique solution, you should:

• Have a spirited discussion about it in your group.
• Choose and implement a solution.
• And, most importantly, discuss the problem in your report. What was the problem? What solutions did you consider? Which one did you choose? Why did you choose it? How did you implement it?

Monday, March 1, 2:45 pm

I also mentioned that though random forests are generally thought of as having been invented by Leo Breiman, a famous statistics professor at UC Berkeley and UCLA, the idea was first proposed by Tin Kam Ho, who called them random decision forests. While it's true that Breiman is the main one who refined RFs, Ho must be recognized for her role. These days proper credit is finally given to prominent women scientists whose work was sadly overshadowed by male colleagues, e.g. Rosalind Franklin in the discovery of the structure of DNA. So, good for Dr. Ho! (And even better that she is Hong Kong Cantonese. :-) )

Breiman, by the way, was a very colorful character. He had a nice life as a full professor at UCLA, doing research in probability theory, but suddenly quit to become a free-lance consultant, so that he would have time to take care of his children (foster dad). Through the kids, he saw first-hand how bad (in his opinion) the public schools were, so he ran for the Santa Monica School Board. He won, and later became president of the board. Meanwhile, in spite of having no prior background in applications, his consulting business was doing well, and he was inventing new methodology. UCB then lured him back into academia.

Monday, March 1, 2:25 pm

In lecture today I mentioned that one can name vector elements. E.g.

    > x <- c(5,12,13,8.88)
    > x
    [1]  5.00 12.00 13.00  8.88
    > names(x) <- c('Jack','Jill','Zack','Bill')
    > x
     Jack  Jill  Zack  Bill
     5.00 12.00 13.00  8.88

One can name rows and columns of matrices and data frames too. (Of course, for data frames columns are named anyway.)
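As a quick sketch of the matrix case just mentioned, here is a tiny made-up example. The matrix `m` and its row and column labels are invented purely for illustration and were not part of the lecture.

```r
# small made-up matrix, purely for illustration; filled column by column
m <- matrix(1:6, nrow = 2)
rownames(m) <- c('first', 'second')
colnames(m) <- c('u', 'v', 'w')
print(m)

# index by name instead of by position
print(m['second', 'v'])   # a single element
print(m[, 'w'])           # an entire column; the row names carry over
```

Name-based indexing like this tends to make code easier to read than bare row and column numbers, and it keeps working if the rows are later reordered.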
Consider:

    > head(mtcars)
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

You see 12 columns, right? No:

    > dim(mtcars)
    [1] 32 11

Only 11! That first "column," with the make and model of the car, actually consists of the row names:

    > head(row.names(mtcars))
    [1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"
    [4] "Hornet 4 Drive"    "Hornet Sportabout" "Valiant"

    > mtcars[2,]
                  mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
    > mtcars['Mazda RX4 Wag',]
                  mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

And that can be handy:

    > rns <- row.names(mtcars)
    > mercs <- grep('Merc',rns)
    > mercs
    [1]  8  9 10 11 12 13 14
    > mtcars[mercs,]  # get rows of the Mercedes cars
                 mpg cyl  disp  hp drat   wt qsec vs am gear carb
    Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
    Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
    Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
    Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
    Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
    Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
    Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3

This becomes even more powerful when used as indices in loops, vectorized expressions etc.

Friday, February 26, 11:45 am

Sorry, accidentally shut down Zoom. I don't want to start again, as it may wipe out the recording. Please send me your questions via e-mail.

Thursday, February 25, 11:15 pm

Tomorrow I will lecture on this special material.
Wednesday, February 24, 7:00 pm

In order to provide you with necessary information for Problem B in the Term Project as early as possible, my tentative plan is:

• This Friday I will cover some material (not in the textbook) on fitting parametric distributions. This is related to the first two bullets, though you already have enough to get started.
• On Monday, I will cover the first section of Chapter 14, then present some material (again, not in the book) on some standard tools in machine learning, setting you up for the ML section of the Project.
• Starting Wednesday, I will then cover Chapters 12 and 13 (only parts; the rest will not be required), and the rest of 14.

I'll be getting the supplementary material on the Web tomorrow. If you want to read ahead, Chapter 14 would be a good choice.

Wednesday, February 24, 5:45 pm

In the Term Project specs, I elaborated on how to install regtools. There are many ways to install packages, described on the Web, but the example I gave is easy to do on Unix-family machines. It may need a little tweaking on Windows. Let us know if you have any problems.

Tuesday, February 23, 10:15 am

I've updated the specs file for the Term Project, adding some clarifying language (and replacing qeSVM by qeLASSO). As I mentioned in class:

• The Term Project IS the course.
• By working with real data in a rather unstructured context, you will understand what people mean in the future when you hear them say "We conducted a study to..." This knowledge is an important part of being an Educated Person.
• The importance of the Term Project will definitely be reflected in your course grade.

Monday, February 22, 11:45 pm

Our Term Project is now up on our Web page! Note NOW the two important dates:

• Progress report, due March 8.
• Submission, due March 17.

I will probably tweak the wording a bit here and there, but it is READY for you to begin!
You already have enough background for the first two bulleted tasks, and very soon will have the rest of what you need.

Here is a crucial quote from the project specs:

It is imperative that you start early. There are many things to be done that you may not realize at the outset. You'll need to figure out how the software works, which can sometimes be challenging in the case of graphics. You'll encounter various idiosyncrasies in the data, and will need to figure out how to handle them. You'll find yourself wondering, "Well, in actuality, what does that statistical method do?" This is a rather large dataset; some computations may have long run times. Etc.

Saturday, February 20, 9:20 pm

Since Hwk 2 does not cover more recent course topics, you might like to have something to exercise your understanding. Here's one below. It will only take a few minutes, and it will help solidify your understanding.

One can show (in the language of abstract math we've used) that the Poisson family is "closed under independent summation." This is just a fancy way of saying that if X and Y are independent Poisson random variables, then their sum S = X + Y also has a Poisson distribution.

Note that since a Poisson mean is the parameter lambda, the lambda values must then add too: The lambda for S is the sum of the lambdas for X and Y. And of course, all this then extends to sums of more than two such terms, by induction.

Say X1, X2, ..., X100 are independent Poisson, each having lambda = 1. So S, their sum, also has a Poisson distribution, with lambda 100. But since S is a sum, the Central Limit Theorem then tells us that it has an approximately normal distribution.

We could then find, say P(S <= 90) in two ways, either using ppois() for the exact value, or pnorm() to get an approximate value. Of course, since it is easy to get the former, the latter is not that useful, but out of curiosity, we might try the latter to see how close the approximation is. TRY IT!
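If you want to check your own attempt, here is one possible way to set up the comparison. It is a sketch rather than an official solution; the variable names are invented for the example, and the continuity-corrected version (using 90.5 in place of 90) is an extra that the exercise above does not ask for.

```r
lambdaS <- 100    # lambda for S, the sum of 100 independent Poisson(1) variables

# exact value of P(S <= 90) from the Poisson cdf
exact <- ppois(90, lambdaS)

# CLT approximation: S is approximately normal with mean 100 and variance 100
approxPlain <- pnorm(90, mean = lambdaS, sd = sqrt(lambdaS))

# same approximation with a continuity correction, since S is integer-valued
# (an addition for illustration, not part of the exercise as stated)
approxCC <- pnorm(90.5, mean = lambdaS, sd = sqrt(lambdaS))

print(c(exact = exact, approxPlain = approxPlain, approxCC = approxCC))
```

Comparing the three printed numbers shows how much the correction helps when a discrete distribution is approximated by a continuous one.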
Saturday, February 20, 8:50 pm

What I had as "(0,X)" in Problem 5 has now been fixed as "(0,Y)". I thought I had fixed that a week ago, apparently not. Sorry for any inconvenience.

Saturday, February 20, 1:05 pm

You may be wondering how your overall quiz grade will be affected by the facts that (a) we've had 2 double quizzes (the oral quiz is the second) and (b) I will be dropping your lowest 3 quizzes rather than my usual 2. Here is a worked-out example for you.

Say a hypothetical student in Sec. A02 had the following grades:

    Quiz 0: A+
    Quiz 1-2: B-
    Quiz 3: C+
    Quiz 4: A-
    Quiz 5: B+
    Quiz 6: C
    Quiz 7: B+

On the standard 4-point scale, this student would have scores 4.3, 2.7, 2.7, 2.3, 3.7, 3.7, 3.3, 2 and 3.3. Then before dropping 3 quizzes, this student's average quiz grade would be calculated as

    > mean(c(4.3, 2.7, 2.7, 2.3, 3.7, 3.7, 3.3, 2, 3.3))
    [1] 3.111111

which is a B. (3.2 is needed for a B+.) After dropping the 3 lowest quizzes, the calculation would be

    > mean(c(4.3, 2.7, 3.7, 3.7, 3.3, 3.3))
    [1] 3.5

That's a B+. (The cutoff for A- is 3.6.)

Saturday, February 20, 10:40 am

Yesterday in lecture, in discussing the relation between the exponential distribution family and the Poisson distribution family, I mentioned that that section of the book is somewhat confusing, as the symbol lambda is rather "overloaded," used differently in different contexts. I've now replaced lambda by eta in the full version of the book, Section 9.1. You are not required to read the revision, but you may find it helpful in understanding Section 8.1 of our course's official book, which is required.

Saturday, February 20, 9:00 am

We are now entering the phase of our course in which we most use linear algebra. In quizzes and the Term Project, you will be expected to know Appendix B of our textbook well, especially Section B.8, Matrix Algebra in R.
Thursday, February 18, 7:30 pm

Please note, again, that although the time slots for the oral quizzes are nominally 5 minutes, the typical time is 6 or 7 minutes.

Wednesday, February 17, 3:50 pm

The "grades" for this afternoon's oral quiz, all NA, of course, were accidentally sent out.

Wednesday, February 17, 8:30 pm

I hope to have your Term Project specs ready by the middle of next week. It will be due on the day of our scheduled final exam. (The Project is in lieu of a final.)

As I've said a number of times, the Term Project IS the course. The earlier material does have its own independent value, but to me a major point is that it lays the foundation for the Project. And, as explained, in many cases the project has an outsized impact on a student's course grade.

Recall that one of the topics in our ethics meetings concerned collaborating with one's teammates. I believe that has generally gone well, but as I am currently having a major problem with one particular group, that suggests there may be others. You must fix any such problems NOW, since the Term Project will by its nature demand COLLABORATION. Not only will there be a considerable amount of code to be written, but much more importantly, you will need to: (a) Before starting your data analysis, decide what methods you'll choose to analyze the data with (largely open-ended) and (b) after conducting your analysis, decide how to interpret the results and how to write them up clearly in your report. Remember, in your Project grade, the words will be just as important as the numbers.

Wednesday, February 17, 8:30 pm

I've been requested to extend the due date for Hwk 2, since I didn't have office hours this week. So, the new date will be Monday, 2/22.

Saturday, February 13, 11:15 pm

I've added test cases for Problems 3 and 7. Again, do not underestimate Problem 7.
It can be done rather compactly (in my version, only 49 lines, including comments and blank lines) and I've given you a broad outline, yet I think most groups will find it difficult to understand the concepts. As noted, do not wait to work this problem.

Saturday, February 13, 9:50 am

I will not be able to hold my office hour on Feb. 17.

Friday, February 12, 9:55 pm

I've fixed typos in Equations (7.44), (7.56) and (7.57). See the new PDF.

Friday, February 12, 7:35 pm

So, the first week of oral quizzes comes to a close. I thought I'd give three examples of questions, and comment on how the students whom I asked these questions did in their responses. (Don't worry about this "spoiling" things; there are lots of other questions I've been asking, and lots more that I could ask.)

• "Our course has calculus as a prereq, including infinite series. Have we used infinite series yet?" I asked this of maybe 8-9 students, and only a couple gave a correct answer, along the lines of "Why, yes! We did a lot with the geometric distribution family, and also had infinite series with the Poisson and power law families." Many said no, we haven't used infinite series yet. Some even said, "Yes, we had some integrals with limits of infinity," which is totally irrelevant.
• "Loops are said to be slow in R. Can you tell me ways to speed up simulation code by avoiding loops?" Some students did correctly identify one such way, to vectorize the code, as in the simulation of the parking place problem. Some also mentioned R's replicate() function, though many of the ones I asked directly about this function either had no answer or did not realize how it was specific to simulation.
• "What is a cumulative distribution function?" I also asked variants of this, e.g. "For various distribution families, R has d/p/q/r functions. Are any of them related to cdfs?" Since this is a more recent topic, I thought it would be fresher in students' memories, and indeed, most students did do well on such questions.
Some had no idea at all, though, which surprised me.

Friday, February 12, 9:55 am

Reminder: If you send me a class-related e-mail message, please include 'ECS 132' in the Subject line, to ensure that I see it.

Friday, February 12, 8:55 am

Misc. items:

• No class Monday, holiday.
• I've fixed some notation issues in the solutions for the library book problem on Quiz 3, A01.

Thursday, February 11, 10:45 pm

Due to unforeseen circumstances, each of the two sections had a double quiz. It does indeed count as two quizzes.

So for instance, in Sec. A01, Quiz 3 will be recorded as Quiz 2 and Quiz 3, as if they were independent quizzes. If, say, your (letter) grade there is your lowest quiz grade, then Quiz 2 and Quiz 3 will be dropped, leaving you one more quiz to be dropped.

Thursday, February 11, 9:35 pm

Section A02 students, please sign up for your oral quiz here.

Thursday, February 11, 8:05 pm

Here is some more on Problem 7.

I stated that there is lots of information on discrete event simulation on the Web, including some R packages, such as simmer. (The best Python one is SimPy, in my opinion.) I said the packages are too complex for you to learn merely for working this problem, and really, the information I gave in last night's blog should be enough. However, you might gain by looking at the README file for my own package, DES, for guidance.

If you wish, you can use some DES package (must be in R), including mine, to do Problem 7. I will instruct the TAs to give you Extra Credit.

Note: Assume that initially, all the spaces are empty.

Thursday, February 11, 11:35 am

I must reluctantly ask everyone NOT to wear earphones during the oral quiz and homework grading, as they could be used for outside communication.

Sorry to bring this up. I know the vast majority of you do not cheat. But I have already encountered two cases of suspicious behavior during the oral quiz.

Wednesday, February 10, 11:35 pm

As I noted when I first assigned Hwk 2, it is not easy.
The hardest problem is probably Problem 7. It actually is an example of discrete event simulation. There's lots on the Web on this, and some R packages, but it's really not necessary to use them. You may find that reading some Web resources clarifies your thinking, but the R packages are complex, best avoided.

Here are some hints:

• The main part of your code will be a while loop. Each iteration will simulate one event.
• You will need some sort of queue to hold all pending events. You can just make it a matrix or data frame, one column for the event type (car arrival or car departure from a parking space), another for the event time, one row per event.
• Your code should maintain a variable, say simTime, which holds the current (simulated) time.
• At the beginning of an iteration, your code finds the event in the queue having the earliest scheduled time, removes it from the queue, and acts on it:
  • It advances simTime to the time of the deleted event.
  • If the deleted event is a car arrival, the code will schedule the next car arrival, i.e. generate the time until that arrival, add it to simTime, and place the resulting arrival event in the queue. It will also mark one of the empty spaces as full if there are any, and schedule a "departure from parking space" event. If there are no empty spaces, no departure event is generated.
  • If the deleted event is a "departure from parking space" event, the code will mark the space as being available.
• If, upon deleting an event from the queue, the new simTime is greater than timeLim, exit the loop.

Of course, you'll need "bookkeeping" code, updating the sums necessary for the output.

Wednesday, February 10, 11:10 pm

Here is one of the equations you should have for Problem 3:

$m_6 = 1 + \frac{1}{6} m_7 + \frac{5}{6} \cdot 0$

The underlying ideas are: Say you start at square 6. What happens after your first turn? Well, you've already taken 1 turn, hence the "1 +." With 1/6 probability, your roll was a 1, placing you at square 7.
Since the die you are rolling "has no memory," the mean time needed to get to/past 0 from there is $m_7$, so that's the expected remaining time you'll have on your trip around the board. If instead you roll a non-1, you're already at/past 0, so your remaining time (expected and actual) is 0.

Wednesday, February 10, 8:20 pm

Some points on the oral quiz:

• I've conducted the first three sessions so far, and have e-mailed grades for all of them. If you were in one of those sessions and have not received your grade, please let me know.
• I try to take students in the order of the signup sheet, nominally 5 minutes each. It may sometimes be 6 or 7 minutes, so you cannot tell your exact time from your position on the sheet. Thanks in advance for your patience.
• I don't recommend last-minute cramming for the oral quiz. What I am assessing is your insight into the course, developed over weeks of careful thought about the material. Although you may use your textbook (and for some questions, I direct you to look at a certain page), most questions are of the type that you should be able to answer without checking the book. Fortunately, most students so far have done well.
• I should have the signup sheet for Section A02 ready by tomorrow.

Wednesday, February 10, 4:20 pm

Somehow the Zoom meetings for my office hours have disappeared. I will restore them now. I may need a few minutes.

Tuesday, February 9, 7:55 pm

This may help your thinking in Problem 3.

• We are implicitly using the "time starts over" notion. Say you start at 0 and roll a 5. How much longer will it take to pass 0 from there? The key point is that the expected value of that time is the same as for a player who starts at square 5. After all, the die the player is rolling doesn't "remember" the previous rolls, so it now gets a "fresh start." Note, though, that the player who started at square 0 has used up one turn to get to 5. Thus our recursive equation will have "1 +" in it.
• Though not necessary in our informal context, a formal derivation would make use of Section 4.15. Remember, we skipped that section and you are not responsible for it, but I do encourage you to read it, especially the intuition at the bottom of p.93.

Tuesday, February 9, 9:45 am

I just finished the first session of the oral quiz. I think it went pretty well.

I will usually send oral quiz grades on the same day as you take the quiz. (I need to write a script for this first.) Almost all of today's grades were in the A and B range, a trend that will hopefully continue.

Remember to have your book with you during the oral quiz. You are welcome to look things up (especially via searching in your PDF viewer). However, keep in mind that this is not a "facts" type of exam; instead, it's probing your depth of insight into the course material. If you've been playing an active role in your homework group, and keeping up with the reading, you should do fine.

Monday, February 8, 11:05 pm

Overall, I would say that recent quizzes have been too difficult. If you look at past quizzes, you'll see for instance that in simulation problems I typically give students most of the code, and have them fill in some blank lines. We (the TAs and I jointly compose the quizzes) have not been doing that. We'll make sure to do that in future quizzes (we have 3 more, in addition to the oral quiz).

Accordingly, I will change the rule of dropping the lowest 2 quizzes to the lowest 3. I've done this occasionally when teaching the course in the past.

Again, I feel very strongly about the importance of this course. I want you to do well, and to feel that you have learned something to carry with you after you finish school. So, I don't want you to become demoralized if you feel you are doing poorly. As I've explained before, my grading system is very flexible, and most people end up with a higher course grade than their quiz grades.
Monday, February 8, 7:30 pm

News items:

• In the homework, I've clarified some of the phrasing, and completed Problems 5 and 7. Note that I've scaled down Problem 5 a little.
• Please note that office hours are good places for you to ask Arnav and me about points in the reading that you are unclear about. As you do the reading, you will probably generate questions; ask them! Getting the answer for one page will open big doors to another. Questions on the homework are very welcome too, of course.

Sunday, February 7, 5:00 pm

Note: During your oral quiz, you must have your camera on.

Sunday, February 7, 3:15 pm

If on Quiz 3 (A01) your answer to a library model problem was 2, you probably were misgraded. Contact me to fix it.

Saturday, February 6, 4:50 pm

By the way, the solution files were in disarray, fixed now.

In grading quizzes, a few points have arisen that I'd like to mention:

• For reasons I've explained earlier, most students in my courses end up with a course grade that is higher than their quiz grades. But of course you want to do well in the latter.
• As you've seen by now, quiz problems are often variants of examples in the textbook. So:
  • Set a high bar for yourself in reading comprehension.
  • Follow up on any point in the reading that you are unsure of, asking the TAs and me, either in office hours or by e-mail.
  • Read nonpassively. For instance, in reading some simulation code, ask yourself, "What if the goal were a little different? How would the code change?"
• It's crucial that you keep up to date in the reading material, resolving the questions you have right away. The material builds on itself every week, quite rapidly.
• So, when lecture n comes, if you have not yet done the reading up through lecture n-1, the material in lecture n may not be very helpful to you.
• For the same reason, it's crucial that you begin the homework as soon as it is assigned, and actively participate in your team.
If, for example, work within a team is delegated rather than shared, everyone will learn less.
• As mentioned, in order to make sure everyone gets at least some points on a quiz, I aim to have at least two or three questions that are extremely easy, something anyone who is keeping up with the material should get full points on. As also mentioned, this includes remarks in lecture that are not in the book. A great example is the first question ("...if statistics were invented today...") in this quiz. If you had been given this question, would you have answered it correctly? If you were in Sec. A01 with this version of the quiz, did you answer it correctly?

Friday, February 5, 6:20 pm

This message is about Section A02, but please read it even if you are in A01.

• The solutions of Quiz 1-2 are now in the OldExams/ directory on our class Web site.
• If you believe you deserve more points on a problem, please send me e-mail, stating the quiz number, your section number and your version number, together with the reason why your work shows more understanding than you got credit for. Note that credit is given in increments of 5 points.
• Reading all the solutions would help you review for the oral quiz.

Wednesday, February 3, 9:10 pm

Extremely important details on the oral quiz:

• It is an absolutely firm requirement. For instance, you cannot skip it, thinking it will be one of the two lowest quizzes that are dropped in your course grade.
• It will count as a regular quiz. However, if you do especially well, I may use this to bump up your course grade at the end of the quarter.
• If you do poorly, you'll have the option of taking the quiz again, with your grade being the maximum of the two grades.
• I will probably not ask you to write code or work out math, and will instead stay mainly on "big picture" issues, as seen in my post earlier this evening (6:00 pm).
In some cases, I may ask you to turn to a certain page in the textbook, and will then ask you questions on it, say of the "what if" variety.
• I truly hope that you find the quiz to be an educational experience, giving you a look at what you know and don't know. I believe that in some cases it may even affect how you study and learn in your other courses.
• As implied above, the quiz is open-book. But as with the written quizzes, no communication with anyone at all other than me is allowed during the quiz. You take the quiz individually, not in your group.
• Have your book ready, as I may ask you to look at a certain page.
• Exam duration is short, 5 minutes max. You and I will just talk casually, in a very interactive manner. Though it may sound nerve-racking, the setting will be relaxed. I may give you hints if you are struggling to answer something, or just switch to another topic. I may even ask you to name your own topic.
• The quiz will be held the week of 2/8 for Section A01, and the week of 2/15 for Section A02. You must stick to your own section.
• Arnav has set up an online signup sheet for Section A01; A02 will be set up later. Choose a time slot soon, and let me know if you have technical problems with the site. When you fill in the form, use your official UCD e-mail address, the one to which I've been sending you mail; this is absolutely key to your getting credit for this crucial quiz. A Zoom URL will be sent to you later. At the event, you'll be put in a waiting room.
• During the week in which you are NOT taking the oral quiz, there will be a minilecture in your discussion section. Of course, this is considered official course material.
• Coverage for your quiz will be the same as for regular quizzes, i.e. material up to and including the most recent lecture before your quiz. However, in most cases questions will come from earlier in the course.
Wednesday, February 3, 6:00 pm

In my office hour today, I asked those present if they would like me to put them through a dry run of the oral quiz, asking questions like those I'll ask in the real thing. They said yes, so here is a summary of what transpired:

I said, "Tell me about indicator variables." One student answered that they indicate whether a certain event occurs (good), and brought up some other good points. He did make a wrong statement, which I corrected, and he immediately saw it and added some more comments.

I then asked this: "One can see just by the fact that people gave such variables a special name that they must be important. Can you give an example of where we used them?"

That latter question turned out to be more challenging. One of the other students present did mention that indicator variables come up in Bernoulli trials (good), but that's just description. What I was asking for was where indicator variables actually turned out to help us solve problems. For instance, the use of indicator random variables made it really easy to derive the mean and variance of binomial random variables. See also the library examples, etc.

Needless to say, those students did well in light of the fact that they hadn't prepared, hadn't reviewed the material, etc. But the point is that what I'll be measuring in the real oral quizzes will be the ability of students to see the "big picture," exemplified above.

Wednesday, February 3, 2:35 pm

See this interesting tweet and discussion.

Tuesday, February 2, 1:15 pm

As we are now around the midpoint of the quarter, a reminder re course grading procedures would be helpful.

• Grading policies are explained in detail in pp.17-19 of our class syllabus.
• I give an example of an instance last year in which a student had a quiz record of

    F A+ F C+ B+ D B

yet his course grade was A! Not even A-, but actually A. This is due to various "special offers," e.g.
throwing out the lowest two quizzes, giving a major bonus for submitting a good term project, etc.
• On the other hand, note carefully the section titled "Factors Reducing Your Grade below the Formula."
• Note too the material on p.15 regarding grading of the homework, especially the fact that (a) each student within a group may get a different grade on an assignment, even though the assignment is submitted as a group, and (b) in addition to asking questions about the assignment, the TA will also "ask questions about the general course material."

Tuesday, February 2, 10:25 am

Yesterday several students asked about Sec. 3.9.4, p.64. I'll go through the details here. As is often the case, the mailing tubes play a key role.

The derivation simply consists of repeated application of (3.41) and (3.25). Note that simple algebra implies that an alternative form of (3.41) is

$E(U^2) = \mathrm{Var}(U) + (EU)^2$

A natural first step to try would be to apply (3.41), treating $IS$ as "U":

$\mathrm{Var}(IS) = E[(IS)^2] - [E(IS)]^2$

Now consider the first term on the right-hand side. By (3.25),

$E[(IS)^2] = E[I^2 S^2] = E(I^2) \, E(S^2)$

Let's look at that latter factor first. By the above-mentioned alternative form of (3.41), we have

$E(S^2) = \mathrm{Var}(S) + (ES)^2 = 5 + 10^2$

One can evaluate $E(I^2)$ in the same manner. Or, one can note that $I^2 = I$ and then use (3.78). Make sure you understand how both approaches work.

One then evaluates $E(IS)$ as $EI \cdot ES$, again by (3.25).

In other words, there is nothing deep at all in this example. The problem simply requires a dogged persistence, applying mailing tubes until one succeeds in coming up with a number. Remember, persistence matters!

Monday, February 1, 7:30 pm

Reminder: When you send me e-mail, please put "[ECS 132]" in the Subject line.

Monday, February 1, 1:00 pm

I fixed a couple of typos in Problem 3 of Homework 2.

Thursday, January 28, 11:00 pm

Eqn. (3.76) is missing a factor $a_i a_j$ in the sum of covariances.

Thursday, January 28, 9:00 pm

Hwk 2 is now up on our Web page.
It's somewhat longer, and in my opinion somewhat more difficult, than Hwk 1. So START EARLY! Teams who wait until, say, a week before the due date WILL NOT FINISH.

Thursday, January 28, 7:10 pm

Shubham held a special quiz this evening for a student whose Quiz 0 was somehow lost. This was not meant to be general.

Wednesday, January 27, 4:20 pm

• Some students in Davis are still without power. I'm cancelling tomorrow's quiz for Sec. A01; they will do a double quiz next week.
• This Friday, Sec. A02 will have Quizzes 1 and 2. They will be combined into one 50-minute quiz.

Tuesday, January 26, 12:10 pm

Hwk 1 due date now 1/29.

Tuesday, January 26, 9:35 am

I've gotten a number of questions on how to do Problem 2 in the homework. Here is what I wrote to one student:

Because of the recursive nature, you don't write any explicit probability computation into your code. The recursion generates the probabilities on its own, using (2.2) and (2.7). You don't do probability computation yourself, since the problem statement GIVES you the recursive equation.

In other words, you're making the problem way too hard. Almost all of the function is already written for you. All you have to do is identify the termination conditions. And even one of those is given to you, $p_{1,1} = 1$.

Rethink the problem, and if you have more questions, feel free to ask.

Monday, January 27, 10:25 am

Here is an important point about Quiz Problem 4, V.1, Sec A01, which asks for the probability of the bus having 1 passenger as it leaves Stop 6, given that it has 1 passenger as it leaves Stop 5.

In our bus example, we are assuming the bus is initially empty. But we could assume it initially has 1 passenger. Then the above probability is the same as the probability of the bus having 1 passenger as it leaves Stop 1.

The reason for this is that even if the bus is initially empty, what happens at Stop 6 depends only on the state of the bus as it leaves Stop 5. E.g., knowing that, say, $L_3 = 4$ becomes irrelevant.
Monday, January 25, 9:05 pm

Solutions for Sec. A01 Quiz are now here.

Monday, January 25, 7:30 pm

The expression at the end of Problem 4 is correct but needs some explaining. I've added an explanation.

Monday, January 25, 9:50 am

In The Wizard of Oz, Dorothy says to her dog Toto, "We're not in Kansas anymore!" Well, for those of you who might have some prior background in probability, "We're not in AP Stat anymore!" You can't rely on the intuition you used before in "Kansas." Intuition is definitely important, but to solve any nontrivial probability problem, you need more:

• Careful distinction between P(A and B) and P(A | B). Think in "notebook" terms.
• Writing things out fully, step-by-step, as for instance in (2.27) through (2.32).
• Careful use of the "mailing tubes."

Sunday, January 24, 11:10 pm

Someone asked how to split a .tex file into multiple files. Say Jack and Jill write Jack.tex and Jill.tex, with the aim of combining them into one PDF. To do this, write a file main.tex, with this form:

    % your usual preamble, e.g.
    \documentclass[11pt]{article}
    \setlength{\oddsidemargin}{0.0in}
    \setlength{\evensidemargin}{0.0in}
    % etc.
    \begin{document}
    \input{Jack}
    \input{Jill}
    \end{document}

Sunday, January 24, 11:00 pm

News items:

• The course continues to build on itself. So, quiz coverage is cumulative. Questions can be on anything covered in the course so far.
• Please indent your code on quizzes, e.g. in loops, if-else.
• Remember, Section A02 has 2 quizzes this Friday.
• Solutions will be placed in the OldExams/ directory on our class Web site.

Sunday, January 24, 8:30 pm

Please keep in mind the blog post of Thursday, January 14, 7:45 pm.

DURING GRADING, THE AUTO SCRIPT WILL RUN YOUR CODE. If your R code blows up (execution/runtime error), you will get at most 1/2 credit for the problem. Be SURE to run your code, using the Submit and R button in OMSI; that is a major advantage of OMSI, the ability to actually run your code.
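To make concrete what "blows up" means in the January 24, 8:30 pm post, here is a minimal sketch of how one might detect a parse or runtime error in R. This is an illustration only, NOT the actual grading script, and the helper name runsCleanly is hypothetical.

```r
# Hypothetical helper: returns TRUE if the R code in 'codeText' parses
# and executes without raising an error, FALSE otherwise.
runsCleanly <- function(codeText) {
   tryCatch({
      # Evaluate in a fresh environment so nothing leaks into the workspace.
      eval(parse(text = codeText), envir = new.env())
      TRUE
   }, error = function(e) FALSE)
}

ok1 <- runsCleanly("(1/2) * (4/5) + 1/3")     # a valid R expression
ok2 <- runsCleanly("mean(c(1, 2, 'a') + 1)")  # runtime error: non-numeric argument
ok3 <- runsCleanly("this is not code")        # parse error
print(c(ok1, ok2, ok3))
```

Both kinds of failure (code that will not parse, and code that parses but fails when run) are caught by the same tryCatch() call, which is why actually running your submission before the deadline is the simplest safeguard.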
Saturday, January 23, 12:30 pm

ECS 132 news items:

• Please remember to include 'ECS 132' in the Subject line when you send us e-mail.
• Recall that I will be giving each of your oral quizzes, individually. I had originally planned this for Weeks 5 and 6, but I am changing it to Weeks 6 and 7.

Friday, January 22, 6:30 pm

Somehow the quiz for Sec. 2 is not up. We will do two quizzes for Sec. 2 next week.

Friday, January 22, 2:25 pm

As noted before, even if a homework problem asks you to find a probability, expected value etc. analytically (i.e. math), you should still write simulation code for your own benefit, to check your math.

In Problem 4, I have the opposite advice: Use an analytical solution to check your simulation code. I've added a couple of examples to the problem writeup.

Friday, January 22, 2:20 pm

Homework submission information on CSIF, per Arnav. Use the handin app:

    handin acharyya hwk1

Friday, January 22, 9:40 am

In order to give you exposure to more examples involving expected value and variance (Problem 5), I'm moving the due date of Hwk 1 back to 1/27.

Wednesday, January 20, 11:20 pm

Keep in mind, the quiz policy is just like that of the homework. If a problem asks for a probability, expected value etc., it means the exact answer is required; do not do simulation unless it is specifically requested.

Please MAKE SURE you are mindful of the blog post of Thursday, January 14, 7:45 pm, especially the point about NOT simplifying math answers.

Wednesday, January 20, 7:10 pm

In a discussion of Problem 1a in the OH today, a student said his solution steps led him to finding

$P(X_1 + X_2 \ge 4 \mid X_1 \le 3 \text{ and } Y_1 \le 3)$

where $X_i$ and $Y_i$ are Jill's and Jack's ith rolls, respectively. (Note that for instance $X_2$ is undefined if Jill wins on the first turn.) He was unsure where to go next.

Well, first, we have that

$P(X_1 + X_2 \ge 4 \mid X_1 \le 3 \text{ and } Y_1 \le 3) = P(X_1 + X_2 \ge 4 \mid X_1 \le 3)$

Now, what to do with that?
Rewrite it as

$P(X_1 + X_2 \ge 4 \mid X_1 = 1 \text{ or } X_1 = 2 \text{ or } X_1 = 3)$

Then use P(B | A) = P(A and B) / P(A) etc.

The message is, Don't give up! Persistence is key! Your solution may turn out to be rather lengthy in some cases.

Wednesday, January 20, 7:10 pm

I'm still seeing a lot of students try to intuit their way into solutions to the probability problems. Yes, intuition is highly important, but it will often lead you astray if you don't couple it with detailed, step-by-step use of the mailing tubes.

Wednesday, January 20, 7:00 pm

Someone asked whether numerical values will be announced for the math problems in the homework, so you can check your work. As I explained a couple of times in the blog and in class, what I prefer is that you write simulation code for the purpose of such checks (even if the problem doesn't ask for simulation code).

It may be that you find that your math and simulation answers don't match. Carefully go through each, and if you can't find the reason for the discrepancy, contact a TA or me.

Note that simulation gives only approximate answers, so the math and simulation answers will not match perfectly. Try a much larger value of nreps if you are concerned that the difference may be due to a logical error. If the difference is only sampling variation, the discrepancy should diminish.

Tuesday, January 19, 9:40 am

Homework news:

• I clarified the submission instructions at the top of the Hwk 1 file. Please note carefully.
• Problem 5: Note that I added that d = 3.
• Problem 3a: A student asks whether "we should take into account the people getting off the bus as well." The problem asks for three probabilities; for concreteness, let's consider P(2 fail to board). Once again, the road to clarity comes from the notebook idea. There would be a column in the notebook labeled "2 passengers fail to board." This column would be filled with Yes's and No's. Neither the column's label nor its contents say anything about whether any passengers alight.
Of course, in computing the requested probabilities, you will probably need to break events down by the number of alighting passengers, but the P(2 fail to board) probability itself does not mention such passengers.
• Remember, you can always check your mathematical solution by writing simulation code. I highly recommend this.

Monday, January 18, 9:00 pm

News items:

• By this point, you should have already been interacting with your group members. If you are not yet doing so, you need to remedy this right away. As explained in the course syllabus and the ethics group meetings, it is imperative that you work closely with your group. Please, MAKE THIS WORK. I have found repeatedly that students who are in relatively inactive groups tend to do poorly in the class. If a group finds that one of its members is devoting little or no time to the group's work, the group needs to let me know right away.
• Remember, precise thinking is key to success in the course. So for example, make sure to NOT write something like

  P(Jill wins on 1st try) or P(Jill doesn't win on 1st try and Jack wins on 1st try)

  This is nonsense. You are OR-ing together two numbers, when OR should only combine booleans. What you should write is

  P(Jill wins on 1st try or Jill doesn't win on 1st try and Jack wins on 1st try) = P(Jill wins on 1st try) + P(Jill doesn't win on 1st try and Jack wins on 1st try)

  Now you are +ing together two numbers, the appropriate role for +. This point is made in the book (first bullet, p.17).
• Remember, only some of the pages of the book are covered in class. (These are the ones I believe involve the most difficult concepts, thus most important for devoting class time.) But you are responsible for the entire book, except sections that I specifically say to skip. This coming week we will have Quiz 1, which will cover all of Chapter 2.
• Make sure you have read the blog post of January 15, 8:40 pm, regarding lost records of your work.
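The addition rule in the second bullet of the January 18 post can be checked by simulation, in the spirit of the "check your math by simulation" advice. The sketch below does NOT use the actual rules of the homework game; it assumes, purely for illustration, that each player rolls one fair die per try and wins on a 6. All variable names are made up.

```r
set.seed(132)      # arbitrary seed, for reproducibility
nreps <- 100000

# Hypothetical rule: a player wins a try by rolling a 6.
jillRoll <- sample(1:6, nreps, replace = TRUE)
jackRoll <- sample(1:6, nreps, replace = TRUE)

jillWins1 <- jillRoll == 6                  # Jill wins on 1st try
jackWins1 <- !jillWins1 & (jackRoll == 6)   # Jill doesn't win, then Jack wins

# OR combines booleans (events), one notebook line at a time...
simProb <- mean(jillWins1 | jackWins1)

# ...while + combines numbers (probabilities of the two disjoint events).
exactProb <- 1/6 + (5/6) * (1/6)

print(c(simProb, exactProb))
```

Note where OR and + each appear: `|` operates on the logical vectors representing events, while `+` operates on probabilities. The two printed values should be close but, as with any simulation, not identical.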
Friday, January 15, 8:40 pm

Reminder: As explained in our course syllabus, all my files recording your grades on quizzes, homework and the term project are indexed by your official UCD e-mail address (which is the address I've been using in my e-mail messages to you). If you use some other e-mail address, in effect we have no record of your work! This is not good. :-(

Keep in mind that this is an issue not just for quizzes, but also for homework and term projects. Note that in the latter two cases, you work in groups. For instance, your group makes just one submission for the group, not one for each group member. That means that the person who submits the homework or term project must make sure that the e-mail address for each group member is correct.

Friday, January 15, 3:55 pm

There has been a change in office hours. Arnav will take over for Shubham. See our "course at a glance" page for details.

Shubham is still one of the TAs, so you can still e-mail him etc.

Friday, January 15, 2:05 pm

Make sure you understand R's *apply() functions. None is absolutely necessary, but they can simplify your work quite a lot.

Reminder: All blog posts are considered part of your official course materials. Be ready to use the *apply() functions on quizzes.

Thursday, January 14, 7:45 pm

Some tips to remember for Quiz 1 next week and all subsequent quizzes:

• Each quiz problem will be labeled either as "R code" or "text."
• If a problem is labeled "R code," it really means that. Your submission will be run through the R interpreter. If you have any non-code in it, the interpreter will generate an error message.
• Comment lines, beginning with '#', do count as R code. If you need to say something to the grader, place it in a comment.
• In problems labeled "R code," DO NOT SIMPLIFY YOUR ANSWERS. E.g. do not simplify

    (1/2) * (4/5) + 1/3

to 11/15. By submitting your unsimplified answer, you will be showing your work.
Remember, your answer will be evaluated by R, so there is no need for simplification. And leaving things unsimplified is important in case you are partly wrong but deserve partial credit. And even if fully correct, SIMPLIFIED ANSWERS MAY NOT GET CREDIT.

Thursday, January 14, 12:30 pm

Someone asked whether it's OK to use '=' instead of '<-' as your assignment operator. The answer is "usually." There are situations, e.g. this one, where it may give you the wrong result or even generate an error message.

On the principle of "better safe than sorry," I do urge you to use '<-' for assignment. And I urge you to put a shortcut into your text editor's startup file. For instance, I have this line in my vim startup file:

    map! eqq <-

Then every time I want the "left arrow" assignment operator, I simply type 'eqq'. No hunting on the keyboard for '<', '-' or even '='.

Wednesday, January 13, 2:50 pm

Shubham will hold a special OH today re OMSI, 4 pm.

Tuesday, January 12, 11:00 pm

Misc.:

Tuesday, January 12, 9:55 pm

Concerning office hours (TAs and myself):

• OHs are for the purpose of clarifying course concepts, helping with code issues and helping with homework problems.
• If you need hints for a homework problem, we will provide them, so that you can think about them at home and continue work on the problem. (Note: "You" here is plural, meaning you and your homework group; you are all working together.)
• Needless to say, we will not provide you with a complete solution, but hopefully enough hints to get you started. If after further work at home you (plural) later need more hints, we will provide them. You can attend another OH or send e-mail.
• When more than one person is present in an OH, we will have you take turns asking whatever questions you have. When others are asking their own questions, you are very welcome to listen.
• Please do not try to complete your homework right there in the OH, hastily writing something up and then asking us, "Is this right?"
Again, careful contemplation at home is the goal.
• Note that simulation can usually serve as a check on your analytical (i.e. math) solution, even if the problem does not ask for simulation.
• Every homework problem, every quiz problem, is different. This is not a matter of learning set patterns, which you may have done in calculus. Intuition is key, and that only comes from long, careful thought.
• Again, remember you are working in a GROUP. When you are in an OH, you are representing your group, and will bring what you learn in the OH back to the group. Please use the word "we" rather than "I."

Tuesday, January 12, 4:50 pm

CRUCIAL instructions for taking quizzes. Follow these TO THE LETTER!

1. Make sure you have thoroughly tested OMSI, playing the roles of both student and instructor.
2. In your above tests, do at least one in which the server is running on CSIF, which will be the case during quizzes. Note that that means you must be running the UCD VPN.
3. The server host and port for your quiz will be shared with you in a separate e-mail message, the day before the quiz. DO NOT SHARE THESE WITH OTHER STUDENTS; doing so will result in a report to the SJA.
4. That e-mail message will also state the start and end times of the quiz, usually a duration of 25 minutes. Of course, once the server shuts down, you cannot submit any further work. In fact, even if you submit just before the server goes down, it may not be processed by the server in time.
5. The quiz will be hosted during the discussion sections. You must take the quizzes in your enrolled discussion section.
6. As your discussion section is about to start, enter the Zoom session for that section, via Canvas. The TA will start the server at the appointed time, and you may begin the quiz.
7. Questions from students to the TA via Chat are allowable for clarification of the quiz questions' wording.
Questions about the course content, the location of items in the textbook and other course materials will be answered with a polite statement, "Sorry, I can't answer that kind of question."
8. If you have a question during the quiz, address it ONLY to the TA, not to Everyone (the default). We may be able to disable the latter.
9. Quizzes are open book/notes. For convenience, you may wish to use OMSI's PDF feature, as it allows searches.
10. You are NOT allowed to otherwise access the Internet during quizzes.
11. You are NOT allowed to communicate with anyone, in the class or not, during a quiz.
12. Be ready for the quiz ON TIME. Clearly, with the quizzes being only 25 minutes in duration, you cannot afford to be late.

Tuesday, January 12, 8:40 am

ECS 132 news items:

• If your student ID number is on the list in the blog post of Sunday, January 10, 11:30 am, you should have received e-mail from me last night with the Zoom link for tomorrow's ethics meeting. You are required to attend. It will be held at the end of class but on a DIFFERENT Zoom link.
• I made a correction to a link in the post of 1/11, 19:15.
• Hwk 1 due date is January 25.
• I will be posting the next batch of problems for Hwk 1 today.

Monday, January 11, 7:15 pm

This post will be about LaTeX.

LaTeX is a standard typesetting language in tech and the sciences. It was invented by a computer scientist, the Turing Award winner Leslie Lamport. It consists of a number of macros for the more basic typesetting language, TeX, invented by another Turing Award winner, Stanford CS professor Donald Knuth. It is used by most CS professors in their research. If you do research with a professor, you probably will use it. It is also the basis for math typesetting in the Wikipedia.

What tools might you use to write LaTeX?

• I myself use my regular text editor, Vim, with my own homegrown macros. For instance, say I want to typeset e raised to the power -x_1.
In LaTeX, that is

e^{-x_{1}}

But I merely type

emsp-xmsb1

Here 'msp' is the name of my macro ("Math SuPerscript"), and it is automatically expanded as I type. I have 'msb' for subscripts etc. This is very, VERY fast to type once you get used to it.

I use Evince as my PDF viewer on my Linux desktop machines (home, office), and Skim on my Mac laptop. Both have the advantage of automatically updating the PDF file when one compiles the .tex file (which by the way I have triggered by a Makefile). Not quite as good as the continuous updating of LyX, but on the whole the best solution for me.
• A popular tool for writing LaTeX is Overleaf. Its chief advantage is for collaborative work. For us CS people, who are users of GitHub, Overleaf may not be advantageous.
• Another popular LaTeX tool is LyX. It is fully point-and-click, which some of you may find appealing, and the PDF window is updated as you type.

How can you learn how to do a certain LaTeX trick, e.g. having a column of equations with the = signs lined up?

• If you've seen it in my book, just go to the site containing the PDF of the book. All the .tex files that are sources for the book are right there. Just use the Google 'site:' option to search for a particular string, say "bus ridership," and then see the raw LaTeX that produced the effect you saw in the book.
• And of course, Google is a great place to look up LaTeX information.

Monday, January 11, 11:55 am

I corrected the time for the last ethics meeting (earlier blog post today). It is 11:45.

Monday, January 11, 10:10 am

I've mentioned that as CS students, you should be good at using Unix (i.e. Unix-family OSs, notably Mac or Linux). This is NOT required for our course, but it's something that you should do simply as a tech expert. Note again that Intel once complained that UCD CS grads don't know Unix well.

The only way to get good at Unix is to use it in your daily life.
Just because you, say, did well on Unix exam questions in ECS 36B doesn't mean you know Unix, not at all. The only way to know Unix on a practical level is to USE it, learning along the way as issues arise to be solved.

If you have a Mac, fine (but use the command line a lot). Otherwise, run Linux. And that means using REAL Linux. Any emulator, virtual machine etc. will fall short in one way or another.

That includes the Windows Subsystem for Linux. A student tried to run OMSI on WSL, and got an error message, "no $DISPLAY environment variable." That was apparently due to this problem with WSL, a great example of why WSL is not the "real" Linux, even though Linux has been installed.

But the incident shows more than just the failings of WSL. The important thing is that, as a CS student, you should know what this error message means. In fact, it was discussed briefly in my blog post of January 4, 7:15 pm, so you can see it is a common error, thus something any CS student should know about. This is an example of what I meant above in my "learning along the way" remark; this kind of daily life stuff will NOT be on an ECS 36B exam.

Using Unix on a daily basis will improve your productivity. The same goes for using a debugging tool. If you learned, say, gdb in ECS 36B but are still using print statements to debug your code, you are doing yourself a disservice. The purpose of including a debugging tool in the ECS 36B topic list is to prepare you to do effective debugging in every course from then on, NOT just to have one more topic for your final exam.

If you have a Windows machine, I recommend that you set it up to dual-boot Windows and Linux.

Monday, January 11, 10:00 am

The last ethics meeting will occur at the end of class this Wednesday, at 11:45 am. If you are in the list in the blog post below (1/10, 11:30 am), you must participate at that time. A Zoom invitation will be e-mailed to you later today.

Sunday, January 10, 11:30 am

Here is a list of students, by last 4 digits of student ID, who we believe have not yet participated in the ethics meeting:

7264
6580
7488
8351
3884
8815
1526
4619
7484
7793
9171
8729
1502
0394
5729
7446


If you did participate, please let me know as soon as possible.

Friday, January 8, 2:20 pm

I fixed a typo in the homework: d> should be just d.

On p.12 in the book, there is reference to node A and node B. This is confusing, as readers may think this is related to events A and B in, e.g., Equation (2.6). So, instead, say that we have two terminals, at which John and Mary are typing. Call the nodes Node John and Node Mary.

So, between (2.15) and (2.16), "A" means Node John and "B" means Node Mary.

Thursday, January 7, 10:55 pm

The first two problems for Hwk 1 are now on our class Web site. There may be some small changes made in the next day or so, but basically the problems are ready and you should get started now.

Thursday, January 7, 9:30 pm

Important news items:

• Arnav will be forming homework groups early next week. If you would prefer to work with someone in particular, please let him know at aacharyya@ucdavis.edu by TUESDAY, JANUARY 12. Otherwise, he will assign you to a group.
• I hope to have the first two homework problems on the Web sometime today. As always, it will be announced here on the blog. Remember, I will be posting the homework problems a few at a time. We will have two assignments in the quarter, with the first due at the end of Week 3, I believe.
• The ethics meeting for Section A01 went well this morning, I think. I look forward to meeting the Section A02 students tomorrow. Please have your microphone and camera ON, so we can interact. (You will need to have them on in other contexts in the course as well, notably the interactive homework grading and the oral quiz.) Also, please do not log on with a pseudonym.

Wednesday, January 6, 9:30 pm

Several points:

• Again, please make sure to arrive at the Zoom session for your ethics meeting ON TIME. Breakout rooms will be formed for whoever is present at the time. If you miss your assigned meeting, don't worry; you'll have a chance to schedule one.
• At some points while you're waiting in your breakout room, a TA will come by to answer any questions you have on the course, the lectures, R, the textbook, etc.
• After your meeting with me, you are done with the meeting and may leave.
• Keep your camera and microphone ON. Remember, you and I will be interacting.
• You will need your camera and microphone ON during the Oral Quiz (Week 5 for Sec. A01, Week 6 for Sec. A02). BTW, it will count as two quizzes (with the same grade). If your grade is below B-, you may take the quiz again; the grade you get there will replace the old grade if it is higher, and otherwise will be discarded.

Wednesday, January 6, 3:25 pm

Ethics meetings: Those who added the course late, or who miss their assigned meeting, will be handled separately.

Wednesday, January 6, 12:15 pm

Reminder: We have our "ethics meetings" this week, during your assigned discussion sections.

You were e-mailed Zoom invitations. (In class today I mistakenly said it was via Canvas, which is not true. For this particular meeting, it is on my personal Zoom room.)

It is crucial that you JOIN THE ZOOM SESSION ON TIME. I will create breakout sessions and meet with you one group at a time, and the way I have it set up, Zoom will form breakout room groups AMONG THOSE PRESENT AT THE TIME. (Breakout rooms can be pre-assigned, but I didn't do it this way.)

After your group's meeting is done, you are free to leave.

Tuesday, January 5, 9:55 pm

Office hours for the TAs and me have now been posted, as recurrent meetings on Zoom via Canvas.

• NM M,W 4:30-5:30 pm
• AA Tu 10:30 am-12 pm, Tu 6-7:30 pm
• SP M 9-10:00 am, W 1 pm to 2 pm

Tuesday, January 5, 1:55 pm

If you were enrolled in the course as of early afternoon today, you have been e-mailed an invitation to the ethics meeting for your discussion section THIS WEEK. Please read the instructions immediately.

As has been mentioned, this meeting is REQUIRED. You will not be able to take any quizzes (70% of the course grade) if you have not attended. Since this is your regular discussion section, you should not have a time conflict, but if you do, please contact me immediately.

Those on the class Waiting List will be handled separately, if/when you are admitted to the course.

Monday, January 4, 8:45 pm

Sorry for all the blog posts today. We of course will have more now at the start of the quarter, but once things settle down there will usually be between 0 and 2 per day.

The purpose of the current post is to announce that the course syllabus is ready.

Yes, it is absurdly lengthy, but it is your user manual for the course. It gives you full information on homework, quizzes, the term project, group work and grading.

Note these points in particular:

• Homework grading is interactive. You attend the grading session with your group, but you are graded individually. You might, say, get an A grade while one of your teammates gets a D. The TA will ask you questions like, "What if the problem statement had instead been such and such?", and also ask you questions on the general course material so far.
• Course grades are flexible. There is a formula based on your quiz and homework grades, but that is just a lower bound; your grade can actually be higher than that. See the worked-out example.
• Note in particular the importance of the term project. In my view, the term project is the course. I've mentioned several times that my goal is that at the end of the quarter, you feel you've learned something that will stay with you after graduation and beyond; I hope the term project makes a big contribution to that goal.

The importance of following directions correctly cannot be overemphasized. A sad example occurred one time concerning the term project, which is due at 11:59 pm the day of the scheduled final exam. (We don't have a final.) At 12:03 a.m. I received a message from a frustrated, very panicky student saying that he didn't know how to submit the project to CSIF. His teammate had been the one to submit the homework to CSIF, but that teammate was already on a plane home. Of course, I explained what to do and didn't impose a penalty, but this student had gone through a lot of avoidable anguish.

Anyway, that's the last of the numerous messages this evening. See you Wednesday!

Monday, January 4, 7:40 pm

More on OMSI:

As you know, our first quiz will be held in Week 2. It will be a "warmup" quiz. EVERYONE should get an A+.

And MOST everyone will. Here is the distribution from my teaching the course last year:

> z <- read.table('Quiz0Grades')
> table(z$V7)

  A  A+   B   C   D  D+   F
  1 106  20   1   9   5   1

As you can see, more than 2/3 of the class did get an A+, but about 10% got D or F grades. Those in that latter group simply didn't prepare. Some did recover and eventually get A or B grades in the course, but it certainly is a bad way to start the quarter. :-)

Remember, OMSI helps YOU get a better grade, because you can test your code and revise it if it doesn't work right. One time (pre-pandemic days), a student forgot his laptop and had to take the quiz on paper. He was very upset with himself, saying "That puts me at a disadvantage."

I've been using OMSI for 5 years now, and it works well. It's not fancy, e.g. no code syntax highlighting, but it does its job. You may wish to browse through the source code; it's complex but hopefully well organized. Good example of network and threaded programming. Suggestions for new features etc. are always welcome, but unless they are quick to implement, they may just be filed away.

OMSI is simple to use, but you do need to read through the entire documentation. Please be patient.

Monday, January 4, 7:15 pm

During quizzes, please run OMSI on your own laptop, not CSIF (stated here, which is linked to from the OMSI docs). I have several reasons for this, but now I'd like to use this as a "Unix lesson."

A student tried running OMSI on CSIF, and got an error message, "no $DISPLAY environment variable." Here's why:

When you use ssh to connect to CSIF remotely, you can only run text applications, not GUI. That error message is saying the CSIF machine has nowhere to display the OMSI window.

My point in bringing this up is to show an example of what "knowing Unix" means. I mentioned today in class that Intel recruiters once complained that UCD grads don't know Unix well. As you can see here, it's a lot more than just knowing the ls and cd commands. And the only way to gain this knowledge is to use Unix (Mac or Linux) in your daily life.

BTW, one can run GUIs with ssh if one uses the -Y command line option.
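As a rough illustration of what that error message refers to, here is a minimal R sketch that checks whether the DISPLAY environment variable is set in the current session. The helper name has_display() is made up for this example and is not part of OMSI; the sketch only assumes that an unset DISPLAY shows up as an empty string, which is what Sys.getenv() returns by default.

```r
# Hypothetical helper (not part of OMSI): TRUE if the given DISPLAY
# value is non-empty, i.e. GUI programs have somewhere to draw.
has_display <- function(env = Sys.getenv("DISPLAY")) {
  nzchar(env)
}

if (has_display()) {
  cat("DISPLAY is set to", Sys.getenv("DISPLAY"), "\n")
} else {
  cat("DISPLAY is not set; a GUI program has nowhere to open its window\n")
}
```

In a plain ssh session this would typically report that DISPLAY is not set, while a session started with the -Y option would typically report a value.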

Monday, January 4, 2:50 pm

Re Discord:

• I encourage students to help each other on homework. Teamwork is important, and camaraderie is especially vital in these gloomy days of pandemic. So, if students want to form a Discord page, I think it's great.
• However, I myself don't set up Piazza sites and the like, for a reason: Many students treat these sites as first resorts, not last resorts. They are just using the sites as shortcuts, a way to minimize the time they spend in the course. Here is an example I saw once (not in my class):

Student: Anyone know how to do Problem 1 in the homework?
Student, 5 minutes later: Oh, never mind, I see it. But what about Problem 2?
Student, 5 minutes later: Oh, I see it. But what about Problem 3?

:-)

• Also, I know of at least one case where students used Piazza as a way to say abusive things about an instructor, to her face. She could have turned these students in to SJA; she didn't, but I hope we all agree that this is unacceptable.
• Finally, it goes without saying that use of Discord etc. is absolutely forbidden during quizzes. Any communication with others during those times is illegal, and will be prosecuted. Any student found cheating will receive a grade of F in the course (grade in the COURSE, not just on that quiz or homework assignment).

Monday, January 4, 2:00 pm

Problems 1 and 2, especially Problem 1, in most quizzes will be very easy. I do this because I want to make sure every student who has been keeping up with the class gets some points. Thus, don't try to overinterpret a problem that you might feel is "too easy."

Monday, January 4, 2:00 pm

In sending e-mail to me or the TAs, please put '[ECS 132]' in the Subject line.

Monday, January 4, 12:10 pm

Sorry I didn't see the Chat messages. They weren't visible to me once I did screen sharing. I'll see about fixing this. Meanwhile, let's see if I can address them here.

The URL for the textbook was in one of the pre-quarter e-mail messages I sent. Those messages are archived here. If you haven't read those messages yet, read them NOW; they are crucial to your success in the course.

Re CSIF: You will need to use CSIF only for submitting the homework and term project. Make sure everyone in your group knows how to do this, to avoid disaster, e.g. an unsubmitted term project and a failing course grade.

Note: As mentioned over the break, as CS experts, you all should know Unix (Linux or Mac) well; Windows is for non-techies. Knowing ssh is part of that.

Nicholas asked whether the main computer in the example would hold a queue of messages in a buffer. Not in our very simple example here. And even a buffer can fill, with other messages being discarded.
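To make the "even a buffer can fill" remark concrete, here is a minimal R sketch. It is only an illustration and not the model used in the book's example; the function name offer_all() and the capacity argument are hypothetical, chosen just for this post.

```r
# Hypothetical illustration: offer each arriving message to a buffer
# that holds at most `capacity` messages. Messages that arrive while
# the buffer is full are discarded.
offer_all <- function(messages, capacity) {
  buffer <- character(0)
  dropped <- character(0)
  for (m in messages) {
    if (length(buffer) < capacity) {
      buffer <- c(buffer, m)    # room left: message is queued
    } else {
      dropped <- c(dropped, m)  # buffer full: message is lost
    }
  }
  list(buffer = buffer, dropped = dropped)
}

res <- offer_all(c("m1", "m2", "m3", "m4"), capacity = 2)
res$buffer   # "m1" "m2"
res$dropped  # "m3" "m4"
```

Nothing is ever removed from the buffer in this sketch, so it shows only the overflow behavior, not a full queueing model.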

Monday, January 4, 10:40 am

My office hours will be MW 4:30-5:30 pm (starting in Week 2) and by appointment. TA office hours will be announced soon.

Sunday, December 27, 11:20 pm

Starting January 4, all course announcements will be made here.