To: All those pre-enrolled, or considering enrolling, in ECS 256 next quarter
From: Norm Matloff, Professor of Computer Science
If you have not done so already, please read my general outline of the course. But there is more:
I received e-mail today from a grad student in Statistics, an excerpt of which follows:
...I'm thinking to enroll in your course this winter. A few of us in the stat department were wondering how much emphasis would be on coding efficiencies and how to write packages etc and how much would be on the statistical concepts...
This is a very important question. Is this a Stat course or is it a CS course? Is it particles or is it waves? :-) I'll answer in bullets:
- To give you an idea of what I do, you may glance at my recently published book on parallel computing for data science, and a recent research manuscript on a new approach to recommender systems and related models.
- These packages will all be mathematical in nature, meaning that they will implement probabilistic/statistical algorithms -- NOT procedures for "data munging," graphics, Web interfaces and so on. For those of you who know Duncan (if you know him, I need not give his surname), think of me as the "un-Duncan." :-) Duncan does wonderful work in the statistical computing field, but my focus is generally more on the math side.
- Note that the package-development assignments will be "researchy," meaning that they will be to a large degree open-ended. I will state the probabilistic/statistical procedure that the package will implement, and list some minimum requirements for the package, but I won't tell you what else to put in (up to you), and won't tell you how to do any of it. This will require considerable thought.
- At the same time, yes, I do expect that the R packages you create will be of professional quality from a CS point of view -- with reasonable considerations for speed and memory usage, clear coding and super-clear documentation.
Feel free to contact me if you have any questions.