Blog, ECS 145, Fall 2011

December 5, 2:51 p.m.

Examples of fields that are really CS and thus are NOT eligible for our project: Games; spam filters; input of audio files.

December 4, 4:51 p.m.

It should go without saying that the Python or R library/program that is the subject of your project really should be in Python or R. :-) If it actually is just a Python or R wrapper to C/C++ innards, then it is inappropriate.

I've approved several projects so far, on the implicit assumption that the scripting language is NOT just a wrapper. If the heart of your library/program is not in the scripting language, you'll need to contact me immediately.

December 3, 11:18 p.m.

Oops, that last posting was for ECS 132.

December 3, 11:04 p.m.

Please note that in Problems 1(a), 1(b) and 1(d), there is no requirement that you form confidence intervals. You may perform significance tests if you prefer, or even do both.

December 2, 3:48 p.m.

I've moved the date for pre-review back to Dec. 8.

December 2, 3:34 p.m.

I wanted to mention that I was quite impressed with both classes today on the group quizzes. The groups were working very cohesively, with thoughtful discussions (which I eavesdropped on a bit), and with almost no "lurkers." You made it through two tough classes, and came out being able to discuss things intelligently. Good job!

December 1, 1:14 p.m.

Solutions to yesterday's midterm are now on the Web.

December 1, 12:26 p.m.

For tomorrow's quiz, it would be very helpful to review our ut.R example.

November 30, 12:33 a.m.

Please note that in our midterm, you must avoid writing loops, in order to receive full credit.

November 29, 8:43 p.m.

As you know, we will NOT use laptops in our test tomorrow (though we WILL use laptops on Friday). However, I want to achieve part of the "laptop goal" tomorrow anyway, as follows.

You will NOT use laptops during the test tomorrow. However, what I would like you to do is to keep a copy of the code you write, on a separate sheet of paper. Make SURE your copy is the same as what you turn in. Then, before 8 p.m. tomorrow evening, e-mail me a copy of your code. Have it in just a single file, with file name id.R, where id is your student ID number. Make it an attachment to your e-mail message. This will semiautomate the grading: If the code works, you get full credit (unless the problem imposes conditions); if it doesn't work, then I'll read through it and assign a grade. This is good for me--as I've mentioned, I'm worried about shrinking TA funding in future years--but also good for you, in the sense that if your code is hard to understand but does work, you don't risk getting less than full credit. As an incentive, I'll add 10 points to your Midterm score if you comply with this request. Since the Midterm counts 30% of your grade, this is very significant.

November 29, 8:43 p.m.

In choosing your project topics, keep in mind that your report must bring in some material from the application domain, something not part of the general knowledge of CS majors.

For example, one student asked in class whether NumPy would be a suitable topic for the report. I said yes, as it is an application of one of our languages to math. But if you choose this topic, you must discuss some aspect of NumPy that involves math that most CS students wouldn't know about, such as the convolve() function.

November 29, 2:13 p.m.

Quiz 6 had been missing on our Web page, with Quiz 7 being displayed as Quiz 6. Fixed now.

November 28, 12:56 p.m.

See this README file (not README.html) for the ggplot2 intro I discussed today, and the location of the data set.

November 25, 9:56 p.m.

I've finished grading the quiz that I administered a couple weeks ago, and will return them on Monday. If you'd like to know your grade before then, just let me know.

November 25, 9:02 p.m.

All this quarter's quizzes and solutions are now on our Web site.

November 24, 1:19 p.m.

I received several negative responses regarding the laptop idea for Wednesday's exam. Most indicated that a significant number of students don't have a laptop. One message said that having a laptop would make the exam more stressful (the opposite of what I had intended).

So, it is settled. Wednesday's exam will be in traditional format, on paper.

However, please note that there is NO CHANGE for Friday's group quiz, which WILL use laptops. Since that has been the plan all quarter, and no one has objected, I have been presuming that every group has at least one laptop among the members. If that is not the case for a group, you need to let me know now (should have done so long ago, of course), in which case I will do a bit of group switching.

November 24, 10:13 a.m.

If you'd like to try executing the Chinese dialect code, I've placed a file fangyan.R in our Handouts/R2 directory. Load that code into R, and then run makecanmandfs() , no argument. It will create the data frames can8 and man8.

Note that the original data file already combines Cantonese and Mandarin, but I wrote code to separate them because in general usage they would be separate.

By the way (you need not read this part), I mentioned in class the error in the character for "up," which our handout has as tone 3 instead of the proper 4. Turns out that in the original data file (which I am not the author of), it listed tone 3 as an alternative reading, for some situations.

November 23, 7:06 p.m.

I have a question for all of you concerning next Wednesday's exam.

In contrast to the quizzes, in which I've been having you fill in blanks in my code, in the exam you will write full functions. My question to you then is, would you like to do this on a laptop?

If someone does not have access to a laptop or cannot borrow one, we of course cannot do things this way. But if that is not an issue, there could be real advantages. You could try out your code, check the online docs regarding arguments in built-in Python/R functions, and so on. For me, the advantage would be that you will e-mail me your code at the end of the exam, setting up partially automatic grading.

Note that this is different from our group quiz on Friday of that week, since that only requires that each group has a laptop.

Note of course that the UCD Honor Code would forbid e-mail communication during the quiz and (if we choose to do things this way) in the exam.

November 22, 11:49 p.m.

I had the wrong link to the exemplar project, fixed now.

November 22, 3:41 p.m.

Project specs are ready, here.

November 21, 6:21 p.m.

My message in our CS faculty discussion yesterday is posted here.

November 17, 10:55 a.m.

I've placed an example usage of Part B here. The input file was the Davis map I gave earlier. My route started at La Rue and Russell, went to Russell and A, then down A to First.

As I noted before, to get decent resolution you need to make the R graphics window large. But that does require a machine with sufficient memory. In my example above, I chose a moderate-sized window (and got only moderate resolution), and yet it still was straining the resources of the machine I ran it on (an older PC). Note too that much will depend on whether you draw directly on the R graphics window, versus manipulating the pixmap object.

November 15, 5:17 p.m.

I've added more choice for you in your coding for Part B, in terms of how distance is defined.

Also, I've made explicit the motivation for the dls command. Of course, this is no change in the specs for the code.

November 14, 5:17 p.m.

Concerning Problem B:

November 12, 9:17 p.m.

In my November 9 posting, I gave a review of the term set, from ECS 20 or high school. Today I received another question on sets, so I'm beginning to wonder if this material is actually on ECS 20 or elsewhere. I'd really like some feedback: Has anyone not seen the basic notions of set, subset, union and intersection before? Please let me know. I notice, for instance, that Set Theory is the very first topic covered by Prof. Bai in ECS 20.

At any rate, the question today concerned the term union. I had said that if numsets is list(10:15,14:18,c(8,88,888)), then the union of all the numbers in all the vectors in numsets is {8,10,11,...,18,88,888}. Note that there are no duplicates--14 should NOT appear twice in the union, for instance. This is just the normal, ordinary, standard math meaning of union.
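For those who think better in code, here is a minimal sketch of the same union. It is in Python 3 rather than R, purely for illustration, and the variable name numsets simply mirrors the example above. A Python set discards duplicates automatically, so 14 and 15 each appear only once.

```python
# numsets mirrors the R expression list(10:15, 14:18, c(8,88,888)) from the post
numsets = [list(range(10, 16)), list(range(14, 19)), [8, 88, 888]]

# set().union(...) merges all the vectors and drops duplicates
union = sorted(set().union(*numsets))
print(union)
```

Sorting is only for display; a mathematical set has no inherent order.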

Please let me know if you need any elaboration on this at all; I'll be happy to walk you through it.

November 12, 12:37 a.m.

I hope everyone had a Happy 63 Day. :-)

Part B of the homework is now on the Web.

November 9, 9:52 p.m.

Many thanks to Jack for pointing out that I put the wrong graph up for the example in Problem A. Fixed now.

November 9, 9:03 p.m.

I've added an alternative version of the code discussed today, in the file minpair.R.

November 9, 8:27 p.m.

Sorry to bring up a somewhat sensitive subject: In grading the homework, Balaji has sought my advice in a few cases, asking if certain grades he planned to give seemed fair to me. I've responded, and in some instances have suggested alternatives. I think Balaji is being reasonable and equitable. If however you do feel you deserve a higher grade, please see me about it, rather than blaming him; I'll be happy to discuss it with you.

November 9, 4:27 p.m.

Hwk 4 news:

November 8, 4:21 p.m.

I've placed an example run and plot for Part A on the Web.

November 5, 4:21 p.m.

Part A of Homework 4 is now on the Web. It's rather easy. The material in Sec. 12.3.1 of The Art of R Programming in our official lecture notes packet contains all that you need to know about R classes, for this assignment.

November 4, 10:48 p.m.

As discussed in class today, there are two versions of Chapter 2 in the official lecture notes packet for the course. The one of importance to us is the first one, which has pagination 6-26, and which ends with a comment on the language REBOL. (BTW, the son of the inventor of REBOL is a CS student at UCD!)

In terms of Tests, you are responsible for that first Chapter 2, and for all the handouts, such as the one given out in class today. You will not be tested on the remainder of the above packet. However, you should find that the entire packet is useful in your homework, and you can draw upon it in Tests if you wish. Be sure to use the Table of Contents--and maybe your Homework 2. :-)

November 4, 6:33 p.m.

Remember the slogan, "When in doubt, try it out." R's interactive mode allows you to do little experiments to see how functions and constructs work. Try that on our handout code. Stepping through the code in a debugger is a good way to learn too.

November 4, 6:30 p.m.

I've added more comments to the various handout .R files.

November 4, 6:16 p.m.

No discussion section/quiz this coming Monday, November 7.

November 4, 2:16 p.m.

Our Exam ("midterm") will be on Wednesday, November 30.

November 3, 11:45 p.m.

As mentioned in class, much of our coverage of R will be in handouts distributed in class. If you miss class, the handouts will be here. The first batch is in the R1 subdirectory. Also, if you want to try running some of the functions, I've put them there too (though did you know you can cut-and-paste text from a PDF file using xpdf?).

November 1, 1:01 p.m.

In this post I will give a concrete example of how MapReduce actually works in practice. This may clarify for you how to simulate a MapReduce cloud in your homework.

As mentioned in the specs, the standard introductory example (the "HelloWorld.c") is word count. Text files are read in as input, and the output is a frequency count for each word. A nice concrete example in Python is the one by Michael Noll.

For this example, suppose our input consists of just one file, whose contents are

abc
de
f 
88 de
de f

The final output will be

(abc,1)
(de,3)
(f,2)
(88,1)

which tells us that the word 'abc' appeared once in the input, 'de' appeared three times, and so on. This output will be written to a user-specified file.
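The counting in this small example can be sketched in a single Python 3 process. This is only an illustration under simplifying assumptions: the function names mapper(), reducer() and word_count() are hypothetical, it is not Noll's code, and a real Hadoop job would distribute the map and reduce steps across nodes rather than run them in one loop.

```python
from collections import defaultdict

def mapper(line):
    # emit one (word, 1) pair per whitespace-separated word
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    # sum all the 1s that were emitted for a given word
    return (word, sum(counts))

def word_count(lines):
    # "shuffle" step: group the mapper output by key
    groups = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    return [reducer(word, counts) for word, counts in groups.items()]

# the five lines of the input file in the example above
lines = ["abc", "de", "f ", "88 de", "de f"]
result = word_count(lines)
for pair in result:
    print(pair)
```

The grouping by key in the middle is the part that the MapReduce runtime normally does for you, between the map and reduce phases.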

In the Noll tutorial, he had mapper and reducer programs named mapper.py and reducer.py. To run the job, he types

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-*streaming*.jar -file /home/hduser/mapper.py
-mapper /home/hduser/mapper.py -file /home/hduser/reducer.py -reducer
/home/hduser/reducer.py -input /user/hduser/gutenberg/* -output
/user/hduser/gutenberg-output

The Hadoop runtime executable here is a program /bin/hadoop. The mapper and reducer files are specified, as well as the locations of the input and output files, etc. The program /bin/hadoop handles all the MapReduce ops, i.e. feeding the files into the mapper program, writing the output of the reducer program to a file (or possibly several files). Note by the way that Hadoop has its own distributed file system, HDFS, so regular files in your OS need to be imported to and exported from HDFS.

(By the way, you can run this on CSIF if you are curious; go to the CSIF FAQ page for instructions.)

NOTE CAREFULLY: All this constitutes just ONE JOB. The user here ran /bin/hadoop just ONCE, for this ONE job. Various (key,value) pairs will be generated during the execution of this job, but they are all part of this ONE job. An individual (key,value) pair is NOT a job. So in your homework, one job again means one execution of /bin/hadoop.

So, here is what happens:

October 31, 8:08 p.m.

Starting with the next assignment--meaning all programs using R--you will no longer be bound by the rule that you cannot use material from future chapters. My lecture method will consist of presenting examples of R code, distributed as handouts, with the actual book playing a supporting role. You will be allowed to use any R functions or constructs.

October 30, 6:28 p.m.

Someone may be spoofing my e-mail address. If you get a request for donuts, it didn't come from me. :-)

October 29, 12:57 p.m.

I've posted check values for B.

October 27, 9:45 a.m.

If you were not in class yesterday, the handout is here. Make a hard copy and bring it to Tests.

October 26, 3:38 p.m.

I've moved back the due dates of both parts of the assignment.

October 24, 9:45 p.m.

As you know, I'm very big on the use of debugging tools. Don't debug by inserting printf(), cout etc. statements!

I must say, though, that I've long felt that in the case of discrete-event simulation, one does need some print statements, as a supplement to--not instead of!--using a debugging tool. The time-passage and multiple "simultaneous" events nature of DES makes such a supplement very valuable.
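One way to keep such print statements under control is to route them through a single function that can be switched off. The following is a minimal Python 3 sketch of that idea, not the code from my implementation; the names DEBUG and trace(), the optional log argument, and the two events shown are all made up for illustration.

```python
DEBUG = True  # hypothetical switch; set to False to silence all trace output

def trace(simtime, msg, log=None):
    # print one line per event, prefixed by the current simulated time
    if DEBUG:
        line = "%s %s" % (simtime, msg)
        print(line)
        if log is not None:
            log.append(line)  # optionally keep the lines for later inspection

events = []
trace(0.5, "job 1 starts", events)
trace(1.25, "job 1 reduce done", events)
```

Putting the simulated time first on every line makes the output easy to filter with standard text tools.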

For example, for our current assignment, my implementation of the program includes the following output in debugging mode:

...
3.79726267155 job 7 starts, 3 mapnodes, 1 kvs/node
3.79726267155 1 nodes free
3.82534530203 job 2 reduce done
3.82534530203 2 nodes free
3.84246338627 job 8 starts, 1 mapnodes, 4 kvs/node
3.84246338627 0 nodes free
4.44481206057 job 9 requests 4 nodes, declined
4.51364310184 one mapper node for job 7 done
4.51364310184 1 nodes free
4.67631300793 job 3 reduce done
4.67631300793 2 nodes free
4.79097182021 one mapper node for job 6 done
4.79097182021 3 nodes free
4.85234797049 one mapper node for job 7 done
4.85234797049 4 nodes free
...

(The first number is current simulated time.) I pipe that output through more and sometimes grep, e.g.

% python Cloud.py | grep "job 1 "
0.624711764744 job 1 starts, 2 mapnodes, 1 kvs/node
1.55764803275 one mapper node for job 1 done
1.68679025282 one mapper node for job 1 done
2.99666543796 job 1 reduce done

October 24, 12:45 p.m.

Recall our rule (Sept. 28, below):

In your Homework, please do not use major constructs from later chapters or which are not in the book at all.

So, you are not allowed to use threads in Problem A. Use what we have learned so far.

October 23, 2:18 p.m.

I've made a couple of clarifications on Problem B, and made a small substantive change (regarding when a node becomes free).

October 20, 2:18 p.m.

Part B of the homework is now on the Web, as are the due dates.

October 19, 9:08 p.m.

Hwk 3 will consist of two programming problems, labeled A and B. Problem A is now on the Web.

October 12, 10:03 p.m.

Several people have asked me whether the nonprintable characters count when moving left or right. It really doesn't matter, up to you.

October 12, 3:34 p.m.

The signup for grading is here.

October 11, 9:34 p.m.

The key for down is d, not v.

October 9, 6:04 p.m.

When you type, say

% mv x.c y.c
% vi y.c

who created that file x.c (renamed to y.c)? Did the authors of the mv and vim programs do this? No, you did, right?

October 8, 11:44 p.m.

My R debugger is now ready, here.

See here for usage and help/intro command.

October 8, 5:43 p.m.:

Homework 2 will be due on Friday, October 14.

October 8, 3:34 p.m.:

Aiyaa!!!! Another misquote!

Please note:

I did quote HMC Pres. Maria Klawe as making the gender-dependent claim above, but I do NOT agree with it. I did say that I think anyone, regardless of gender, should use Python (or similar language) to learn programming, and that I wished ECS 30/40/60 would use Python. I also said that I think people should do most of their programming in a language like Python, other than cases in which speed is an issue.

October 8, 12:21 p.m.:

I just added a statement, "The program /usr/bin/pdftotext on CSIF defines our standard" to the homework specs.

October 7, 8:21 p.m.:

For the record: In discussing the BusinessWeek article on Harvey Mudd College's efforts to increase the number of female CS majors, I quoted HMC PRES. KLAWE as saying that the simpler nature of Python relative to C/C++ would attract more women into the field. That was not MY opinion, folks, and I said that I thought everyone, regardless of gender, should learn programming in a Python environment rather than C/C++, and indeed should use Python for most programming that doesn't require speed.

October 6, 4:51 p.m.:

As Gabriel noted in class, you can indeed use map() with more than one data argument. For instance:

>>> map(max,(5,12,13),range(7,10))
[7, 12, 13]

Here I used Python's built-in max() function, but of course you can write your own functions to use in map(). And the function need not be scalar-valued. For example:

>>> map(lambda u,v: (min(u,v),max(u,v)),(5,12,13),range(7,10))
[(5, 7), (8, 12), (9, 13)]

October 5, 10:38 p.m.:

Balaji's office hours are TuTh 11:30-1.

October 5, 9:38 p.m.:

In Homework 2, the user does NOT hit the Enter key.

October 5, 6:28 p.m.:

Homework 2 is ready.

You'll find that this is a rather long and detailed program, and will be glad you're working in groups! You also must learn some new material on your own. So, start now! I haven't set a due date yet, and it won't be earlier than October 12, but probably not long afterward.

October 5, 2:14 p.m.:

A question was raised in class as to whether the nested list comprehension on p.43 actually works. Well, it does; I just checked. Of course, you do have to define y first. :-) In the presentation on that page, I didn't show y being defined, but it was implied, because I simply typed

>>> y

and the Python interpreter printed out the value, as you see on that page; printing implies that y must have been defined previously, which it was.

The other point raised was whether the pseudocode was correct; it wasn't. It should read b[1:] instead of b[1]. I've fixed this in the revision.

Finally, I just looked out of curiosity at the official Python documentation, which makes the amusing remark, "If you’ve got the stomach for it, list comprehensions can be nested"; my sentiments exactly. :-)
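For anyone who wants to experiment, here is a small Python 3 sketch of a nested list comprehension. It is not the example from p.43; the data and the names y, transposed and b are made up for illustration. It also shows the difference between the slice b[1:] and the single element b[1].

```python
y = [[1, 2, 3], [4, 5, 6]]   # y must be defined before it is used

# nested comprehension: the inner one builds one row, the outer one collects the rows
transposed = [[row[j] for row in y] for j in range(len(y[0]))]
print(transposed)

# b[1:] is everything after the first element; b[1] is just one element
b = [10, 20, 30]
print(b[1:], b[1])
```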

October 5, 9:59 a.m.:

The R language is of course fundamentally numerical, a good example being that it has a built-in matrix data type. In addition, matrices will likely come into play in our use of Python as well.

Make sure you know the basic matrix operations. If you need to review them, or have not been exposed to them before, read Appendix A in the ECS 132 textbook, pp.385-389. This material may appear on Tests, including the one next Monday.

October 3, 9:24 p.m.:

The BusinessWeek article on the surge in female CS majors at HMC is here. There was an article a couple of years ago on the topic in the NYT, here, showing that in the past the percentage of women in the field was indeed higher. The latter article blames the emergence of (male-dominant) computer gaming as the reason for the decline, but I don't think so. Instead, as I said in class I think women tend to be more practical than men, and are put off by the feast-and-famine cycle of the field. There have been "famines" about every 10 years--early 80s, early 90s, early 2000s--and that last one kind of cemented the perception.

October 2, 10:18 a.m.:

In class on Friday, there was considerable discussion regarding derived classes on p.25. The following example might clarify things:

class a:
   def __init__(self):
      self.x = 8
   def f(self):
      self.x += 1

class b(a):
   def __init__(self):
      self.y = 2
   def g(self):
      self.y += 6
      a.__init__(self)

def main():
   binst = b()
   print dir(binst)
   binst.g()
   print dir(binst)
   pass

if __name__ == '__main__': main()

(The built-in dir() function tells us what's in any specified object. See p.45.)

Running this code, the output is

% python inher.py
['__doc__', '__init__', '__module__', 'f', 'g', 'y']
['__doc__', '__init__', '__module__', 'f', 'g', 'x', 'y']

Note that x from the parent class was not present until we called __init__() in the parent class (though the method f() was present).

As mentioned, class instances are implemented as dictionaries. Recall that we can add elements to dictionaries at any time, which implies that we can add member variables to class instances at any time, not just at the time the instance is created. Actually, at the time we call __init__() in a class, the instance has already been created (which is why some people don't like referring to __init__() as a constructor); __init__() then just adds to the dictionary.

Now, in an inheritance situation, when we call __init__() in the parent class, with the argument self, the latter is pointing to the dictionary of the child class instance. Then the parent class' __init__() adds to that dictionary.

By the way, __init__() is a built-in method to any class, but just a stub. If we define it ourselves, which is usually the case, we override that stub. (See p.48 for an example in which we don't do so.)
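The dictionary view can be seen directly with the built-in vars() function, which returns an instance's attribute dictionary. The following is a minimal sketch in Python 3 syntax (so print is a function), with the classes renamed A and B and trimmed to the essentials; it is an illustration, not a replacement for the example above.

```python
class A:
    def __init__(self):
        self.x = 8

class B(A):
    def __init__(self):
        self.y = 2          # the parent's __init__() is deliberately not called here
    def g(self):
        self.y += 6
        A.__init__(self)    # the parent's __init__() adds x to the same dictionary

binst = B()
before = dict(vars(binst))  # copy of the instance dictionary before calling g()
binst.g()
after = dict(vars(binst))   # copy of the instance dictionary after calling g()
print(before)
print(after)
```

The first dictionary contains only y; x appears in the second one, after the parent's __init__() has run on the child instance.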

September 29, 8:28 p.m.:

Followup to my posting of 1:52 p.m. today: Conversely, on a Test you are not allowed to use material not yet covered in class.

September 29, 5:20 p.m.:

Just fixed another error in the sample output. "Haste makes waste," sorry.

September 29, 4:10 p.m.:

Just fixed an error in the sample output.

September 29, 1:52 p.m.:

Tests always cover all material through the most recent lecture or quiz, including all reading through the latest page covered in lecture.

September 29, 1:49 p.m.:

I've added to the getpatts() example in the Homework, showing what bit patterns would be found.

I've also added an exclamation mark. :-)

Finally, I've offered an Extra Credit version of the problem. It's not harder than the original (it may be easier), and if you've already done the original, you can convert it with some easy changes.

September 28, 11:11 p.m.:

The new Homework turn-in procedures are now on the Web, in our Homework directory.

September 28, 2:52 p.m.:

Please keep in mind that, as noted in the Homework specs, the bps class is supposed to be general. We are using it to store the output of getpatts(), but conceivably it could be used to store output of various other functions too.

One implication of this is that the bps examples in the specs aren't supposed to be connected to the getpatts() example.

BTW, I noticed that my c vector scheme is not as general as it should be, because it doesn't allow for circuits with noncontiguous bit indices, e.g. b5 b12. But you have enough to do as it is, so I won't change it.

September 28, 1:52 p.m.:

In your Homework, please do not use major constructs from later chapters or which are not in the book at all.

September 27, 1:48 p.m.:

The deadline for getting into a group, either forming one on your own or asking the TA to place you in one, was last night. If you are not in a group yet, you may be dropped from the course. You may have an e-mail message to this effect. Please check. Sorry, but we have a substantial waiting list for the course.

September 26, 8:40 p.m.:

In order to avoid having to deal with a name clash, I've changed the names of the bps methods.

September 26, 2:49 p.m.:

I've changed my office hours to M 3-4, W 2:30-3:30.

September 26, 2:46 p.m.:

I fixed a typo and clarified a couple of points in the Homework.

September 26, 2:31 p.m.:

Please note that you are required to have a hardcopy of the textbooks. The texts are used heavily in Tests, and you are not allowed to use any electronic devices during a Test. Also, during lecture you should be annotating your text with extemporaneous comments made by the professor.

As noted in the Syllabus, please do not use laptops during lectures, except in "emergencies." Actually, using tablets is OK (during lecture, not during Tests), as long as you do so quietly.

September 26, 10:49 a.m.:

In our Python book, the last chapter is on debugging, a hugely important topic. Consider it assigned reading to be done now. It may show up on Tests, but the main reason for reading it NOW is to save yourself time in debugging!

September 26, 10:43 a.m.:

I will sometimes be adding files to the directory Handouts/ on our Web page, consisting of various short code snippets. These may be cited on Tests. Make sure to make hard copies and bring them to Tests. You need not constantly watch this directory for new additions; I will announce them. Right now the files are DeleteHyphensSpaces.pdf, FinalGrades.pdf and RemovePlusesForFirefox.pdf.

September 25, 11:39 p.m.:

In my explanation of the drawbacks of having arrays that can be shrunken or expanded, I oversimplified the situation a bit. Gabriel found a good Web site, here, that shows the current status of CPython in this regard.

Basically, each time you expand or shrink a Python list, you incur a performance penalty as the list is rearranged internally. However, in between rearrangements, the list is indeed stored as a contiguous C array with direct access to individual elements.

Actually, one of my published research projects exploited this fact; the paper is here. But I didn't fully think of its implications in class on Friday.
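If you'd like to see the rearrangements for yourself, the following Python 3 sketch watches the memory footprint of a list as it grows. It assumes CPython, where sys.getsizeof() reports the bytes allocated for the list object itself; the exact numbers are an implementation detail and vary between versions, so the sketch only counts how often the size changes.

```python
import sys

sizes = []
lst = []
for i in range(1000):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))  # bytes used by the list object itself

# count how many appends actually changed the allocated size
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if a != b)
print(resizes, "size changes in", len(sizes), "appends")
```

On CPython the count is much smaller than the number of appends, since extra room is allocated at each rearrangement, and between rearrangements elements are accessed directly by index.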

September 25, 11:21 p.m.:

Homework 1 is ready, in our Homework directory.

September 24, 9:21 a.m.:

Monday's discussion section will be devoted to forming your Groups. (There will be no Quiz.) If you already have formed a Group, you can inform our TA, Balaji, at that time, or by e-mail if it's not convenient to attend that day (see our class Web page). Please note that if you are not enrolled, you should not join a Group at this time; you can join a Group in progress if you get in to the class later.