Blog, ECS 158, Fall 2017

Wednesday, December 6, 4:25 pm

I just sent out your complete record of quiz grades, including Quiz 8, today's group quiz. Let me know by this Friday, 11:59 pm, which two regular quizzes, if any, you wish to apply Quizzes 0 and 100 to.

Sunday, December 3, 10:20 am

I just e-mailed a message to everyone, stating that everyone must review the blog post of Thursday, November 23, 10:20 am, concerning course grades. I also e-mailed your cumulative quiz records.

Friday, December 1, 11:45 pm

IMPORTANT: In our Group Quiz next Wednesday, use the following for your e-mail address when signing on to OMSI. It will be the same as the naming scheme you've been using to submit your homework, but without the .tar and also without

Wednesday, November 29, 11:05 pm

Quiz 7 went well, I believe. Though there were a few grades that were well below what I believe the students are capable of, in general the results were good. There were even two grades of 100 (an A+, of course).

One point I might make concerns Problem 3. It was not a deep problem at all (see the solutions file on our class Web page). As I mentioned in class today, the math is important; many of the applications of parallel computation are mathematical in nature.

In this particular case, the Fibonacci number example shows a common pattern: Transform a problem for which it is not clear how to parallelize into one that we already know how to parallelize (matrix multiply).

Wednesday, November 29, 8:10 pm

Looks like I misread the calendar. The last day of instruction this quarter is Friday, Dec. 8, not Wednesday, Dec. 6.

However, I will keep to the same plan. It's too late to change, and some people may have made plans according to the Dec. 6 date in our class syllabus.

So, we'll have the Group Quiz in lecture on Dec. 6, with no discussion that day, and no class on Dec. 8. Instead of class on Dec. 8, I will hold a special office hour during class time (in my office), so that you can consult with me regarding your term projects.

Wednesday, November 29, 6:30 pm

Just to make sure we are all on the same page:

Monday, November 27, 11:15 pm

The Term Project is now ready!

This is a different kind of activity, one that will take quite a bit of digging on your part. It should NOT be put off.

Sunday, November 26, 11:05 pm

The coding problem (Problem 4, as in previous quizzes) will involve R, again with matrices. There will be a couple of problems on Thrust, but not involving the writing of code.

In the matrix algebra review at the end of our book, the last section is on matrix algebra in R. Be sure to know this material.

Sunday, November 26, 10:55 pm

Please note that in R, if you extract a submatrix that ends up consisting of a single row, the result will be a vector rather than a matrix, unless one specifies drop = FALSE:

> x <- rbind(3:5,c(6,2,9),c(5,12,13))
> class(x[1,])
[1] "numeric"
> class(x[1,,drop=FALSE])
[1] "matrix"

Saturday, November 25, 8:45 am

Turns out that Linus Torvalds, the inventor of Linux, dislikes C++ as much as I do (actually a lot more). :-)

Thursday, November 23, 4:00 pm

Sachin mentioned to me that some students brought up the following issue with OMSI. If one has a runtime error and is trying to determine the cause by adding print() calls, OMSI just reports the runtime error and the output of the print doesn't appear.

Actually, the solution is simple. Just add a return() call! Here is an example from the last quiz:

   evs <- sapply(tmp,endValue)
   print(evs)
   base <- evs[1]
   for (i in 2:nWorkers) {
      tmp[[i]] <- tmp[[i]] + bise
      base <- base + evs[i]
   }

Here the user has accidentally spelled 'base' as 'bise'. When the code is run, a runtime error occurs, and evs is not printed out. But if we insert a return() call,

   evs <- sapply(tmp,endValue)
   print(evs)
   return()
   base <- evs[1]
   for (i in 2:nWorkers) {
      tmp[[i]] <- tmp[[i]] + bise
      base <- base + evs[i]
   }

we get the desired printed material.

Thursday, November 23, 10:20 am

An eighth-week assessment:

As I have said in class a few times, you ALL have learned a ton about parallel computation. You've done this on the major platforms -- OpenMP, CUDA, MPI -- and if you ever need to do parallel computation you will have a good foundation for that.

Due to the university's error in the ECS 150 prerequisite and some other factors, some students entered the class with a weaker programming background than others. (The 150 prerequisite is not just for systems knowledge, but also to ensure that students enter 158 with sophisticated programming experience.) As you know, I am compensating for that by a generous grading policy, which is:

Note too my general policy that students who get a good grade on the term project will get a bonus boost to their course grades.

Thursday, November 23, 9:55 am

Happy Thanksgiving, everyone!

Various pieces of news on the quiz yesterday, which I have named Quiz 100 due to its optional nature:

Wednesday, November 22, 3:25 pm

A sharp student asked me the following concerning the matrix powers example today for matrix inversion. He noted that I had said that both matrix multiplication and matrix inversion have time complexity O(n^3), so what does this method buy us? The answer is that matrix multiplication is much more easily parallelized than matrix inversion.

Sunday, November 19, 9:45 pm

PLEASE NOTE CAREFULLY: For this coming Wednesday's Make-up exam, you MUST use OMSI Version 1.2.0 in order to get credit. It is now ready for download.

Saturday, November 18, 10:45 am

In order to be ready for our Make-up Quiz next Wednesday, make sure you have full understanding of the solution to Problem 4, Quiz 6.

Also, please familiarize yourself with the workhorse R functions, apply(), lapply(), sapply() and split():

> m <- matrix(sample(1:5,12,replace=TRUE),ncol=2)
> m
     [,1] [,2]
[1,]    2    1
[2,]    2    5
[3,]    5    4
[4,]    5    1
[5,]    2    1
[6,]    1    3
# call sum() on each row
> apply(m,1,sum)
[1] 3 7 9 6 3 4
# call sum() on each column
> apply(m,2,sum)
[1] 17 15
> f <- function(x) sum(x[x >= 4])
# call f() on each row
> apply(m,1,f)
[1] 0 5 9 5 0 0
> l <- list(x = 5, y = 12, z = 13)
# apply the given function to each element of l, producing a new list
> lapply(l,function(a) a+1)
$x
[1] 6

$y
[1] 13

$z
[1] 14

# group the first column of m by the second
> sout <- split(m[,1],m[,2])
> sout
$`1`
[1] 2 5 2

$`3`
[1] 1

$`4`
[1] 5

$`5`
[1] 2

# find the size of each group, by applying the length() function
> lapply(sout,length)
$`1`
[1] 3

$`3`
[1] 1

$`4`
[1] 1

$`5`
[1] 1

# like lapply(), but sapply() attempts to make vector output
> sapply(sout,length)
1 3 4 5 
3 1 1 1 

Saturday, November 18, 9:05 am

An alert student pointed out to me that the second call to __syncthreads() in Sec. 5.10 should not be there.

Wednesday, November 15, 11:25 pm

Grades for Quiz 6 are being sent out as I write this. Generally pretty good, though most people had trouble with Problem 4.

Wednesday, November 15, 8:35 am

I will need to shorten my office hours slightly, changing from W 1:30-3:30 to W 2:00-3:30. If this is any inconvenience, I can make a special appointment for you, as I have done with a couple of students.

Tuesday, November 14, 11:50 pm

Concerning next week's Make-up Quiz:

Monday, November 13, 5:00 pm

As noted in our textbook, in order to use CUDA on CSIF or any machine, you need to be sure that you have set your path variables correctly. In the case of CSIF, CUDA is installed in /usr/local/cuda.

Note that this is standard Unix add-on application practice, to install in /usr/local.

Sunday, November 12, 12:15 pm

The coverage for our quiz this Wednesday, Quiz 6, will be through Section 5.10, i.e. through p.139. Please keep in mind these blog posts:

Saturday, November 11, 3:15 pm

Hwk III is ready on the Web!

Saturday, November 11, 12:05 am

I have decided to hold another optional Make-up Quiz on Nov. 22, during the discussion section; there will be no regular quiz that day. It will have three purposes:

Friday, November 10, 9:25 pm

Don't forget your R! It will continue to be used as the vehicle for quiz questions occasionally for the rest of the course.

Friday, November 10, 9:25 pm

As explained in a coming page in our book, debugging facilities for CUDA are poor, unless you have a dedicated, monitorless machine. However, you can use printf().

Friday, November 10, 1:50 pm

Some of the comments in the example in Section 5.12 are not very helpful. I've placed a better version in the file on our Web site.

Thursday, November 9, 11:25 pm

I mentioned at the start of the quarter that many of the quiz problems will be variants of examples in the book. In Quiz 5, this was the case for all of the problems except #1.

And I know of at least one student who correctly guessed that I would have a problem like #2. So he was all set for it, and there were probably others.

The point, of course, is NOT to see who can guess my thoughts! :-) Instead, the point is that by thinking of how you might work variants of the examples in the book, you gain much deeper insight into those examples.

But be careful when you see a "variant question" on a quiz. If the problem statement says, "Change aspects A and B," it does NOT mean that all other aspects change too. In Problem 3 in Quiz 5, it said to relax the assumption of a unique root. But it did NOT say to abandon the other assumptions, such as that f(a) is negative and f(b) is positive and so on.

Thursday, November 9, 2:25 pm

The adjacency matrix example (Sec. 4.14 of our text and Problem 4, Quiz 5) involves a number of important issues in general parallel computation. It is likely that Quiz 6 will contain a related problem.

Thursday, November 9, 1:25 pm

The following is not a major point, but worth mentioning in the spirit of clear thinking/speaking.

In Problem 3, Quiz 5, several students said that in the original code there are nth * niters iterations. That would be like saying that a sports team of p players in a g-game season plays p X g games. :-)

Wednesday, November 8, 10:55 pm

Please pay close attention to the do's and don'ts listed in my October 5 blog posting, one of which was to save your work to the server (Submit button) frequently. The reason is that if everyone tries to submit at the time the quiz ends, the network will become slow.

Wednesday, November 8, 10:50 pm

I am sending out the results of Quiz 5 as I write this. This was a tough quiz, I think.

On Problem 2, a number of students were unaware of the fact that we are not trying to find all the roots; see p.103, top.

Problem 3 is actually very subtle, possibly the most difficult problem we've had in all our quizzes so far. But about a dozen of you got it right, which was nice to see.

I'm sure that in Problem 4, a lot of students simply ran out of time.

Tuesday, November 7, 11:45 pm

If I had previously regraded your Quiz 2, Question 4 and the new score did not appear in my cumulative report yesterday, please let me know.

Monday, November 6, 8:20 pm

This Wednesday's quiz will be mostly on previous material, not GPU. But be prepared for at least one GPU question.

Monday, November 6, 8:15 pm

As I write this, I am mailing out your quiz records thus far.

Monday, November 6, 7:50 pm

The vast majority of students earned an A grade on today's quiz, just as I hoped, and very good to see.

Monday, November 6, 1:20 pm

Quiz news:

Saturday, November 4, 9:10 am

Once again: Make SURE you have Version 1.1.3 of OMSI, not earlier or later. It is the one currently on GitHub. View the VERSION file to check.

This version does not have the scrollbar feature that Sachin had added in a later version. So, you need to use up- and down-arrow keys to scroll on Linux or Windows; on a Mac, the two-finger trackpad gesture for scrolling does work (pointed out by Matt in our class).

Friday, November 3, 10:45 pm

As mentioned, Monday's Make-up Quiz will consist of just one problem, which will be very similar to Question 4 in Quiz 4. Thus you should make absolutely sure that you fully understand the solution to that question, and I would recommend that you make a hard copy of the solution and bring it to the quiz.

Friday, November 3, 9:15 pm

Please note again that on Monday's Make-up Quiz, we will use Version 1.1.3 of OMSI. DO NOT USE A LATER OR EARLIER VERSION!

Thursday, November 2, 10:10 pm

I'm sending out Quiz 4 grades as I write this. These are for the ones submitted either to CSIF or on USB keys; those who submitted on paper will unfortunately need to wait a few days.

Overall, the results in Questions 1-3 were pretty good, though not in the case of a few students.

The major stumbling block was Question 4. It was common for students to get the code right, e.g. for memcpy(), but do it in a nonparallel manner! :-( Note carefully that

#pragma omp parallel

simply launches the threads, and every line in the block is executed by every thread. In other words, without having a line

#pragma omp for

before the for loop, each thread will process ALL values of row. Because so few students did this, I gave a bonus to those who had it, consisting of a one-notch bump upward, e.g. B to B+.

The grading scale was pretty liberal, but I believe the proper one.

Thursday, November 2, 12:30 am

ANNOUNCEMENT: Make-up quiz, Monday, Nov. 6!

I will be giving a make-up quiz in class (lecture period, 12:10-1) this coming Monday, November 6. It will have two goals:

The quiz will consist of a single question, a variant of Question 4 on Quiz 4. It will be simple enough that anyone who reviews the answer to Question 4, Quiz 4, will easily get at least a B, very likely an A.

At the end of the quarter, you will have the option of replacing any of your quiz grades by your Make-up Quiz grade (or not replacing any of them, if you wish).

REMEMBER TO BRING YOUR LAPTOPS ON MONDAY! (With OMSI 1.1.3 installed, which is the version now on GitHub.)

We will still have our regular quiz on Wednesday.

Wednesday, November 1, 10:20 pm

Lots of problems with OMSI today. Very strange, since Sachin and I both tested it (Linux and Windows for him, Linux and Mac for me), using both the new and old client. We had no issues at all. All the problems encountered by the students seemed to have occurred with those using the new client. (The new server is the same as the old one.) Sachin even tested it during the quiz, with no problem, and I just now tested it once again, no problem.

The bug will be very hard to track down. For the time being, please use the Version 1.1.3 client until further notice.

Deeply sorry for all this. You deserve better.

Concerning your performance on the quiz itself, I've done a spot check, and it looks pretty good, except for Problem 4. Please see the solutions and let me know if you felt there were ambiguities in the problem statement.

Tuesday, October 31, 10:40 pm

A couple of students have been placing pull requests for OMSI on GitHub. It is very highly appreciated that students wish to contribute to OMSI, but this is much better done in conjunction with Sachin and me, rather than independently working on enhancements that may either not be useful or have low priority.

If you are interested in contributing to OMSI, I do have a couple of enhancements in mind. Please come see me about it.

Monday, October 30, 10:55 pm

With the tentative version I have of Quiz 4, I would recommend reading the OMP example of in-place matrix transpose especially carefully. Also, you should review/learn the memcpy() function.

Monday, October 30, 8:35 pm

Here is an illustration of how the OMP atomic directive works. I took the "hello world" example I cited on the Web (and then modified in Quiz 1), and added code to find the sum of all the thread numbers:

// helloOMP.c

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
   int nthreads, tid;
   int sumalli=0;
   #pragma omp parallel private(nthreads, tid)
   {
     tid = omp_get_thread_num();
     // atomic: the update of the shared variable sumalli is indivisible
     #pragma omp atomic
     sumalli += tid;
     printf("Hello World from thread = %d\n", tid);
     #pragma omp single 
     {
       nthreads = omp_get_num_threads();
       printf("Number of threads = %d\n", nthreads);
     }
   }
   printf("sum of all tid values = %d\n",sumalli);
   return 0;
}

Note the use of atomic. After compiling and running to check the validity, I recompiled using the -S option in gcc. This produces the assembly language file helloOMP.s, which is the compiled code. Here is the relevant section:

        call    omp_get_thread_num
        movl    %eax, -16(%rbp)
        movq    -24(%rbp), %rax
        movq    (%rax), %rax
        movl    -16(%rbp), %edx
        lock addl       %edx, (%rax)

On Intel machines, the compiler generally places return values in the EAX register. So tid is there. It is also copied to a place 16 bytes from the base pointer register RBP. Right next to it, 8 bytes away, is the place the compiler chose to store the address of sumalli. The second-to-last line places tid in the EDX register, and the last line adds that value to sumalli -- atomically, due to the lock prefix.

Sunday, October 29, 10:55 pm

I do urge you to do the Extra Credit problem in the current assignment. It is actually fairly easy once you learn how to use SEXPs. These are R objects. (The term means "S expressions," alluding to the fact that R is an open-source version of S.)

There are many examples on the Web, but here are the comments I placed in my own code for the problem. They serve as a mini-tutorial which, together with examples on the Web, should make things easy.

// C/OpenMP version of 'rollmean' in the 'zoo' library, RollMean.c

// uses SEXP macros directly, not with Rcpp wrapper

// R CMD SHLIB uses a Makefile, so write a Makevars file, one line:
// PKG_CFLAGS= -fopenmp

// run from shell: rm *.o *.so; R CMD SHLIB RollMean.c

// from R, execute test:

// dyn.load('') 
// x <- runif(20) 
// k <- as.integer(3)
// .Call('rollmean',k,x) 

// SEXP ("S expression") means an R object; need to convert back and
// forth between SEXPs and C; key macros:

// REAL(w) forms a C pointer to the numeric values in the R vector w
// INTEGER(z) forms a C pointer to the integer values in the R vector z
// allocVector(SEXPtype,m) forms an R vector of length m of the given
//    type, e.g. REALSXP
// PROTECT() protects R objects from being garbage-collected from C
// UNPROTECT() ends that protection

Sunday, October 29, 10:50 pm

Starting with Version 1.3.0, OMSI is offering an External Editor option. Under this option, you still use OMSI for compiling, submitting, running and so on, but you will do the actual editing using an external text editor. This gives you access to syntax coloring, autoindent, undo/redo and so on. See the file for details.

Sunday, October 29, 10:45 pm

I believe I have tracked down the reason why some students had display rendering problems with OMSI: There was a problem with the file Questions.txt that contains the exam questions.

Suspecting that there were odd, invisible characters in the file that Sachin used in the quiz, I inspected it using the Unix od command. It turned out to have a mixture of Unix and Windows end-of-line characters (hex 0a vs. hex 0d0a, respectively). It's not clear how that odd mixture came about, maybe somehow due to Sachin testing on both Linux and Windows, but apparently it ended up just confusing Macs. Cleaning up the file resulted in clean rendering.

Saturday, October 28, 11:45 pm

Now that Problem 2 is Extra Credit, you do not have to stick with Rcpp. You can directly use .Call() with SEXP etc., but you will still have to search the Web for examples and procedures, such as this site. Form a file Makevars with contents

PKG_CFLAGS= -fopenmp

before running

R CMD SHLIB yoursrc.c

I'll try to post a detailed example tomorrow, though since it is Extra Credit this will be lower priority.

Saturday, October 28, 2:30 pm

A super-alert student asked me about a seeming anomaly on pp.22ff of our textbook. In the line

clusterExport(c2,"b")

why are we exporting b but not a? It does actually work if you run the code.

The short answer is that it was an error, which was corrected in the full edition of the book. But there is actually more to the story. Why does the apparently erroneous code still work? This is subtle: Consider the line

grpmul <- function(grp) u[grp,] %*% v

Now, remember, in R every operation is a function! That includes the array indexing operation, [. So the above is really

grpmul <- function(grp) "["(u,grp,) %*% v

So, there is a function call within the function grpmul(). And, that call won't be made until we call grpmul() -- which we do from the manager, when we call clusterApply(). But...that means that the call to grpmul() will be done at the manager, not the workers, and since the manager does have the matrix a, the whole thing works correctly without exporting a.

Saturday, October 28, 2:00 pm

A student whom I know to be very systems-savvy is having trouble getting Rcpp to work. Rcpp can indeed be very finicky, and if this student is having trouble, I believe many others will have even more trouble.

Thus I have decided to make Problem 2 Extra Credit.

Saturday, October 28, 9:00 am

On our class Web site, I have a full example of Rcpp used in conjunction with OpenMP. It is an Rcpp version of the example in Sec. 4.13 of our book, and is an excerpt of my book, Parallel Computation for Data Science, CRC, 2015.

Friday, October 27, 9:40 pm

In using Rcpp, please do a direct compile, using R CMD SHLIB, as shown for instance in this blog post. Do not use sourceCpp().

Friday, October 27, 2:10 pm

If you encounter a code problem, e.g. odd execution error, please send me a complete record of what happened, using the Unix script command to record your shell session. In the session, first run printenv so I can see your environment, then run cat so I can see your code, then compile and run, so I can see any compiler errors or runtime errors. Send me the typescript file.

Please do NOT send me screenshots, as I can't use them to re-run the code etc.

Friday, October 27, 1:35 pm

A couple of students reminded me that Rcpp requires your source code file to have a .cpp suffix even if the code is straight C.

Thursday, October 26, 3:00 pm

I've already had 3 students who contacted me because they had not been sent their quiz score, with it turning out that in each case they had the wrong e-mail address. One of them, for instance, wrote 'ucdavis,edu' instead of 'ucdavis.edu'. Please be careful.

Thursday, October 26, 12:00 am

I sent out the Quiz 3 results a few minutes ago. As usual, if you submitted on paper, it will be a while before we can get the results back to you.

There are still some students who are doing below what they might have done with better background. I am still concerned about the fact that, due to the university's error, people were allowed to enroll in the class without ECS 150, or in some cases, even without a strong programming background. But since it is the university's error, not that of the students, I feel a responsibility to adapt.

Thus I am making the following policy change:

Instead of the original 70%/30% split in course grade between quizzes and homework, I am changing it to 55%/45%.

Note, however, that in order to take advantage of this, you will need to be highly cognizant of the homework grading procedures. Please read the class syllabus carefully on this.

Wednesday, October 25, 9:00 pm

OK, Question 4 regraded now. Again, sorry for the confusion.

Note that starting with Quiz 2, the solutions will be in the files OldExams/F17* on our Web site.

Wednesday, October 25, 6:25 pm

I will be regrading Problem 4 of Quiz 2. Sorry for the inconvenience.

Wednesday, October 25, 4:05 pm

Of course, those were Quiz 2 results, not Quiz 3.

Wednesday, October 25, 3:45 pm

I'm sending out your Quiz 2 grades as I write this. Overall, they were better than Quiz 1, but some people are having a slow start. Remember that your lowest 3 quiz (letter) grades will be discarded.

Tuesday, October 24, 8:45 pm

It may be helpful to review the bsort.c OMP example in Chapter 1 before tomorrow's quiz.

Sunday, October 22, 9:55 pm

Continuing that last post, note that the LOCK prefix locks the bus for the entire duration of the instruction. Now consider

lock addl $3, x

which adds 3 to the memory location named x. There is one bus transaction needed to read the old value of x, and a second one to write the new, incremented value back to x. So, we are locking the bus for a rather long time, potentially compromising performance when other threads want to access memory, but the benefits can be huge compared to using software locks.

Sunday, October 22, 9:55 pm

Near the top of p.52, the '??' means a missing reference, arising from the fact that we are using an abridged version of my full text and the reference is not in the short version. Here is the point:

For instance, say we are maintaining some global count, total. Instead of guarding the increment of total with a software lock and unlock, we could do

lock inc total

without software locks! By the way, that would best be done using inline code, a term meaning plunking some assembly language right in the middle of C/C++ code. Most modern compilers allow this.

Sunday, October 22, 3:35 pm

Solutions are on our Web site.

Sunday, October 22, 3:25 pm

Your Quiz 1 grades are being e-mailed to you as I write this. Again, very sorry for the long delay.

As you will recall, after Quiz 1 I did a spot check of the students in the class whom I know to be quite sharp, and the results were poor. This was chiefly due to a "credibility problem" on my part: I had said repeatedly that material that comes up in lecture, typically on the spur of the moment, may pop up on quizzes, but unfortunately some students didn't realize that I actually meant it. :-)

Nevertheless, in grading the quizzes just now, I was pleased to see that a number of students actually did quite well. There were a couple of 100s, and a goodly number in the 90s.

If you did not do well on Quiz 1, don't be demoralized by it. Remember, my usual policy is to drop your two lowest quiz grades, and I have changed that number to three. I am sure you will do well on future quizzes.

Still need to grade Quiz 2, very soon I hope.

Sunday, October 22, 12:05 pm

I just discovered that CSIF has switched MPI implementations. They had been using MPICH2 but are now using Open MPI (not to be confused with OpenMP). But you only need to modify my MPICH2 instructions slightly. Just replace

mpiexec -f hosts3 -n 3 prp 100 0

by

mpiexec --hostfile hosts3 prp 100 0

Another issue, though, is passwordless login. MPI has the manager node ssh to the worker nodes. You don't want to have to type in your password each time this is done, so you need to configure your ssh startup directory ~/.ssh to allow passwordless login. CSIF has instructions for this.

Saturday, October 21, 5:15 pm

I also added more complete information in the box that arises in case of compiler error.

Saturday, October 21, 5:00 pm

Due to some excellent detective work by Sachin, OMSI now has scrollbars, in both the question and answer boxes. This is really convenient.

An added bonus is that mouse delete-region now works on Macs. The boundary lines won't show up, but you can now use the mouse to delete a block of lines, rather than having to repeatedly hit the Delete key.

You need not upgrade, but it would be worthwhile to do so.

Thursday, October 19, 11:00 pm

Sorry again for the delay in grading your quizzes. The least I can do is give some relevant feedback.

Note that, as with any exam, you can get partial credit even if your code doesn't run. In Quiz 2's problem on the pthreads example in the book, for instance, I will be looking to see whether you added in a for loop, so as to process the several values of nextbase. That problem was just of the "pencil and paper" variety, so you could not try to compile and run it, but even in problems where you can compile/run, you can still get partial credit even if it does not run.

The big advantage of being able to compile/run, of course, is that it gives you instant feedback. If the results come out wrong, you can go back and fix your code, resulting in a better grade.

Thursday, October 19, 11:00 pm

Homework II is now on our Web site.

Tuesday, October 17, 7:05 pm

Yesterday I mentioned in class that I was tentatively thinking of having one or two problems on the 'maxburst' example in tomorrow's quiz. Well, upon further thought, I decided against it. A number of people did poorly on Quiz 1 (sorry, still not finished grading yet), so I wanted to make sure people do well on the second one.

Tuesday, October 17, 11:35 am

IMPORTANT! Make sure Rscript works on your laptop. It comes with R, so it should be OK, but check it: Create a file test.R with simple content that prints, e.g.

print('test OK')

and then from a terminal window, type

Rscript test.R

Monday, October 16, 10:25 pm

Note that in the R parallel package, clusterExport() assumes that the items to be exported are in the global space where the call is made:

> library(parallel)
> cls <- makeCluster(2)
> f <- function(cls,x) clusterExport(cls,'x')
> f(cls,z)
Error in get(name, envir = envir) : object 'x' not found
> f <- function(cls,x) clusterExport(cls,'x',envir=environment())
> f(cls,z)

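In the first call, clusterExport() looked for a variable x in the global environment, its default, and did not find one. In the second call, envir=environment() told it to look in the environment of f(), where x is defined.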

Monday, October 16, 9:35 pm

Note that in an R parallel cluster, one can programmatically determine the number of nodes in the cluster by using the length() function, e.g.

> library(parallel)
> c2 <- makeCluster(2)
> length(c2)
[1] 2

Monday, October 16, 9:00 pm

Sorry I have not graded the quizzes yet. I hope to do that tomorrow. You will be notified of your quiz results by e-mail.

Sunday, October 15, 11:20 pm

In order to ensure that everyone does well on Quiz 2, I will make that quiz short and easy. But please note that you will need the latest version of OMSI, at least Version 1.1.2, both to use the new CopyQtoA feature, and to deal with problems with the Submit feature that some of you encountered last week.

Please make SURE to install this before Wednesday.

Saturday, October 14, 11:50 pm

In order to be consistent with the TA's test, use row-major storage in the C portion of the assignment.

Saturday, October 14, 8:15 pm

On a similar topic, the specs say

Note that the data.table package by the amazing Matt Dowle is much faster for file reading, merging of two data objects etc. than the corresponding operations in base R.

In other words, you ARE allowed to use the data.table package, which by the way I have installed in my CSIF R package directory, ~matloff/Pub/Rlib.

Saturday, October 14, 8:05 pm

The problem specs for our homework state,

The TA's speed test will consist of the "twitter combined" file at the SNAP datasets site. He will decide on some CSIF machine on which to run the tests. Note that the timings are only for the times taken by the functions, NOT the time needed to read in the data from disk.

Sachin tells me that one group is using a feature of GCC 7 that is not available on GCC 5, the latter being what is on CSIF. As you can see above, this group will lose the speed competition, because their program won't even compile. Surely this is not what they had in mind. :-(

Furthermore, by implication, even Sachin's non-speed test, i.e. his test just to see that their code produces the right answers, will also be done on CSIF.

Just to make it explicit: All code must run on CSIF (and of course compile, in the case of C/C++).

Saturday, October 14, 1:40 pm

Following up on the "copy on write" discussion, the following little "When in doubt, try it out!" example should be illuminating:

> x <- rep(8,10000000)
> tracemem(x)
[1] "<0x107862000>"
> y <- x
> tracemem(y)
[1] "<0x107862000>"
> y[3] <- 168
tracemem[0x107862000 -> 0x10c4ae000]: 

Here, y had to be reallocated memory at a new address! This takes time, and thus even an innocuous little assignment statement can sap performance.

Saturday, October 14, 8:10 am

As mentioned in class yesterday, I have added a new menu option to OMSI, CopyQtoA. This will play a major role in all our remaining quizzes, so please download the new version.

The CopyQtoA operation copies the contents of the question box to the answer box. You will use it as follows. In many quiz problems I will give you a partial program, in which you will fill in the gaps. The program will be displayed in the question box. When you click CopyQtoA, that partial code will then be copied to the answer box, so that you need not type it in yourself. This will be a major benefit, saving you a lot of precious quiz time.

I will continue to tweak OMSI, so I have added a VERSION file to the OMSI package. In order to use CopyQtoA, you will need at least the current version, 1.1.0.

Saturday, October 14, 7:40 am

Important message for those who used a virtual machine on our quiz:

I am told that at least one person could not turn in his quiz on a USB key because he didn't know how to transfer files from the virtual machine to the key. You can do this using scp or winscp, but since there are so many different virtual machine frameworks, Sachin and I cannot post general recipes for this.

The FAR BETTER solution is to put the necessary software on your machine (Python, R, gcc), so that you don't have to deal with virtual machines.

It will be your responsibility to solve this problem by next week's quiz.

Friday, October 13, 1:40 pm

In my blog post Wed., 11:30 pm, I mentioned a revision of the homework, consisting of clarifications but no changes. Actually, there was a "change," in that I moved the due date back a day. :-) I think some people missed that, so I am pointing it out now.

Also, an alert student pointed out to me after class today that the edges argument in the C version of recippar() should have just one asterisk. Note that this means it is up to you whether you use row- or column-major storage, since it is "private" to you, without the compiler getting involved.

Thursday, October 12, 8:00 pm

Our class syllabus explains two important points about the .tar file you submit for your homework: (a) It really matters that you use the proper e-mail address; failure to do so may result in that person on your team not getting credit. (b) The proper e-mail address is your official UCD address, NOT your Kerberos/UAPP address. For most people, the two are the same but for some they differ. Your official UCD address is the one that Sachin has been sending mail to, and to which I sent messages early in the quarter before we started the blog.

Wednesday, October 11, 11:30 pm

When I assigned Homework I the other day, I stated that there would be a real data set to be named later. I have now named the data set, and have made a few minor clarifications (not changes), including regarding the speed test for Extra Credit.

Wednesday, October 11, 9:30 pm

Mystery solved! Almost as soon as I left Olson Hall this evening, I realized the problem: In the compile command, I forgot to include the -fopenmp option. :-)

My bad! I thought I had tested it, and Sachin thought he had too, but apparently not. Very sorry about that.
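
For future reference, a program can detect this situation itself. An OpenMP-enabled compile predefines the macro _OPENMP, so a check along the following lines reports whether -fopenmp was in effect. This is just a sketch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the value of _OPENMP (a date code of the form yyyymm) if this
   file was compiled with OpenMP enabled, e.g. gcc -fopenmp, else 0. */
long openmp_version(void)
{
#ifdef _OPENMP
   return _OPENMP;
#else
   return 0L;
#endif
}

/* Prints a one-line report on whether OpenMP was enabled at compile time. */
void report_openmp(void)
{
   long v = openmp_version();
   assert(v >= 0);
   if (v > 0)
      printf("OpenMP enabled, _OPENMP = %ld\n", v);
   else
      printf("OpenMP NOT enabled; omp pragmas were ignored\n");
}
```

Without -fopenmp, the compiler simply ignores the omp pragmas, so the parallel block runs in a single thread with no error message pointing at the missing option.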

As compensation, I will make a major modification to the policy on course grades: Instead of throwing out your two lowest quiz grades, I will discard the lowest three.

In other news, what about the router issues?

Thanks in advance for your patience.

Tuesday, October 10, 11:10 pm

Our current programming assignment includes coding in the R parallel package, which can be challenging to debug. Please see hints in my Web page on R debugging, which I am in the process of writing.

Tuesday, October 10, 9:30 pm

As I stated in class yesterday, some machines seem to have a copy-and-paste problem in OMSI (again, Macs seem to be the ones), so the one problem in tomorrow's quiz requiring composing and running a full program will be quite short, and embarrassingly easy. It will be more challenging next week.

Tuesday, October 10, 1:15 pm

Recall that in our quizzes, some questions will be "essay"-style, requiring an answer that is not program code. Don't worry about formatting in your OMSI submissions of such questions.

Monday, October 9, 1:35 pm

Our course syllabus is now on the Web! I know it is lengthy, but it is required reading. There is a lot of crucial information there, e.g. on grading, assignment policies and so on. Note that there are clickable links in the table of contents.

Sunday, October 8, 9:55 pm

At various points in our textbook, an R example will rely on some R package that is not in base R. In Chapter 1, for instance, there are the Rdsm and zoo packages. Each time some external package like this is used in an example, download it to your laptop for use in quizzes.

You'll need to decide on a directory in which to store your downloaded packages. I use ~/R on all my machines. Accordingly, I have a line like


in my ~/.Rprofile startup file.

Sunday, October 8, 9:20 am

Here is an important point about the OpenMP test example we gave you a few days ago.

That code uses the OpenMP private construct, which is another way of declaring variables that are local to each thread. It is essentially the same as

#pragma omp parallel
{  int tid, nthreads;
   ...
}
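
To illustrate the equivalence, here is a sketch (not our actual test program) that computes the same quantity both ways. The function names are hypothetical, and the #ifdef guards are there only so that the code still compiles, and runs with one thread, if -fopenmp is omitted.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread number, or 0 if compiled without -fopenmp. */
static int my_id(void)
{
#ifdef _OPENMP
   return omp_get_thread_num();
#else
   return 0;
#endif
}

/* Version 1: tid is declared outside the block; private() gives each
   thread its own (uninitialized) copy. */
int sum_ids_private(int nth)
{
   int tid, total = 0;
   assert(nth > 0);
   #pragma omp parallel num_threads(nth) private(tid)
   {
      tid = my_id();
      #pragma omp atomic
      total += tid;
   }
   return total;
}

/* Version 2: tid is declared inside the block, so it is automatically
   local to each thread. */
int sum_ids_local(int nth)
{
   int total = 0;
   assert(nth > 0);
   #pragma omp parallel num_threads(nth)
   {
      int tid = my_id();
      #pragma omp atomic
      total += tid;
   }
   return total;
}
```

If k threads actually run, both functions return 0+1+...+(k-1). In either version, total is shared, which is why the update is protected with atomic.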

Saturday, October 7, 10:55 pm

All students, whether using Macs or not, do need the OMSI version I just uploaded. It is the same as in my 9:15 posting, but I've added a version number to OMSI, in the file VERSION, currently 1.0.0. If you have such a file in your download, you are all set to go.

Saturday, October 7, 9:15 pm

For those using Mac laptops:

In my instructions, I consciously avoided asking you to alias gcc, directing you instead to put the OpenMP-capable gcc at the front of your search path. But then I somehow asked you to do an alias anyway. :-( So here is the fix:

  1. Make sure you are using the latest version of OMSI. Open, and search for the comment line,

    #selecting compiler

    If you have the right version, the next 5 lines will already be commented out with '''.

  2. Go to the directory containing your OpenMP-capable compiler, and set a symbolic link, e.g.

    % ln -s gcc-7 gcc

  3. Assuming you are using the bash shell, set the PATH environment variable in ~/.login; for me that is

    export PATH=/usr/local/Cellar/gcc/7.2.0/bin:$PATH

    You'll need to start a new shell window to use it.

Sorry for the error.

Saturday, October 7, 8:50 pm

A couple of people have reported problems implementing our instructions for gcc on Macs. I'm working on it now. Watch this space.

Saturday, October 7, 5:25 pm

Homework (i.e. Programming Assignment) I is now ready on our Web site.

START EARLY! It took me quite a while to get even the serial version of Problem1.R working right.

Saturday, October 7, 1:15 pm

Our textbook has three appendices, on miscellaneous systems issues, matrix algebra and R. Please note that these are official course material, and are thus eligible for coverage in quizzes.

Friday, October 6, 10:45 pm

In our quizzes (starting next Wednesday), OMSI will download the quiz questions to your laptop. In addition, though, Sachin will hand out paper copies of the questions. This has two purposes:

Friday, October 6, 1:40 pm

Recall that in Sachin's guide to installing gcc on a Mac, he reminds you to add its directory to your search path. Note carefully that he does so at the FRONT of the path, so that the downloaded gcc will be executed by OMSI, not the alias of gcc to clang.

Thursday, October 5, 9:20 pm

As mentioned, you do not need to install MPI on your laptop for the quizzes. Any MPI questions will be "pencil and paper" only, treated the same by OMSI as an essay question. Note that this means that even if you do have MPI installed, you will not be able to compile and run your code.

Thursday, October 5, 9:20 pm

Sachin's office hours will be Thursdays, 5-7 pm and Fridays 9-10 a.m., both in Kemper 55.

Thursday, October 5, 8:25 pm

Rules for quizzes:

Thursday, October 5, 8:20 pm

Note that all of the old 158 quizzes and their solutions are on our course Web site.

Wednesday, October 4, 9:30 pm

(Please note that material in the blog, e.g. the content of this posting, is to be considered part of the course materials, and thus eligible for coverage in quiz problems.)

Let's discuss further what is occurring "under the hood" with

#pragma omp parallel

in the bsort() function discussed in class today.

Consider the state of affairs when we enter the function. It is called from main(), so we are still in that thread. Thus the stack pointer hardware register will be pointing to that thread's stack. Since bdries and counts are locals, they will be stored on that stack, with it expanding by 2 words to accommodate them. At line 43, though, the compiler has inserted code to start the threads, and each of them will have its own stack. Line 46, for instance, will expand the stack for that thread by 1 word for the me variable.

Access to locals is typically implemented at the machine language level by expressions relative to the stack pointer, such as something like

movl $3,8(%esp)

to write the value 3 to the location one word deep in the stack. (On a 64-bit machine, words are 8 bytes apart.) So the machine language that the compiler generates for accesses to variables such as me, nth and so on will look something like this.

But what happens when a thread writes to, say, counts, as in line 67? The machine language must use the value of the main thread's stack pointer. So, part of the execution of line 43 will be to save that value in some other register or memory location, so that code within the parallel block can access counts and bdries.

Note that you can determine where these stacks are in memory. If after line 46, for example, you insert the code


you will get the location of the current top-of-stack for that thread.
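
An experiment of this kind can be sketched as follows. This is an assumption about the general approach, namely taking the address of a variable that is local to the thread, and not necessarily the exact code meant above. The function name is hypothetical, and the #ifdef guards only keep the code compilable without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread stores in addrs[me] the address of its own local variable
   me, which lies near the top of that thread's stack. Returns the number
   of threads that actually ran (1 if compiled without -fopenmp). */
int record_stack_addrs(void **addrs, int maxthreads)
{
   int nth = 1;   /* lives on the calling thread's stack; shared in the block */
   assert(maxthreads > 0);
   #pragma omp parallel num_threads(maxthreads)
   {
      int me = 0;   /* lives on this thread's own stack */
#ifdef _OPENMP
      me = omp_get_thread_num();
      if (me == 0) nth = omp_get_num_threads();
#endif
      addrs[me] = (void *) &me;
   }
   return nth;
}
```

Printing the entries of addrs with the %p format of printf() after the call shows one address per thread. When more than one thread runs, the addresses are typically far apart, reflecting the separate stacks.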

Wednesday, October 4, 3:15 pm

I mentioned in class today that in lower-division courses our department has a culture in which students are encouraged to treat office hours (TA, instructor) as a first resort, rather than a last resort as they should be. Seeking help too early helps in the short run but is quite harmful in the long run, depriving the student of the hard, deep thinking needed for him/her to grow.

However, please don't conclude that this means you are unwelcome in office hours. Not true at all. Just make sure that you have made a good, thorough try to resolve your problem before seeking help.

Wednesday, October 4, 1:30 pm

Another good thing about debugging tools is that not only can you use them to debug code, you can also use them to understand code. I recommend stepping through that first OMP example using a debugger.

Wednesday, October 4, 8:55 am

It has come to my attention that at least one person in the class has much less background in computer systems -- e.g. knowledge of PATH environment variables -- than what we would normally see in an advanced programming class like ECS 158. This is not their fault, of course, but rather the fault of the catalog error that stated ECS 150 is only recommended, rather than a hard prerequisite.

For that reason, we will not require that you have MPI on your laptops for quizzes. You will only need an OpenMP-capable gcc, R and Python. Other tools, such as pthreads and MPI, will still be on quizzes, but in "pencil and paper" style, not compile/run.

Tuesday, October 3, 11:35 pm

As mentioned, one alternative for configuring your laptop for our quizzes would be a virtual machine, running Linux inside. Jim Moersfelder of CSIF has set up one for you, here.

Jim has set things up to use VMware, for which we have a site license. However, I assume that any virtual machine software should work.

One way or the other, though, note carefully that you will be using your laptop in the first quiz, October 11. You need to make sure you have your machine configured properly well before that day.

Tuesday, October 3, 11:25 pm

I believe I mentioned in class that our first quiz will be next week. Tomorrow, Sachin will set up groups, and demonstrate OMSI.

Monday, October 2, 10:55 pm

Please note that all the .tex files for our textbook are in If you want to copy code from the book, just download the proper .tex file and edit it. By the way, Google has site search, e.g.

Monday, October 2, 10:50 pm

I just installed gcc on my Mac laptop. It took 85 minutes, and in case you have already read Sachin's instructions, note that I have updated them a bit.

Monday, October 2, 9:00 pm

Our TA, Sachin Kumawat, has written a really excellent guide (I have also added some information) to using the tools in our course, e.g. mpicc for MPI. Keep in mind that you will need certain tools for your laptop during quizzes, for our OMSI system. In addition, you will probably want these tools on your laptop for programming assignments. I strongly recommend getting them ready now, while you have time.

Monday, October 2, 10:25 am

My office hours will be 1:30-3:30, Wednesdays.

Monday, October 2, 12:05 am

Please note that although the lectures are out of the book, you are responsible for all material that comes up in lecture. As I mentioned in my Sept. 23 message,

Note: In almost all lectures, supplementary material will be added on the spur of the moment as new examples come to me and students ask questions. This material will often show up on exams, so it is definitely worthwhile to come to class, contrary to what you might think from the lecture format. :-)

Saturday, September 30, 3:15 pm

Enrollment in the course seems to have stabilized, so it is time to start forming your work groups.

As mentioned, if you know people in the class, you may wish to form groups from them. Remember, group size is three or four, so if you have just one person you wish to partner with, you'll need at least one more, or be assigned one or two more by the TA (see below).

Sachin, our TA, will formally set up groups during the discussion section next Wednesday. You will report your self-made groups to him, and he will assign "unaffiliated" students to groups.

I expect to put our first programming assignment on the Web tomorrow or Monday. It will be announced here, as with everything else.

Wednesday, September 27, 1:30 pm

As we all know, ECS 158 is a software course. But as you saw in the mini-lecture I gave today on OS processes, the course involves a lot more than just coding.

Please note that during the quarter I will often, on the spur of the moment, talk in lecture about general software issues. Since the course is on software, these issues are highly relevant. Please pay close attention to them.

Please also note that you are responsible for all issues presented/discussed in lecture.

Wednesday, September 27, 1:27 pm

For those who did not receive my pre-quarter e-mail messages, I have placed them in the MsgsPreQtrFall2017 directory on our course Web site.