Homework 4

Due Wednesday, March 10

In this homework you will write a multiprocessor program to do Quicksort, and then perform some experiments with it.

Part A:

Write a Quicksort program to be run on the MulSim shared-memory multiprocessor simulator. I have placed executables for the simulator on the "Mexican food" machines in ACS (and may later do so on the SGI machines in CSIF), or you can build it yourself on your Linux box. Comply fully with the requirements below, but otherwise your design is up to you.

In addition to the command-line arguments which the simulator uses (program name, number of CPUs, interconnect type), you must have two command-line arguments for your application: The first will be N, the number of elements to be sorted, and the second will be the initial value of the seed for the random-number generation.
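As a minimal sketch of handling the two application arguments, the helper below parses N and Seed and rejects malformed input. The name ParseAppArgs and its parameter First (the index in argv at which the application's own arguments begin) are hypothetical; where the simulator actually places your arguments is an assumption you should check.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: reads N from argv[First] and Seed from
   argv[First+1]. Returns 0 on success, -1 on missing or bad input. */
int ParseAppArgs(int argc, char **argv, int First, int *N, unsigned *Seed)
{
   char *End;
   long V;
   unsigned long S;

   if (argc < First + 2) return -1;          /* both arguments required */

   errno = 0;
   V = strtol(argv[First], &End, 10);
   if (End == argv[First] || *End != '\0' || errno != 0) return -1;
   if (V < 1 || V > INT_MAX) return -1;      /* need at least one element */

   errno = 0;
   S = strtoul(argv[First + 1], &End, 10);
   if (End == argv[First + 1] || *End != '\0' || errno != 0) return -1;

   *N = (int) V;
   *Seed = (unsigned) S;
   return 0;
}
```

With an argv of {"qs", "100", "7"} and First equal to 1, this sets N to 100 and Seed to 7.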

Each time a CPU finishes a task, it goes to the task queue to fetch another one to work on. Each task is specified in a struct of type QElt. QElt and the head and tail of the task queue are declared in the global declaration:

struct QElt  {
   int Low,High;
   struct QElt *Next;
} *QHead,*QTail;

If T is a task, then the task consists of sorting X[T.Low] through X[T.High], in place. (By the latter term I mean that the sorting of this portion of X will not affect the rest of X.)

Your program must have the following functions:

Each CPU will repeatedly: call GetPair() to take a task (L,H) from the queue; call Separate(), which yields the split point M; and add the tasks (L,M-1) and (M+1,H) to the queue; until DONE holds.
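To make the role of M concrete, here is one possible sketch of the Separate() step. The signature is an assumption (the assignment does not fix it here), and the last-element pivot is just one common choice, not a requirement.

```c
/* Hypothetical Separate(): partitions X[L..H] in place around the
   pivot X[H] and returns M, the pivot's final position. Afterward
   X[L..M-1] < X[M] <= X[M+1..H]. Nothing outside X[L..H] is touched. */
int Separate(int *X, int L, int H)
{
   int Pivot = X[H], M = L, I, Tmp;

   for (I = L; I < H; I++) {
      if (X[I] < Pivot) {        /* move smaller elements to the left part */
         Tmp = X[I];  X[I] = X[M];  X[M] = Tmp;
         M++;
      }
   }
   Tmp = X[M];  X[M] = X[H];  X[H] = Tmp;   /* put the pivot in its place */
   return M;
}
```

Since X[M] is then in its final sorted position, only (L,M-1) and (M+1,H) remain to be sorted. A task with Low >= High needs no further work.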

Make sure your program is written in a sensibly modular way. For example, consider code like

LOCK(whatever);
GetPair(whatever);
UNLOCK(whatever);  

This is not modular! The calls to LOCK() and UNLOCK() should be within GetPair(). GetPair() should be self-contained. By the same token, GetPair() should not call Separate(), AddPair() etc.; GetPair() should only do what its name implies---get a task pair, nothing more.

This is important from the point of view of good software engineering, but it is also important for program speed. We should apply a lock at the latest possible time, and unlock at the earliest possible time, in order to have the shortest possible window of time during which the lock is held. The reason for this is that while one processor holds a lock, no other processor can access the data it protects, so the others just waste time waiting, and the machine is thus slower.
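The following sketch illustrates both points: the locking lives inside the queue functions, and the lock covers only the pointer updates. It is not the simulator's API. The LOCK()/UNLOCK() macros here are stand-ins built on a POSIX mutex, the name QLock and the function signatures are assumptions, and malloc()/free() stand in for whatever shared-memory allocation your program uses.

```c
#include <pthread.h>
#include <stdlib.h>

struct QElt  {
   int Low,High;
   struct QElt *Next;
} *QHead,*QTail;

/* Assumption: a pthread mutex stands in for the simulator's lock. */
static pthread_mutex_t QLock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK(m)   pthread_mutex_lock(&(m))
#define UNLOCK(m) pthread_mutex_unlock(&(m))

/* Appends task (Low,High). Returns 0 on success, -1 if out of memory. */
int AddPair(int Low, int High)
{
   struct QElt *E = malloc(sizeof *E);   /* build the node before locking */
   if (E == NULL) return -1;
   E->Low = Low;  E->High = High;  E->Next = NULL;

   LOCK(QLock);                          /* lock as late as possible */
   if (QTail == NULL) QHead = E;
   else QTail->Next = E;
   QTail = E;
   UNLOCK(QLock);                        /* unlock as early as possible */
   return 0;
}

/* Removes the head task. Returns 1 and fills *Low, *High,
   or returns 0 if the queue is empty. */
int GetPair(int *Low, int *High)
{
   struct QElt *E;

   LOCK(QLock);
   E = QHead;
   if (E != NULL) {
      QHead = E->Next;
      if (QHead == NULL) QTail = NULL;
   }
   UNLOCK(QLock);                        /* copying and freeing need no lock */

   if (E == NULL) return 0;
   *Low = E->Low;  *High = E->High;
   free(E);
   return 1;
}
```

A caller simply writes GetPair(&L,&H) with no locking of its own. Once a node has been unlinked under the lock, only the CPU that removed it can reach it, which is why the copy and the free() can safely happen after UNLOCK().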

Test your program thoroughly over a range of values of N (up to 1000) and the number of CPUs (up to 32). The Reader will do the same.

Part B:

For fixed values of the command-line arguments N and Seed, and a fixed type of interconnect, the graph of the run time (in cycles) of the program against the number of CPUs should be roughly "U-shaped." (Make sure you understand why.) Illustrate this with your choice of N, Seed, and some range of the number of CPUs, using a bus interconnect.

You may find that instead of a U shape, your graph is steadily rising. If so, this probably means that your code is somehow preventing parallelism. Add some code to check this. For instance, see how many calls to GetPair() each CPU makes; you may find that all the work is being done by just one CPU, due to some problem in your code. If this confirms that just one CPU is doing all the work, I highly recommend that you use the debugger to single-step through the execution of your program and observe the calls to GetPair(); this should reveal why one CPU is always getting work and the others aren't.
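One simple way to add such a check is a per-CPU counter, sketched below. The names GetPairCalls, NoteGetPair and ReportGetPairCalls are hypothetical, as is the parameter Me, which assumes each CPU knows its own ID number (0, 1, 2, ...).

```c
#include <stdio.h>

#define MAXCPUS 32   /* the assignment tests with up to 32 CPUs */

int GetPairCalls[MAXCPUS];   /* one slot per CPU, initially all zero */

/* Call this from CPU number Me each time it calls GetPair(). Each CPU
   writes only its own slot, so no lock is needed. */
void NoteGetPair(int Me)
{
   GetPairCalls[Me]++;
}

/* Prints the counts for CPUs 0..NCPUs-1 and returns how many of those
   CPUs called GetPair() at least once. */
int ReportGetPairCalls(int NCPUs)
{
   int I, Busy = 0;

   for (I = 0; I < NCPUs; I++) {
      printf("CPU %d: %d GetPair() calls\n", I, GetPairCalls[I]);
      if (GetPairCalls[I] > 0) Busy++;
   }
   return Busy;
}
```

Have one CPU call ReportGetPairCalls() after the sort is complete. If your GetPair() can return without a task (for example, when the queue is momentarily empty), counting only the calls that actually obtained a task may be more informative. Keep in mind that the extra memory writes add some cycles of their own, so remove the counters before collecting your timing data.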

Note also that you should not use a very small value of N, since that would give you statistical problems (small "sample size"). Use a value of N which is at least 100, maybe more.

Then using the same N, Seed and range of the number of CPUs, re-run the simulation using a snoopy-update scheme. This graph should also be roughly U-shaped, and should (at least mostly) be below the first graph.

A Note on Debugging:

Many students do not use debugging tools. This is a real shame, because they could save large amounts of time if they used those tools. I feel so strongly about this that I have a Web page on it, at http://heather.cs.ucdavis.edu/~matloff/debug.html. Professional programmers make heavy use of debuggers, and indeed a common question asked during an interview is, "Tell me how you go about debugging a program."

Use of debugging tools is especially important when debugging parallel programs. You need to see how the different CPUs interact with each other, and the only way to do that well is to use a debugger's single-step facility. Granted, the debugger in MulSim is less convenient, since it operates at the assembly-language level, but it can really be of tremendous help.