Homework 4
Due Wednesday, March 10
In this homework you will write a multiprocessor program to do Quicksort, and then perform some experiments with it.
Part A:
Write a Quicksort program to be run on the MulSim shared-memory multiprocessor simulator. I have placed executables for the simulator on the "Mexican food" machines in ACS (and may later do so on the SGI machines in CSIF), or you can build it yourself on your Linux box. Comply fully with the requirements below, but otherwise your design is up to you.
In addition to the command-line arguments which the simulator uses (program name, number of CPUs, interconnect type), you must have two command-line arguments for your application: The first will be N, the number of elements to be sorted, and the second will be the initial value of the seed for the random-number generation.
Each time a CPU finishes a task, it goes to the task queue to fetch another one to work on. Each task is specified in a struct of type QElt. QElt and the head and tail of the task queue are declared in the global declaration
struct QElt { int Low,High; struct QElt *Next; } *QHead,*QTail;
If T is a task, then the task consists of sorting X[T.Low] through X[T.High], in place. (By the latter term I mean that the sorting of this portion of X will not affect the rest of X.)
Your program must have the following functions:
Init()
Called only by CPU 0. Picks up the application-specific command-line arguments and assigns them to globals N and Seed. Initializes X[0] through X[N-1], the array to be sorted (X global), with random integers, using RandomPackage. Initializes the task queue to consist of the single pair (0,N-1).
AddPair(A,B) int A,B;
Adds the pair (A,B) to the task queue, i.e. a task specifying that X[A] through X[B] must be sorted.
int GetPair(PL,PH) int *PL,*PH;
Removes the head of the task queue, with *PL and *PH then telling us what portion of X we need to sort. (But GetPair() will not do the actual sorting.) Returns #define values DONE ( = 1) or NOT_DONE ( = 0), DONE meaning that the entire array has already been sorted. (Note that this is a far stronger condition than merely saying that the task queue is currently empty.)
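int Separate(L,H)
   int L,H;
{  int Ref,I,J,Tmp;

   Ref = X[H];  /* pivot value */
   I = L-1;  J = H;
   do {
      do I++; while (X[I] < Ref && I < H);  /* scan right for an element >= Ref */
      do J--; while (X[J] > Ref && J > L);  /* scan left for an element <= Ref */
      Tmp = X[I];  X[I] = X[J];  X[J] = Tmp;  /* swap the pair */
   } while (J > I);
   /* the last swap was made after the scans crossed: undo it, and put the pivot at X[I] */
   X[J] = X[I];  X[I] = X[H];  X[H] = Tmp;
   return I;
}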
Here is where the main work is done. This function scans the subarray X[L] through X[H], rearranges those elements, and returns an index M such that:
X[M] is in its "final resting place": it will not move during the remaining steps of the sort
X[L] through X[M-1] are also in their "final resting places" as a group: they may exchange places among themselves in the remaining steps of the sort, but none of them will move outside the subscript range L to M-1; a similar statement holds for X[M+1] through X[H]
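The property above can be checked mechanically, which is handy while debugging. The following is only a sketch of such a check; the name CheckSeparated and its argument list are my own invention, not part of the required interface.

```c
#include <assert.h>

/* Hypothetical debugging aid (not one of the required functions).
   Returns 1 if index M splits A[L..H] the way Separate() promises:
   nothing to the left of M exceeds A[M], and nothing to the right of M
   is below it.  Returns 0 otherwise, or if M is outside L..H.
   Typical use:  M = Separate(L,H);  assert(CheckSeparated(X,L,H,M));  */
int CheckSeparated(const int *A, int L, int H, int M)
{
    int I;

    if (M < L || M > H) return 0;
    for (I = L; I < M; I++)
        if (A[I] > A[M]) return 0;   /* too large to sit left of A[M] */
    for (I = M + 1; I <= H; I++)
        if (A[I] < A[M]) return 0;   /* too small to sit right of A[M] */
    return 1;
}
```

If the check passes, X[M] is already where it belongs in the fully sorted array, and the two groups on either side of it can be sorted independently of each other.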
Each CPU will repeatedly: call GetPair() to take a task (L,H) from the queue; call Separate() to obtain M; and add the tasks (L,M-1) and (M+1,H) to the queue; until DONE holds.
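To make the loop concrete, here is a rough sketch of one possible design, shown running on a single CPU. Much of it is assumption: LOCK() and UNLOCK() are replaced by do-nothing stand-ins (the real ones come from the simulator), the names QLock, NFinal, MarkFinal and Worker are hypothetical, the functions use ANSI prototypes rather than the older style above, and N is assumed to be at least 2. Ranges with fewer than two elements are never queued, since Separate() expects L < H. The counter NFinal records how many elements are known to be in their final positions, which gives one way to tell "the array is sorted" apart from "the queue is empty right now."

```c
#include <assert.h>
#include <stdlib.h>

#define DONE     1
#define NOT_DONE 0
#define MAXN     1000

/* Stand-ins (assumption): the real LOCK/UNLOCK are supplied by MulSim. */
static int QLock;
#define LOCK(l)   ((void)(l))
#define UNLOCK(l) ((void)(l))

struct QElt { int Low, High; struct QElt *Next; } *QHead, *QTail;
int X[MAXN], N;
int NFinal;   /* hypothetical: count of elements already in final position */

void AddPair(int A, int B)
{
    struct QElt *E = malloc(sizeof *E);   /* allocate before taking the lock */
    assert(E != NULL);
    E->Low = A;  E->High = B;  E->Next = NULL;
    LOCK(QLock);                          /* only the linking is locked */
    if (QTail == NULL) QHead = E; else QTail->Next = E;
    QTail = E;
    UNLOCK(QLock);
}

int GetPair(int *PL, int *PH)
{
    struct QElt *E;
    int Finished;

    for (;;) {
        LOCK(QLock);
        E = QHead;
        if (E != NULL) {                  /* unlink the head */
            QHead = E->Next;
            if (QHead == NULL) QTail = NULL;
        }
        Finished = (NFinal == N);
        UNLOCK(QLock);
        if (E != NULL) {                  /* copy out and free after unlocking */
            *PL = E->Low;  *PH = E->High;
            free(E);
            return NOT_DONE;
        }
        if (Finished) return DONE;
        /* queue empty but array not yet sorted: another CPU is still
           working on a task, so look again */
    }
}

void MarkFinal(int Count)                 /* hypothetical helper */
{
    LOCK(QLock);
    NFinal += Count;
    UNLOCK(QLock);
}

int Separate(int L, int H)                /* same logic as the code given above */
{
    int Ref = X[H], I = L - 1, J = H, Tmp;
    do {
        do I++; while (X[I] < Ref && I < H);
        do J--; while (X[J] > Ref && J > L);
        Tmp = X[I];  X[I] = X[J];  X[J] = Tmp;
    } while (J > I);
    X[J] = X[I];  X[I] = X[H];  X[H] = Tmp;
    return I;
}

void Worker(void)                         /* the loop each CPU runs */
{
    int L, H, M, Placed;

    while (GetPair(&L, &H) == NOT_DONE) {
        M = Separate(L, H);
        Placed = 1;                       /* the pivot X[M] is final */
        if (M - 1 > L) AddPair(L, M - 1);
        else if (M - 1 == L) Placed++;    /* one-element range: already final */
        if (H > M + 1) AddPair(M + 1, H);
        else if (M + 1 == H) Placed++;
        MarkFinal(Placed);
    }
}
```

Every element is counted exactly once, either as a pivot or as a one-element range, so NFinal reaches N only when the whole array is sorted. Note that all locking stays inside AddPair(), GetPair() and MarkFinal(); the loop in Worker() never touches the lock directly.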
Make sure your program is written in a sensibly modular way. For example, consider code like
LOCK(whatever); GetPair(whatever); UNLOCK(whatever);
This is not modular! The calls to LOCK() and UNLOCK() should be within GetPair(). GetPair() should be self-contained. By the same token, GetPair() should not call Separate(), AddPair() etc.; GetPair() should only do what its name implies---get a task pair, nothing more.
This is important from the point of view of good software engineering, but it is also important for program speed. We should apply a lock at the latest possible time, and unlock at the earliest possible time, in order to have the shortest possible window of time during which a lock is active. The reason for this is that when a lock is in use by one processor, no other processor can access the given data, so they are just wasting time, waiting, and the machine is thus slower.
Test your program thoroughly over a range of values of N (up to 1000) and the number of CPUs (up to 32). The Reader will do the same.
Part B:
For fixed values of the command-line arguments N and Seed, and a fixed type of interconnect, the graph of the run time (in cycles) of the program against the number of CPUs should be roughly "U-shaped." (Make sure you understand why.) Illustrate this with your choice of N, Seed and some range of the number of CPUs, using a bus interconnect.
You may find that instead of a U shape, your graph is steadily rising. If so, this probably means that your code is somehow preventing parallelism. Add some code to check this. For instance, see how many calls to GetPair() each CPU makes; you may find that all the work is being done by just one CPU, due to some problem in your code. If this confirms that just one CPU is doing all the work, I highly recommend that you use the debugger to single-step through the execution of your program and observe the calls to GetPair(); this should reveal why one CPU is always getting work and the others aren't.
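The checking code suggested above might look something like the following sketch. The names GetPairCalls, CountGetPair, TotalCalls, BusiestCPU and PrintGetPairCalls are hypothetical, as is the argument Me, which stands for the calling CPU's number; how a CPU learns its own number depends on the simulator and is not shown here.

```c
#include <assert.h>
#include <stdio.h>

#define MAXCPUS 32

/* Hypothetical instrumentation: one counter per CPU.  Each CPU writes
   only its own slot, so no lock is needed for the counting itself. */
int GetPairCalls[MAXCPUS];

/* Call at the top of GetPair(); Me is the calling CPU's number. */
void CountGetPair(int Me)
{
    GetPairCalls[Me]++;
}

/* Total number of calls made by CPUs 0 through NCPUs-1. */
int TotalCalls(int NCPUs)
{
    int I, Sum = 0;
    for (I = 0; I < NCPUs; I++) Sum += GetPairCalls[I];
    return Sum;
}

/* Number of the CPU that made the most calls (lowest number on ties). */
int BusiestCPU(int NCPUs)
{
    int I, Best = 0;
    for (I = 1; I < NCPUs; I++)
        if (GetPairCalls[I] > GetPairCalls[Best]) Best = I;
    return Best;
}

/* Print one line per CPU; call from a single CPU after the sort ends. */
void PrintGetPairCalls(int NCPUs)
{
    int I;
    for (I = 0; I < NCPUs; I++)
        printf("CPU %d: %d calls to GetPair()\n", I, GetPairCalls[I]);
}
```

If one CPU's count is close to the total, that CPU is doing nearly all the work. Take the counters out again before collecting your timing data, since the extra memory writes add traffic of their own.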
Note also that you should not use a very small value of N, since that would give you statistical problems (small "sample size"). Use a value of N which is at least 100, maybe more.
Then using the same N, Seed and range of the number of CPUs, re-run the simulation using a snoopy-update scheme. This graph should also be roughly U-shaped, and should (at least mostly) be below the first graph.
A Note on Debugging:
Many students do not use debugging tools. This is a real shame, because they could save large amounts of time if they used those tools. I feel so strongly about this that I have written a Web page on it (http://heather.cs.ucdavis.edu/~matloff/debug.html). Professional programmers make heavy use of debuggers, and indeed a common question asked during an interview is, "Tell me how you go about debugging a program."
Use of debugging tools is especially important when debugging parallel programs. You need to see how the different CPUs interact with each other, and the only way to do that well is to use a debugger's single-step facility. Granted, the debugger in MulSim is less convenient, since it operates at the assembly-language level, but it can really be of tremendous help.