Homework 4

Due Tuesday, Nov. 27

Here you will write an assembly language subroutine, callable from C, that calls the pthreads library. It will implement a certain sorting algorithm. Here are the specs.

The C signature of the function will be
```
void ucdsort(int *x, int n, int nth)
```
where we sort an array x of length n, using nth threads.

The parameter nth counts only the threads created by pthread_create(). It does not count the calling thread, i.e. the thread that executes ucdsort() itself.

The first stage consists of finding the minimum and maximum values in x. This is accomplished by breaking x into roughly equal-sized chunks, then having each thread find the minimum and maximum values in its chunk of the array, and finally combining the results to find the overall minimum and maximum values, which we will call mn and mx.
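To make the first stage concrete, here is a rough C sketch of the per-chunk work, run serially rather than in threads. The helper names chunk_bounds(), chunk_minmax() and overall_minmax() are made up for illustration, and the particular way of splitting n elements is just one possibility. Your actual code will be in assembly language, with each thread handling one chunk.

```c
#include <assert.h>

/* Hypothetical helper: half-open bounds [*lo, *hi) of chunk `me` when
   n elements are split among nth workers. The first n % nth chunks
   get one extra element; any other rough split would do as well. */
void chunk_bounds(int n, int nth, int me, int *lo, int *hi)
{
    assert(nth > 0 && me >= 0 && me < nth);
    int base = n / nth, extra = n % nth;
    *lo = me * base + (me < extra ? me : extra);
    *hi = *lo + base + (me < extra ? 1 : 0);
}

/* Min and max of x[lo..hi-1]; requires a nonempty chunk. */
void chunk_minmax(const int *x, int lo, int hi, int *mn, int *mx)
{
    assert(lo < hi);
    *mn = *mx = x[lo];
    for (int i = lo + 1; i < hi; i++) {
        if (x[i] < *mn) *mn = x[i];
        if (x[i] > *mx) *mx = x[i];
    }
}

/* Combine the per-chunk results. A loop stands in for the threads here. */
void overall_minmax(const int *x, int n, int nth, int *mn, int *mx)
{
    assert(n > 0);
    int first = 1;
    for (int me = 0; me < nth; me++) {
        int lo, hi, cmn, cmx;
        chunk_bounds(n, nth, me, &lo, &hi);
        if (lo == hi) continue;          /* empty chunk when nth > n */
        chunk_minmax(x, lo, hi, &cmn, &cmx);
        if (first || cmn < *mn) *mn = cmn;
        if (first || cmx > *mx) *mx = cmx;
        first = 0;
    }
}
```

With n = 8 and nth = 3, this split gives chunks of sizes 3, 3 and 2.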

In the second stage, we break the range mn through mx into nth roughly equal-sized chunks. Each thread will then cull out from the array all elements of x that are in that thread's range.
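One possible way to assign values to threads in the second stage is sketched below in C. The names owner_of() and cull(), and the rule that the last thread absorbs any leftover part of the range, are assumptions made for illustration. Any other roughly equal split of the range is fine.

```c
#include <assert.h>

/* Hypothetical helper: index of the thread whose value range contains v.
   The range mn..mx is cut into nth pieces of width (mx-mn+1)/nth, and
   the last thread absorbs the remainder. long long is used because
   mx - mn + 1 may not fit in an int. */
int owner_of(int v, int mn, int mx, int nth)
{
    assert(nth > 0 && mn <= v && v <= mx);
    long long span = (long long)mx - mn + 1;
    long long width = span / nth;
    if (width == 0) width = 1;       /* more threads than distinct values */
    long long t = ((long long)v - mn) / width;
    return t < nth ? (int)t : nth - 1;
}

/* Copy into out[] the elements of x owned by thread `me`, in their
   original order; returns how many there are. out[] must have room
   for up to n elements. */
int cull(const int *x, int n, int mn, int mx, int nth, int me, int *out)
{
    int k = 0;
    for (int i = 0; i < n; i++)
        if (owner_of(x[i], mn, mx, nth) == me)
            out[k++] = x[i];
    return k;
}
```

Note that each thread culls into its own buffer here, which is one reason the algorithm uses a lot of memory.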

After counting its elements, each thread publishes that count in a shared variable. The threads use these counts to determine where in x they should write the results of their individual sort operations.
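The arithmetic for finding a thread's starting position can be sketched in C as follows. The shared array counts[] and the function start_slot() are hypothetical names. In a threaded version, a thread must not read counts[i] until thread i has stored it; a call to pthread_barrier_wait() would be one way to arrange that, though you may choose another.

```c
#include <assert.h>

/* counts[i] is the number of elements thread i will sort (an assumed
   shared array). Returns the first slot of x that thread `me` writes
   to, namely the sum of the counts of all lower-numbered threads. */
int start_slot(const int *counts, int me)
{
    assert(me >= 0);
    int slot = 0;
    for (int i = 0; i < me; i++)
        slot += counts[i];
    return slot;
}
```

Thread 0 always starts at slot 0, and the highest-numbered thread's own count is never needed by anyone else.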

Each thread then sorts the elements it culled from x, by calling qsort() in the C library, and copies the sorted results to the proper portion of x.
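For reference, here is a C sketch of the sort-and-copy step. The names cmp_int() and sort_and_place(), and the separate buffer buf, are assumptions for illustration. Your assembly code would set up the same four arguments to qsort(), including the address of a comparison function. Keep in mind that no thread should write into x until every thread has finished culling from it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort(). It avoids computing u - v,
   which can overflow for ints of large magnitude. */
int cmp_int(const void *a, const void *b)
{
    int u = *(const int *)a, v = *(const int *)b;
    return (u > v) - (u < v);
}

/* Sort the k culled values in buf, then copy them into
   x[slot], ..., x[slot+k-1]. */
void sort_and_place(int *x, int *buf, int k, int slot)
{
    assert(k >= 0 && slot >= 0);
    qsort(buf, (size_t)k, sizeof(int), cmp_int);
    memcpy(x + slot, buf, (size_t)k * sizeof(int));
}
```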

At the end, x itself holds the sorted values, i.e. its old contents are destroyed.

In forming the chunks, there is no specific requirement as to what "roughly equal size" means.

This algorithm will use a lot of memory. Don't worry about that.

This subroutine, meaning its .s file, must be self-contained. The C caller should not get involved in the threads business at all, other than specifying the number of threads.

Illustrative example: Say we have 3 threads, and our array is (12,5,13,3,4,5,8,1).

In the first stage, threads 0, 1 and 2 work on (12,5,13), (3,4,5) and (8,1), respectively. That produces min/max values of (5,13), (3,5) and (1,8), which in turn yields mn = 1 and mx = 13.

We then break (1,2,...,13) into (1,2,3,4), (5,6,7,8) and (9,10,11,12,13), to be handled by threads 0, 1 and 2. For example, thread 0 will sort all elements of x that have the value 1, 2, 3 or 4, yielding (1,3,4).

Threads 0 and 1 publicize the fact that they will be working on 3 and 3 elements, respectively. (Thread 2 will be working on 2 elements, but it need not publicize that.) Here is the key point: In the end, thread 0's 3 elements will occupy slots 0, 1 and 2 of the sorted x; thread 1's 3 elements will occupy slots 3, 4 and 5 of x; and thread 2's 2 elements will occupy the rest.

Make sure you test your code on some fairly large arrays, and with more than 2 threads. As mentioned in class, counting hyperthreading, you can get 4 useful threads on our dual-core machines such as pc28, and 8 on our machine named tetra.