Homework 4
Due Tuesday, Nov. 27
Here you will write an assembly language subroutine, callable from C,
that calls the pthreads library. It will implement a certain sorting
algorithm. Here are the specs.
-
The C signature of the function will be
void ucdsort(int *x, int n, int nth)
where we sort an array x of length n,
using nth threads.
-
The parameter nth is the number of threads created by
pthread_create(). It does not count the caller's thread, i.e.
the thread in which ucdsort() itself runs.
-
The first stage consists of finding the minimum and maximum values,
mn and mx, in x. This is accomplished by breaking x
into roughly equal-sized chunks, then having each thread find the
minimum and maximum values in its chunk of the array, and finally
combining the results to find the overall minimum and maximum values.
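The C sketch below models stage one as an aid for planning the assembly code. It is one possible approach under stated assumptions, not a required design. It assumes chunks of ceil(n/nth) elements, which reproduces the chunks in the illustrative example later in these specs, (12,5,13), (3,4,5) and (8,1). The names struct mm_task, chunk_min_max() and find_min_max() are hypothetical. Here the calling thread combines the per-chunk results after each pthread_join().

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record for stage one. */
struct mm_task {
    const int *x;  /* first element of this thread's chunk */
    int len;       /* chunk length; may be 0 when nth > n */
    int mn, mx;    /* filled in by the thread */
};

/* Thread body: scan one chunk and record its minimum and maximum. */
static void *chunk_min_max(void *arg)
{
    struct mm_task *t = arg;
    t->mn = INT_MAX;
    t->mx = INT_MIN;
    for (int i = 0; i < t->len; i++) {
        if (t->x[i] < t->mn) t->mn = t->x[i];
        if (t->x[i] > t->mx) t->mx = t->x[i];
    }
    return NULL;
}

/* Stage one: split x into nth chunks of ceil(n/nth) elements (the last
   ones may be shorter or empty), scan them in parallel, then combine. */
static void find_min_max(const int *x, int n, int nth, int *mn, int *mx)
{
    assert(n > 0 && nth > 0);
    pthread_t *tid = malloc(nth * sizeof *tid);
    struct mm_task *task = malloc(nth * sizeof *task);
    if (!tid || !task) abort();
    int chunk = (n + nth - 1) / nth;            /* ceiling of n / nth */
    for (int i = 0; i < nth; i++) {
        int lo = i * chunk < n ? i * chunk : n;
        int hi = lo + chunk < n ? lo + chunk : n;
        task[i].x = x + lo;
        task[i].len = hi - lo;
        if (pthread_create(&tid[i], NULL, chunk_min_max, &task[i]) != 0)
            abort();
    }
    /* Combine: the calling thread waits for each worker in turn. */
    *mn = INT_MAX;
    *mx = INT_MIN;
    for (int i = 0; i < nth; i++) {
        pthread_join(tid[i], NULL);
        if (task[i].len == 0) continue;         /* empty chunk: nothing to add */
        if (task[i].mn < *mn) *mn = task[i].mn;
        if (task[i].mx > *mx) *mx = task[i].mx;
    }
    free(tid);
    free(task);
}
```

For the example array, find_min_max(x, 8, 3, &mn, &mx) leaves mn = 1 and mx = 13.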
-
In the second stage, we break the range mn through
mx into nth roughly equal-sized chunks, one per thread.
Each thread then culls from x all elements whose values
lie in that thread's range.
After counting these elements, a thread publicizes the count
in a shared variable; the counts tell each thread
where in x it should write the results
of its own sort.
Each thread then sorts the elements it culled,
by calling qsort() in the C library, and copies the
results to the proper portion of x.
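Below is a C sketch of stage two, assuming mn and mx are already known from stage one. It uses a POSIX barrier, pthread_barrier_wait(), as the point where counts become public. Once every thread has passed the barrier, all counts are visible and no thread is still reading x, so copying sorted results back into x cannot overwrite values another thread has yet to cull. Barriers are only one option; the specs do not prescribe a synchronization method. The range rule is an assumption chosen to match the illustrative example: width floor((mx - mn + 1)/nth), with the remainder going to the last thread. The names stage_two(), owner(), struct stage2 and struct worker are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state for stage two. */
struct stage2 {
    int *x;
    int n, nth, mn, mx;
    int *counts;                  /* counts[i] is publicized by thread i */
    pthread_barrier_t published;  /* passed once every count is written */
};

struct worker {
    struct stage2 *s;
    int me;                       /* this thread's index, 0 .. nth-1 */
};

/* Which thread's value range holds v: width floor((mx-mn+1)/nth),
   remainder to the last thread. long long avoids int overflow. */
static int owner(const struct stage2 *s, int v)
{
    long long width = ((long long)s->mx - s->mn + 1) / s->nth;
    if (width == 0)               /* fewer distinct values than threads */
        width = 1;
    long long r = ((long long)v - s->mn) / width;
    return r < s->nth ? (int)r : s->nth - 1;
}

static int cmp_int(const void *a, const void *b)
{
    int p = *(const int *)a, q = *(const int *)b;
    return (p > q) - (p < q);     /* no overflow, unlike p - q */
}

static void *stage2_worker(void *arg)
{
    struct worker *w = arg;
    struct stage2 *s = w->s;
    /* Cull this thread's values into a private buffer sized for the
       worst case (all n elements), hence the heavy memory use. */
    int *mine = malloc(s->n * sizeof *mine);
    if (!mine) abort();
    int cnt = 0;
    for (int i = 0; i < s->n; i++)
        if (owner(s, s->x[i]) == w->me)
            mine[cnt++] = s->x[i];
    s->counts[w->me] = cnt;       /* publicize the count */
    /* After the barrier every count is visible, and no thread is still
       reading x, so writing into x below is safe. */
    pthread_barrier_wait(&s->published);
    int off = 0;                  /* first slot: sum of lower threads' counts */
    for (int i = 0; i < w->me; i++)
        off += s->counts[i];
    qsort(mine, cnt, sizeof *mine, cmp_int);
    memcpy(s->x + off, mine, cnt * sizeof *mine);
    free(mine);
    return NULL;
}

/* Stage two, given mn and mx from stage one. */
static void stage_two(int *x, int n, int nth, int mn, int mx)
{
    assert(n > 0 && nth > 0 && mn <= mx);
    struct stage2 s = { .x = x, .n = n, .nth = nth, .mn = mn, .mx = mx };
    s.counts = calloc(nth, sizeof *s.counts);
    pthread_t *tid = malloc(nth * sizeof *tid);
    struct worker *w = malloc(nth * sizeof *w);
    if (!s.counts || !tid || !w) abort();
    pthread_barrier_init(&s.published, NULL, nth);
    for (int i = 0; i < nth; i++) {
        w[i].s = &s;
        w[i].me = i;
        if (pthread_create(&tid[i], NULL, stage2_worker, &w[i]) != 0)
            abort();
    }
    for (int i = 0; i < nth; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&s.published);
    free(s.counts);
    free(tid);
    free(w);
}
```

Each thread reads only the counts of lower-numbered threads, so the last thread's count is never consulted, matching the remark in the example that thread 2 need not publicize its count.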
-
At the end, x is sorted in place, i.e. its old contents are
overwritten.
-
In forming the chunks, there is no specific requirement as to what
"roughly equal size" means.
-
This algorithm will use a lot of memory. Don't worry about that.
-
This subroutine, meaning its .s file, must be
self-contained. The C caller should not get involved in the threads
business at all, other than specifying the number of threads.
-
Illustrative example: Say we have 3 threads, and our array is
(12,5,13,3,4,5,8,1).
-
In the first stage, threads 0, 1 and 2 work on (12,5,13), (3,4,5) and
(8,1), respectively. That produces min/max values of (5,13), (3,5)
and (1,8), which in turn yields mn = 1 and
mx = 13.
-
We then break (1,2,...,13) into (1,2,3,4), (5,6,7,8) and (9,10,11,12,13),
to be handled by threads 0, 1 and 2. For example, thread 0 will sort
all elements of x that have the value 1, 2, 3 or 4,
yielding (1,3,4).
Threads 0 and 1 publicize the fact that they will be working on 3 and
3 elements, respectively. (Thread 2 will be working on 2 elements, but
it need not publicize it.) Here is the key point: in the end, thread
0's 3 elements will occupy slots 0, 1 and 2 of the sorted
x; thread 1's 3 elements will occupy slots 3, 4 and 5
of x; and thread 2's 2 elements will occupy the rest, slots 6 and 7.
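The example's numbers are consistent with the following rule, stated here as an assumption since the specs leave "roughly equal" open. Here w is the range width, c_i the number of elements thread i culls, and off_i the first slot of x that thread i writes. Note that off_i never involves the last thread's count, which is why thread 2 need not publicize it.

```latex
\[
w = \left\lfloor \frac{\mathit{mx} - \mathit{mn} + 1}{\mathit{nth}} \right\rfloor
  = \left\lfloor \frac{13 - 1 + 1}{3} \right\rfloor = 4,
\qquad
\mathrm{range}_i =
\begin{cases}
[\mathit{mn} + i w,\ \mathit{mn} + (i+1) w - 1], & i < \mathit{nth} - 1,\\
[\mathit{mn} + (\mathit{nth}-1) w,\ \mathit{mx}], & i = \mathit{nth} - 1,
\end{cases}
\]
\[
\mathit{off}_i = \sum_{j=0}^{i-1} c_j,
\qquad (c_0, c_1, c_2) = (3, 3, 2) \;\Rightarrow\; (\mathit{off}_0, \mathit{off}_1, \mathit{off}_2) = (0, 3, 6).
\]
```

With mn = 1 this gives the ranges [1,4], [5,8] and [9,13], and the offsets 0, 3 and 6 are exactly the first slots listed above.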
-
Make sure you test your code on some fairly large arrays, and with more
than 2 threads. As mentioned in class, counting hyperthreading, you can
get 4 useful threads on our dual-core machines such as pc28, and 8 on
our machine named tetra.
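For that testing, a small C harness along these lines can compare a sorter's output with qsort() on a large random array. It is a sketch, not a required tool; check_sorter() and reference_sort() are hypothetical names. reference_sort() only stands in for ucdsort() so that the harness runs on its own. Comparing against a qsort()'d copy catches lost or duplicated elements as well as misordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int p = *(const int *)a, q = *(const int *)b;
    return (p > q) - (p < q);
}

/* Hypothetical stand-in with ucdsort()'s signature, so the harness runs
   on its own; it ignores nth and just calls qsort(). */
static void reference_sort(int *x, int n, int nth)
{
    (void)nth;
    qsort(x, n, sizeof *x, cmp_int);
}

/* Sort a random n-element array with `sorter` and compare the result to
   qsort()'s. Catches misordering as well as lost or duplicated elements.
   Returns 1 if the outputs match. */
static int check_sorter(void (*sorter)(int *, int, int), int n, int nth,
                        unsigned seed)
{
    assert(n > 0 && nth > 0);
    int *x = malloc(n * sizeof *x);
    int *want = malloc(n * sizeof *want);
    if (!x || !want) abort();
    srand(seed);
    for (int i = 0; i < n; i++)
        x[i] = rand() - RAND_MAX / 2;   /* mix of negative and positive */
    memcpy(want, x, n * sizeof *x);
    qsort(want, n, sizeof *want, cmp_int);
    sorter(x, n, nth);
    int ok = memcmp(x, want, n * sizeof *x) == 0;
    free(x);
    free(want);
    return ok;
}
```

To test the real routine, declare void ucdsort(int *x, int n, int nth); and call, for instance, check_sorter(ucdsort, 1000000, 4, 1) on pc28, or with nth = 8 on tetra. Link the harness with your .s file and -pthread.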