Homework 3

Due Friday, May 31

I. Using Chipmunk, implement the NUMA bus system depicted in the handout on parallel processing. For simplicity, we will have just two PE's, i.e. just two P/M/R nodes, numbered 0 and 1. The system bus will have to enforce priority: If both R's make a request, serve PE 0 first, then PE 1.

The memory space will consist of 16K 16-bit words, arranged in a high-order interleaved manner. Use four SRAM8K components for this.

In place of real processors, just use hex keypads and hex displays to simulate MAR's and MBR's.

Your buses will be ordinary wires. Be sure to use tri-state devices. Use separate wires for Read and Write signals. You will have wires between each R and one set of logic which essentially does MUX/demux operations; the logic connects to the system bus.

II. Write a parallel-processing program which counts and reports the NUMBER of prime numbers between 2 and r, inclusive (not the primes themselves). The program should be runnable on the ACS Mexican food machines (which is where the Reader will run them), using NXLib.

PE 0 will serve as the "manager," but as with the "hot potato" example program, all PE's -- including PE 0 -- will run the same program. PE 0 will be the one to do the initialization, AND THE FINAL OUTPUT OF THE NUMBER OF PRIMES. The manager reads command-line arguments r and n, where n is the number of PE's.

The Reader will run your program for various values of r and n. He will check not only the correct operation of your program, but also the run time. The fastest program (or programs, if several are very close in speed) will get Extra Credit; this credit will be recorded with your exam scores and could play a significant role in your grade.

Of course, you need not try for Extra Credit if you are not interested. You will get full credit as long as your program provides the correct answers and has more than one PE doing work.
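
As an illustration of "more than one PE doing work," here is a minimal sketch of one possible way to split the range 2..r among n PE's and count primes within each piece. It is only a sketch under stated assumptions: the function names (is_prime, chunk_bounds, count_chunk) are hypothetical, equal-sized contiguous blocks are just one choice of division, trial division is far from the fastest method, and all NXLib communication is left out because its calls are not described here.

```c
#include <assert.h>

/* Trial-division primality test.  The loop condition d <= x / d
 * avoids computing d * d, which could overflow for x near UINT_MAX. */
static int is_prime(unsigned int x)
{
    unsigned int d;
    if (x < 2) return 0;
    if (x < 4) return 1;            /* 2 and 3 are prime */
    if (x % 2 == 0) return 0;
    for (d = 3; d <= x / d; d += 2)
        if (x % d == 0) return 0;
    return 1;
}

/* Assumed scheme: PE number `me` (0 <= me < n) gets one contiguous block
 * of 2..r.  64-bit arithmetic keeps the products from overflowing.
 * If lo > hi on return, this PE's block is empty. */
static void chunk_bounds(unsigned int r, unsigned int n, unsigned int me,
                         unsigned long long *lo, unsigned long long *hi)
{
    unsigned long long total;       /* how many integers lie in 2..r */
    assert(n > 0 && me < n);
    total = (r >= 2) ? (unsigned long long)r - 1 : 0;
    *lo = 2 + total * me / n;
    *hi = 2 + total * (me + 1ULL) / n - 1;
}

/* Count the primes in lo..hi.  The 64-bit loop index does not wrap
 * around even when hi is the largest unsigned int. */
static unsigned int count_chunk(unsigned long long lo, unsigned long long hi)
{
    unsigned long long i;
    unsigned int count = 0;
    for (i = lo; i <= hi; i++)
        if (is_prime((unsigned int)i))
            count++;
    return count;
}
```

In a full program, each PE would call chunk_bounds with its own PE number, call count_chunk on the result, and send its count to PE 0, which adds the counts and prints the total. Note that equal-sized blocks do not take equal time under trial division, since larger numbers cost more to test.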
When you turn in your program, include a README file which *clearly* explains how the manager parcels out work to the other PE's. (You can put this in the comments within your source file if you prefer, but if so, include a README file which points to the comments.)

Note again that the Reader will try your code on a wide range of values of r (up to the largest unsigned int). Note that although you must have more than one PE doing work, you may find that the optimal division of work is not necessarily balanced. The Reader will measure elapsed time (third field in the output of the `time' command).

MAKE SURE THAT YOU USE THE ACS MACHINES IN A SOCIALLY RESPONSIBLE MANNER. MAKE SURE THAT YOU CHECK FOR "ZOMBIE" nxdaemon PROCESSES ON ALL MACHINES IN YOUR PARTITION BEFORE YOU LOG OUT.

And try your best to finish BEFORE the due date; if many people are working on this problem on the due date, machine response will be extremely slow, and your timings will be quite inaccurate.

You will probably find that it is better to use fewer processors if r is smaller, and you may wish to incorporate some kind of consideration like this. In particular, just loading your program at all those nodes will take a lot of time, so you may wish to write a shell-script front end to your code, as follows. Set up many different partition files, say pn2, pn5, pn10, etc., with pni specifying i nodes. The front end shell script will look at the value of r, and then start up your program with the partition file that has the best number of nodes for this value of r. (If you do this, include your partition files along with your other files when you submit the homework.)
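
The selection rule behind such a front end might look like the sketch below. It is written in C only to show the decision logic; the front end itself would be a shell script as described above. The function names (choose_nodes, partition_file_for) are hypothetical, and the thresholds are placeholders rather than measured values, so they would need to be replaced by numbers from your own timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Pick a node count for a given r.  The thresholds are PLACEHOLDERS,
 * not measured values; tune them from your own timing runs. */
static unsigned int choose_nodes(unsigned int r)
{
    if (r < 100000u)   return 2;    /* small r: loading cost dominates */
    if (r < 10000000u) return 5;
    return 10;
}

/* Write the matching partition file name ("pn2", "pn5" or "pn10")
 * into buf, using the pni naming from the assignment text. */
static void partition_file_for(unsigned int r, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "pn%u", choose_nodes(r));
}
```

The node counts 2, 5 and 10 are simply the example partition files named above; a real front end could use as many partition files as the timings justify.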