Homework 4
Due Wednesday, June 3
Problem I:
In this problem you will use Chipmunk to implement age registers for LRU checking in a cache.
Set up four 4-bit counters, corresponding to age registers for a set of four lines. Inputs will be:
On each clock cycle (which we assume has already been AND-ed with a line choosing this set), every counter is incremented except for one, which is reset to 0. The exceptional line is the one accessed in the case of a hit, or the one replaced in the case of a miss. Also, if, upon incrementing, any of the registers wraps around to 0, all the registers are reset to 0.
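Before wiring this up, it may help to check the intended behavior against a small software model. The following is only a behavioral sketch in C, not a Chipmunk circuit. The names age_tick and lru_line are hypothetical, and the rule that the LRU candidate is the line with the largest age (lowest index on a tie) is an assumption, not something stated above.

```c
#include <stdint.h>

#define NUM_LINES 4     /* four lines in the set */
#define AGE_MASK  0x0F  /* keeps each counter to 4 bits */

/* One clock cycle for the set. `accessed` is the index of the line
   that was hit, or replaced on a miss (hypothetical interface). */
void age_tick(uint8_t age[NUM_LINES], int accessed)
{
    int wrapped = 0;
    int i;

    for (i = 0; i < NUM_LINES; i++) {
        if (i == accessed) {
            age[i] = 0;                        /* the exceptional line */
        } else {
            age[i] = (age[i] + 1) & AGE_MASK;  /* 4-bit increment */
            if (age[i] == 0)
                wrapped = 1;                   /* 15 -> 0 wraparound */
        }
    }

    /* If any register wrapped, all registers are reset to 0. */
    if (wrapped)
        for (i = 0; i < NUM_LINES; i++)
            age[i] = 0;
}

/* Assumed use of the registers: the replacement candidate is the
   line with the largest age; ties go to the lowest index. */
int lru_line(const uint8_t age[NUM_LINES])
{
    int best = 0;
    int i;

    for (i = 1; i < NUM_LINES; i++)
        if (age[i] > age[best])
            best = i;
    return best;
}
```

Starting from all zeros, one call to age_tick with accessed = 2 leaves the registers at 1, 1, 0, 1. If a register holding 15 is then incremented, all four return to 0.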
As usual, remember that we are just implementing part of a system. The inputs here would be outputs from other parts of the system.
Problem II:
In this problem you will get a chance to explore how block size and other cache hardware parameters, and even the way we write our code, affect the speed of a program. You will use the DLX architecture simulator and the dinero cache simulator.
(Note: Parts of this problem require you to report some numbers and give explanations. Write these up in an ASCII text file.)
Part A:
Consider the program
    int Z[1000];

    main()
    {
        int I;

        for (I = 0; I < 1000; I += 4)
            Z[I] = I;
    }

Compile and run the program, and then run dinero with a 16-byte block size, a 16K unified cache, and set-associativity of degree 4. Do two runs, one with write-through and the other with write-back policy, and compare the two amounts of memory traffic generated.
Explain why one worked better than the other on this particular program.
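As a starting point for the explanation, it can be useful to see how the stores in the loop map onto cache blocks. The sketch below is only an illustration under stated assumptions: the array starts on a block boundary, and the element size is passed in by the caller (4 bytes is assumed for an int on DLX). The function name count_blocks is hypothetical.

```c
#define BLOCK_BYTES 16    /* block size used in the dinero runs */
#define N           1000  /* number of elements in Z */

/* Walk the same index pattern as the loop (I = 0, 4, 8, ...) and
   count the stores issued and the distinct blocks they fall in.
   Assumes the array begins on a block boundary. */
void count_blocks(long elem_bytes, long *stores, long *blocks)
{
    long last_block = -1;
    long i;

    *stores = 0;
    *blocks = 0;
    for (i = 0; i < N; i += 4) {
        long block = (i * elem_bytes) / BLOCK_BYTES; /* block index within Z */

        (*stores)++;
        if (block != last_block) {   /* first store into a new block */
            (*blocks)++;
            last_block = block;
        }
    }
}
```

Calling count_blocks with elem_bytes = 4 shows how many stores land in each block, which is the quantity that matters when comparing the two write policies.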
Part B:
Here you will be using the program Prime.c in the directory
    ~matloff/Pub/DLX
on the ACS machines. (Note that this program differs slightly from the program of the same name on my DLX Web page.)
Compile and run the program, and then do several runs of dinero, noting the instruction and data cache miss rates and amount of memory traffic in each case:
Part C:
Again use the program Prime.c. Assume that it takes 15 cycles for each word read/written from/to memory, on top of the number of cycles reported by dlxsim for executing the program. Assume the cache parameters in B(i) above. Modify Prime.c so as to reduce its overall run time (measured in total number of cycles, i.e. the sum of the basic cycles reported by dlxsim and 15 times the number of memory words accessed as reported by dinero).
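The run-time measure described above can be written as a one-line calculation. This is a minimal sketch: the function name total_cycles and its parameter names are hypothetical, and the two inputs are the figures you read off from dlxsim and dinero.

```c
#define MEM_CYCLES_PER_WORD 15  /* cost of each word read or written */

/* Total run time in cycles: the basic cycles reported by dlxsim plus
   15 cycles for every memory word accessed, as reported by dinero. */
long total_cycles(long dlxsim_cycles, long memory_words)
{
    return dlxsim_cycles + MEM_CYCLES_PER_WORD * memory_words;
}
```

For example, with made-up figures of 1000 basic cycles and 20 memory words, the total would be 1000 + 15 * 20 = 1300 cycles.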
The six homework groups having the six fastest programs will get Extra Credit. (The Reader will record this and give me the results.)