Homework V

Due Tuesday, December 2

Note: There will be no extensions to the due date, since there needs to be time to do the grading.

Problem 1:

Re-do Homework I, using Max+PlusII. Feel free to use the same design you used in Homework I. Instead of using a hex keypad to input n, as in the Chipmunk implementation, just input n via your waveform file. Note: If you have any questions about Max+PlusII, address them to May, as she is the expert on this software.

Problem 2:

Part A:

Write a C or C++ program to simulate a direct-mapped cache. Its input will be a file of memory addresses which simulate address generation by a CPU. (If you prefer, have your program read from the stdin, and use redirection when you run the program.) The addresses will be expressed in hex, one per line of the file. The output will be the miss rate. Addresses are assumed to be 32 bits wide, and the machine is byte-addressable. The problem will have the following parameters, input via the command line (i.e. argv):

Note that the number of cache lines, l, is a function of b and c. In determining this function, assume that the nondata material in each line--tag, valid bit, etc.--occupies a total of 4 bytes. Also, assume that any combination of b and c given to the program will result in an integer value of l.

It is suggested that you have a struct type or class which implements a cache line, consisting of a valid bit and a tag. No actual space for the data is needed in our simulator, since we are only simulating the miss behavior of the cache. (But of course you still must account for the data space in your formula for l in terms of b and c above.) Similarly, no dirty bit is needed in our simulator, since our simulation won't distinguish between reads and writes.

It is also suggested that you make heavy use of C's &, |, << and >> operators, and implement the struct or class members by using the unsigned int type.

Test your program thoroughly. Here is a sample for you: Suppose the program is run with b = 32, c = 144. That will imply that l = 4. Suppose the input file is

32
4b
a
5a
b8

This should result in a miss rate of 0.8. Only the fourth of the five memory accesses will be a hit. By the way, after the fifth access, the contents of the cache will be

line    block
   0        0
   1        5
   2        2
   3        -

Now suppose the sixth access is to address 34 (i.e. 0x34). That will also be a miss. On the other hand, if the overall cache size were 288, then an address of 34 on the sixth access would result in a hit.

Part B:

Alter the function qs() in the file QS.c in the homework directory of our class Web page in order to generate memory addresses, as follows: Add print lines to the code

   for (i = 1; i < n; i++)  {
      xi = x[i];
      if (xi < pivot)
         p1[n1++] = xi;
      else p2[n2++] = xi;
   }

so that we print out the address of each array element that we access. For example, after the line

      xi = x[i];

add

      printf("%x\n",&x[i]);

Insert similar print statements after the accesses of p1 and p2.

All this will give you a record of all the array accesses. This is of course not the same as all memory accesses, since for example it doesn't account for instruction-fetch accesses, but it does pick up most of the data accesses, and that is good enough for our purposes.

Then use the output of this program as input to your program in Part A above. We are going to generate data similar to that presented related to p.719 in my Dandamudi supplement, showing the relationship between miss rate and block size. It is likely to be U-shaped, as in the data in the supplement. (You will just report your data in tabular form, as I did, rather than actually drawing a graph.)

Do runs for various values of b and c. For each fixed value of c, you will likely get the U-shaped relationship described above. On the other hand, the larger c is, the lower the U should be. Use a fairly large value of n, to get a good statistical sample.

Present your data and findings (e.g. did you really get a U shape?) in a plain-text ASCII file.

Problem 3:

Construct an example to show that the pseudo-LRU scheme in Fig. 17.15 of Dandamudi is not true LRU. (The figure is for a 4-way associative cache, but your example could use some other degree of associativity if you wish.) Your example should consist of a sequence of block numbers, and you should show that if that memory access pattern were to occur in a program running on a pseudo-LRU cache, you would not get the same replacements as you would with a true-LRU cache.

Again, present your example and your explanation in a plain-text ASCII file.