

\documentstyle[twocolumn]{article}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{-0.3in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.8in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.1in}
\setlength{\columnseprule}{0.4pt}

\begin{document}

\bf
\hfill Name:  $\underline{ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }$
% \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\rm
\bigskip
                                      
{\bf Directions:} Work on this sheet (both sides, if needed) only;
{\bf do not turn in any supplementary sheets of paper}.  
In order to get full credit, {\bf SHOW YOUR WORK.}

{\bf 1.}  Fill in the blanks, concerning RISC I and II:

\begin{itemize}

\item [(a)] (10) Answer using a number:  A branch in these
machines generates \_\_\_\_\_\_\_\_\_\_\_ delay slot(s).

\item [(b)]  (10) Suppose main() calls x(), x() calls y(), and y() calls
z().  Then during the execution of y() the virtual register r12 will be physical register
R\_\_\_\_\_\_\_\_\_.

\item [(c)]  (10) RISC II was able to accommodate a larger number of
registers than RISC I because RISC II had fewer \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
than RISC I.

\item [(d)]  (10) In ordinary machines with cache and virtual memory, there
are two kinds of ``catastrophes'' in which a failure to access fast
storage forces us to access slower storage:  a cache miss and a page
fault.  For RISC I and II, there is a third kind of catastrophe of
this nature.  What is it?

\end{itemize}

{\bf 2.}  (5) Cite a specific place in Patterson and Hennessy where the
authors' description of the virtues of the MIPS architecture is similar
to the definition of RISC given in lecture.

{\bf 3.}  Answer the following questions about the SPARC.

\begin{itemize}

\item [(a)]  (5) What are the names of the nonoverlapping registers in
SPARC assembly-language notation?

\item [(b)]  (10) Suppose our C source code includes the line

\begin{verbatim}
z = 0x12345678;
\end{verbatim}

where z is a global variable which the compiler has assigned to the
register \%g6.  Give efficient assembly code which the compiler could
generate for this line of source code.

\end{itemize}

{\bf 4.}  Consider the following extension of the example code on pages
19 and 22:

\begin{verbatim}
0x200   add r0,r1,r26
0x204   add r0,r2,r27
0x208   call x,r28
0x20c   nop
0x210   
...
0x320   add r10,r11,r3
0x324   add r10,r10,r4
0x328   
\end{verbatim}

Suppose the numbers at the left are the addresses in memory for the
corresponding instructions, and that the body of the function x(a,b) is

\begin{verbatim}
   s = a + b;
   t = 2 * a;
\end{verbatim}

where s and t are global variables.  Location 0x320 is the start of x().
Let c denote the clock cycle during which the instruction in 0x200 is
executed, so that the one in 0x204 is executed during cycle c+1 and so
on.  (The term {\bf cycle} here means major cycle, not pipeline
subcycles.)

\begin{itemize}

\item [(a)]  (10) What instruction should go into location 0x328?

\item [(b)]  (5)  During which cycle will the {\bf nop} be executed?

\item [(c)] (10) Assume RISC II.  During the execution of the {\bf nop}
instruction in 0x20c, will any of the programmer-visible registers
be read or written to?  If not, state why not.  If so, state which
register(s) and the specific numerical values read or written.

\item [(d)] (10) Assume we are using RISC II (the version actually built
by Katevenis, not the alternative schemes proposed in Section 3.8.3 of
our pipelining handout).  Is it possible to rearrange the code so that
we eliminate the {\bf nop} and save a cycle?  If not, state why not; if
so, show how.

\end{itemize}

{\bf 5.}  (5) In the discussion associated with Fig. 6.4 of Patterson 
and Hennessy, the authors note that even with special hardware devoted
to the branch-stall problem, the {\bf lw} instruction's fetch would
still be performed 4 ns after that of the {\bf beq}.  What value would
that 4-ns figure change to without special hardware?  

{\bf Solutions:}

1.a.  1.

1.b.  44.
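This answer can be checked by computing the mapping directly.  The sketch
below is a simplified model, not the actual RISC hardware: it assumes (as
the answer of 44 implies) that main()'s window maps virtual r10--r31 onto
physical R10--R31, that each call advances the window by 16 physical
registers so that the caller's r26--r31 become the callee's r10--r15,
that the globals r0--r9 are never remapped, and that the window file
never wraps around.  The function name {\tt physical\_reg} is
hypothetical.

\begin{verbatim}
GLOBALS = 10       # r0-r9 are global and never remapped
WINDOW_STEP = 16   # assumed: each call advances the window by 16 registers

def physical_reg(virtual, depth):
    """Physical register number for virtual register r<virtual> in a
    procedure at call depth `depth` (main() = 0), assuming main()'s
    window maps r10-r31 onto R10-R31 and the windows never wrap."""
    if not 0 <= virtual <= 31:
        raise ValueError("only r0-r31 are visible")
    if virtual < GLOBALS:
        return virtual
    return virtual + WINDOW_STEP * depth

# main() -> x() -> y(): y() runs at depth 2
print(physical_reg(12, 2))                          # 44
# the caller's outgoing r26 is the callee's incoming r10
print(physical_reg(26, 0) == physical_reg(10, 1))   # True
\end{verbatim}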

1.c.  Internal buses.

1.d.  Register window overflow/underflow.
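To make the analogy with cache misses concrete, here is a toy model of
when window overflows and underflows occur.  It is a deliberate
simplification, not RISC II's actual trap mechanism: it assumes the
register file can hold at most {\tt nwindows} procedure frames at once
(the value 4 in the example is hypothetical), spills the oldest frame to
memory on overflow, and reloads a frame from memory on underflow.  The
function {\tt count\_window\_traps} is hypothetical.

\begin{verbatim}
def count_window_traps(ops, nwindows):
    """Count window overflows and underflows for a sequence of
    'call'/'ret' operations, assuming at most `nwindows` frames
    can be resident in registers at once (simplified model)."""
    resident, spilled = 1, 0      # main()'s frame starts in registers
    overflows = underflows = 0
    for op in ops:
        if op == "call":
            if resident == nwindows:   # no free window: spill oldest frame
                overflows += 1
                resident -= 1
                spilled += 1
            resident += 1
        elif op == "ret":
            resident -= 1              # drop the returning procedure's frame
            if resident == 0:          # caller's frame is in memory: reload
                if spilled == 0:
                    raise ValueError("return from main()")
                underflows += 1
                spilled -= 1
                resident = 1
        else:
            raise ValueError("unknown operation: " + op)
    return overflows, underflows

# call depth 6 with only 4 resident windows, then unwind completely
print(count_window_traps(["call"] * 5 + ["ret"] * 5, nwindows=4))  # (2, 2)
\end{verbatim}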

2.  The subsection titled ``Designing Instruction Sets for
Pipelining,'' in Sec. 6.1.

3.a.  \%l0--\%l7, the local registers.

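3.b.  The constant 0x12345678 needs 29 bits, more than fits in the
13-bit signed immediate field of a SPARC arithmetic instruction (or
even in the 22-bit immediate field of {\bf sethi}), so it cannot be
specified within a single instruction.  Thus we use {\bf sethi} to set
the upper 22 bits of \%g6 (clearing the lower 10) and then add in the
lower 10 bits:

\begin{verbatim}
sethi 0x48d15,%g6   ! %g6 = 0x48d15 << 10 = 0x12345400
add   %g6,0x278,%g6 ! add the lower 10 bits: %g6 = 0x12345678
\end{verbatim}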
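The split of the constant into a {\bf sethi} immediate and a 10-bit
remainder can be checked mechanically.  The Python sketch below is
illustrative only (the function name {\tt sethi\_split} is
hypothetical); SPARC assemblers perform the same computation with their
\%hi() and \%lo() operators.  Since the remainder is at most
0x3ff = 1023, it always fits in the 13-bit signed immediate of the
{\bf add}.

\begin{verbatim}
def sethi_split(value):
    """Split a 32-bit constant into the 22-bit sethi immediate (which
    ends up in bits 31-10 of the destination) and the low 10 bits."""
    if not 0 <= value < 1 << 32:
        raise ValueError("not a 32-bit unsigned constant")
    hi22 = value >> 10      # sethi shifts this back into bits 31-10
    lo10 = value & 0x3FF    # the following add must supply these bits
    return hi22, lo10

hi22, lo10 = sethi_split(0x12345678)
print(hex(hi22), hex(lo10))                # 0x48d15 0x278
assert (hi22 << 10) + lo10 == 0x12345678   # sethi result plus add
\end{verbatim}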

4.a.  ret r12.

4.b.  c+3.
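Assuming, as the problem states, that one instruction executes per major
cycle, and noting that instructions are 4 bytes long, the cycle of any
instruction in the straight-line run starting at 0x200 follows from its
address.  The {\bf nop} sits in the delay slot of the {\bf call}, so it
is simply the next instruction executed:

\[
c + \frac{\mbox{\tt 0x20c} - \mbox{\tt 0x200}}{4} = c + \frac{12}{4} = c + 3 .
\]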

4.c.  Yes: r28 will be written to, receiving the return address saved
by the {\bf call} instruction.

4.d.  For example, the following would work:

\begin{verbatim}
0x200   
0x204   add r0,r2,r27
0x208   call x,r28
0x20c   add r0,r1,r26
0x210   
...
0x320   add r10,r11,r3
0x324   add r10,r10,r4
0x328   
\end{verbatim}

(With such a rearrangement, the addresses would probably change too,
but for simplicity let us suppose not.)  Note that we are taking
advantage of internal forwarding here.

5.  The time would now be increased from 4 ns to 6 ns, because we
do not know whether to fetch the {\bf lw} instruction (as opposed to
the instruction at address 40) until after the ALU stage of the
{\bf beq} instruction.
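The arithmetic can be made explicit.  Assume, as the 4-ns and 6-ns
figures imply, that each pipeline stage takes 2 ns, and number the
stages IF = 1, ID = 2, EX (ALU) = 3.  If the branch outcome is known at
the end of stage $k$ of the {\bf beq}, the earliest the {\bf lw} can be
fetched is $k$ stage times after the {\bf beq}'s own fetch:

\[
t_{\rm fetch} = k \times 2\ {\rm ns}, \qquad
k = 2\ ({\rm ID}) \Rightarrow 4\ {\rm ns}, \qquad
k = 3\ ({\rm EX}) \Rightarrow 6\ {\rm ns}.
\]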
  
\end{document}



