\documentclass[10pt]{article}

\setlength{\oddsidemargin}{-0.5in}
\setlength{\evensidemargin}{-0.5in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{7.0in}
\setlength{\textheight}{9.5in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.05in}
\setlength{\columnseprule}{0.3pt}
\usepackage{fancyvrb}
\usepackage{relsize}

\begin{document}

Name: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

\textbf{Directions: Work only on this sheet (on both sides, if needed); do not
turn in any supplementary sheets of paper. There is actually plenty of room
for your answers, as long as you organize yourself BEFORE starting writing.
In order to get full credit, SHOW YOUR WORK.}

% {\bf 1.} () Consider our prime-number finding example code for Pthreads
% (or some other similar program).  It is designed to run on a
% multiprocessor system in a parallel processing manner in the hope of
% finding primes faster than we would with a nonthreaded program on a
% uniprocessor system.  But it's possible that even on a uniprocessor
% machine, the threaded version may be faster (i.e. complete its work
% earlier) than a nonthreaded one.  Explain why.

{\bf 1.} (10) Fill in the blank:  The {\bf struct} field {\bf cycle} in
the complex barrier code presented in discussion section corresponds to
the field \_\_\_\_\_\_\_\_\_\_\_\_\_\_ in the barrier code in our PLN.

{\bf 2.} (15) In our PLN, the way we applied a lock operation on a lock
{\bf L} before entering a critical section {\bf C} was as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
TRY:  TAS R,L
      JNZ TRY
C:    ...
      MOV L,0
\end{Verbatim}

Consider this variant:

\begin{Verbatim}[fontsize=\relsize{-2}]
      TAS R,L
      JNZ DO_SOMETHING_ELSE
C:    ...
      MOV L,0
\end{Verbatim}

What Pthreads call is this variant analogous to?

{\bf 3.} Consider the discussion on read/write locks in Sec. 7.8.1 in Grama. 

(a) (5) By using read/write locks instead of ordinary locks, the mean
time a read must wait to execute will \_\_\_\_\_\_\_\_\_\_\_\_\_\_.
(Answer {\it increase} or {\it decrease}.)

(b) (5) Answer [(b)] for writes.

(c) (5) Suppose reads are currently in progress and suddenly some thread
requests a write.  Choose one:  (i) The write will pre-empt the reads.
(ii) The write will not pre-empt the reads, and will not run as long as
there are reads to be done.  (iii) The write and the reads will take turns
running, a quantum (or timeslice) at a time.  (iv) One can't predict
what will occur.

(d) (5) Suppose a write is currently in progress, and suddenly
some thread requests a read and some other thread requests a write.
Choose one:  (i) When the first write completes, the read will run.
(ii) When the first write completes, the second write will run.
(iii) When the first write completes, the read and the second write will
take turns running.  (iv) The read and the two writes will take turns
running.  (v) One can't predict what will occur.

{\bf 4.} (15) Consider the graph in the section of our PLN on
shortest-path computation.  Add one more vertex v5, which is a distance
2 away from each of v1 and v4.  Suppose we apply the MPI code in Sec.
6.6.9 of Grama to this problem, running on two nodes.  (So our {\bf
my.pg} file would consist of two entries.) Show the entire contents of
the (one-dimensional) array {\bf wgt} for the node having rank 1.

{\bf 5.} We are interested in finding the root of a strictly monotone
(strictly increasing or strictly decreasing) function {\bf g()}, in an
iterative manner, by inspecting narrower and narrower intervals.  The
initial interval to inspect is [{\bf left},{\bf right}], and we iterate
until the interval width is less than {\bf eps} (say 0.0001).  At each
iteration, the current interval is divided into equal subintervals, one
for each node.  Exactly one of the nodes will discover a sign change in
{\bf g()} across its subinterval.  That subinterval becomes the new
interval to be used in the next iteration.  You may wish to read part
(b) first, to get a more concrete idea of how the root finding works.
Your code is required to be reasonably efficient in terms of speed. 

(a) (20) Write a JIAJIA function to find the root:

\begin{Verbatim}[fontsize=\relsize{-2}]
float findroot(float (g)(float), float 
   left, float right, float eps, 
   int numnodes, int me, int *winner, int *workdone)
\end{Verbatim}

The arguments {\bf winner} and {\bf workdone} are the identity of the
node which found the subinterval in which the sign change occurred, and
a shared array which keeps track of work done.\footnote{The {\bf winner}
argument was not present in the original version of the exam, and
students directly accessed a global shared variable.} At the end of
execution, {\bf workdone[i]} will contain a count of the number of times
node {\bf i} was the winner.  Use only the shared-memory functions of
JIAJIA, not the ``MPI-like'' message-passing ones.

(b) (20) The MPI code for root finding shown below has a gap, missing
one statement.  Fill in the gap in the code with a single statement,
consisting of a single fully-specified MPI call.  The arguments {\bf
numnodes} and {\bf me} are the number of nodes and the node number of
this machine.  

\begin{Verbatim}[fontsize=\relsize{-2}]
float findroot(float (g)(float), float 
   left, float right, float eps, 
   int numnodes, int me, MPI_Comm comm)
{  float a=left,b=right,wid,t1,t2;
   int winner,y[2],z[2];
   MPI_Status status;  // see below 

   while (1)  {
      wid = (b-a)/numnodes;
      t1 = g(a+me*wid); t2 = g(a+(me+1)*wid);
      y[0] = floor(t1*t2); y[1] = me;
      // gap:  place a single MPI call here
      winner = z[1];
      a = a+winner*wid;  b = a+(winner+1)*wid;
      if (b-a < eps) return (a+b)/2.0;
   }
}
\end{Verbatim}

{\bf Solutions:}

{\bf 1.}  EvenOdd

{\bf 2.}  {\bf pthread\_mutex\_trylock()}

{\bf 3.}  decrease; increase; (ii); (ii)

{\bf 4.}  The array stores the last threee columns of the one-hop
distance matrix, in row-major order.  So its contents are:
$(\infty,\infty,\infty,10,5,2,100,\infty,\infty,0,\infty,\infty,\infty,0,2,\infty,2,0)$

{\bf 5.a}

\begin{verbatim}
...
wid = (b-a)/numnodes;
t1 = g(a+me*wid); t2 = g(a+(me+1)*wid);
if (t1*t2 < 0) {
   *winner = me;
   workdone[me]++;
}
jia_barrier();
a = a+*winner*wid;  b = a+(*winner+1)*wid;
if (b-a < eps) return (a+b)/2.0;
\end{verbatim}

{\bf 5.b}

\begin{verbatim}
status = MPI_Allreduce(y,z,1,MPI_2INT,MPI_MINLOC,comm);
\end{verbatim}

\end{document}


