\chapter{Parallel Combinitorial Algorithms}

\section{Overview}

In Chapter \ref{chap:intro}, we saw Dijkstra's algorithm for finding the
shortest path in a graph.  In Chapter \ref{chap:matrix}, we saw an
algorithm for finding bridges within a graph.  Both of these are {\bf
combinatorial search algorithms}.  Such algorithms generally have
exponential time complexity, and thus are natural candidates for
parallel computation.  This chapter will present a few more examples.

\section{The 8 Queens Problem}

A famous example is the 8 Queens Problem, in which one wishes to place
eight queens on a standard 8x8 chessboard in such a way that no queen is
attacking any other. (The generalization, of course, would involve n
queens on an nxn board.) Suppose our goal is to find all possible
solutions.

To start a solution to this problem, we first note in any solution will
have the property that no row will contain more than one queen. This
suggests building up a solution row by row: Suppose we have successfully
placed queens so far in rows 0, 1, ..., k-1 (row 0 being the top row of
the board). Where can we place a queen in row k? Well, since we cannot
use any column already occupied by the preceding k queens, that means we
have a choice of 8-k columns. But even among those k columns, there will
be j of them, for some $0 \leq j \leq 8-k$ that are in the diagonal
attack path of some preceding queen. Then we can extend our tentative
k-row solution to 8-k-j new (k+1)-row solutions. 

We will define our solution here for the shared-memory paradigm, though
it would be easy to change this for the message-passing
paradigm.\footnote{The main point would be to change linked lists and
pointers to arrays and array indices.} Define

\begin{Verbatim}[fontsize=\relsize{-2}]
struct TentSoln {
   int RowsSoFar;
   int Cols[8];
   struct TentSoln *Next;
}
\end{Verbatim}

Each such {\bf struct} contains a partial solution, up through row
number {\bf RowsSoFar}.  The array {\bf Cols} has the interpretation
that {\bf Col{[}I{]} == J} tells us which column the queen in row {\bf
I} occupies.

Each {\bf struct} is a task showing one partial solution.  The node
which obtains this task will then extend this partial solution to
several new, longer partial solutions.

The tasks are all placed into a linked list.  {\bf Next} points to the
next item in the work pool.

A parallel solution based on this idea would like something like this:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
while (work pool nonempty or at least one nonidle processor) {
   get a TentSoln struct from the work pool, and point P to it;
   I = P->RowsSoFar;
   for (J = 0; J < 8; J++) {
      if (a queen at row I, column J would not attack the previous queens) {
         Q = malloc(sizeof(struct TentSoln));
         Q->RowsSoFar = I+1;
         add the struct pointed to by Q to the work pool;
      }
   }
}
\end{Verbatim}

There of course would also be code in the case I = 8 to check and see if we
have found a solution, and if so, to record it, etc.

Note that any rotation of a solution---interchanging rows and
columns---is also a solution.  Similarly, any reflection across one of
the two main diagonals of the board is also a solution.  This
information could be used to speed up computation, though at the
expense of additionality complexity of the code. 

\section{The 8-Square Puzzle Problem}

This game was invented more than 100 years ago. Here is what a typical board
position looks like:

\vspace{0.3cm}
{\centering \begin{tabular}{|c|c|c|}
\hline 
0&
5&
3\\
\hline 
1&
4&
\\
\hline 
7&
2&
6\\
\hline 
\end{tabular}\par}
\vspace{0.3cm}

(The real puzzle has numbering from 1 to 8, but we use 0 to 7.)

Each number is on a little movable square, which can be moved up, down,
left and right as long as the spot in the given direction is unoccupied.
In the example above, the square 3, for instance, could be moved
downward, producing an empty spot at the top right of the puzzle.  The
object of the game is arrange the squares in ascending numerical order,
with square 0 at the upper left of the puzzle (which in this example
happens to be the case already).

We again solve this by setting up a work pool, in this case a pool of board
positions.  Each board position would be implemented in something like
this:

\begin{Verbatim}[fontsize=\relsize{-2}]
struct BoardPos   {
   int Row[9];
   int Col[9];
   struct BoardPos *Next; 
}
\end{Verbatim}

Here {\bf Row[I]} and {\bf Col[I]} would be the position of the square
numbered {\bf I}.  For convenience, we also store the location of the
blank position, in {\bf Row[8]} and {\bf Col[8]}. 

Suppose a processor goes to the work pool and gets the board position
depicted above. In the simplest form of the algorithm, the processor
would check each of the three possible moves (4 right, 3 down, 6 up) to
see if the resulting board position would duplicate one that had already
been checked. All moves that lead to new positions would be added to the
work pool. Each processor would loop around, pulling items from the work
pool, until some processor somewhere finds a solution to the game (in
which case that processor would add termination messages to the work
pool, so that the other processors knew to stop).  An outline of the
algorithm would be as follows:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
while (work pool nonempty or at least one nonidle processor) {
   get a BoardPos struct from the work pool, and point P to it;
   for (I = 0; I < 8; I++)  {
      for all possible moves of square I do  {
         Q = malloc(sizeof(struct BoardPos));
         fill in *Q according to this move;
         if *Q has not already been checked
            add this board to the work pool;
      }
   }
}
\end{Verbatim}

Again, code would need to be included for checking to see if a solution
has been found, whether we have found that no solution exists, and so on.

Note the operation

\begin{Verbatim}[fontsize=\relsize{-2}]
         if *Q has not already been checked
            add this board to the work pool;
\end{Verbatim}

Clearly this is needed, to avoid endless cycling.  But it is not as
inoccuous as it looks.  If the set of all previously-checked board
positions is to be made available to all processors, this may produce
substantial increases in contention for memory and interprocessor
interconnects.  On the other hand, we could arrange the code such
that only certain processors have to know about certain subsets of
the set of previously-checked board positions, but this makes the
code more complex and may produce load-balancing problems.

A more sophisticated version of the algorithm would use a
\textbf{branch-and-bound} technique. The idea here is to reduce
computation by giving priority in the work pool to those board positions
which appear ``promising'' by some reasonable measure. For example, we
could take as our measure the ``distance'' between a given board
position and the goal board position, as defined by the sum of the
distances from each numbered square to its place in the winning
position.  In the example above, for instance, the square numbered 5 is
a distance of 2 from its ultimate place (2 meaning, one square to the
right, one square down, so 1+1 = 2). The board above is a distance 15
from the winning board.

The idea, then would be that we implement the work pool as an ordered
linked list (or other ordered structure), and when a board position is
added to the work pool, we insert it according to its distance from the
winning board. This way the processors will usually work on the more
promising boards, and thus hopefully reach the solution faster.

\section{Itemset Analysis in Data Mining}

\subsection{What Is It?}

The term {\bf data mining} is a buzzword, but all it means is the
process of finding relationships among a set of variables.  In other
words, it would seem to simply be a good old-fashioned statistics
problem.

Well, in fact it {\it is} simply a statistics problem---but writ large.
Instead of the tiny sample sizes of 25 you likely saw in your statistics
courses, typical sample sizes in the data mining arena run in the
hundreds of thousands or even hundreds of millions.  And there may be
hundreds of variables, in constrast to the, say, half dozen you might
see in a statistics course.  

{\bf Major, Major Warning:}  With so many variables, the chances of
picking up spurious relations between variables is large.  And although
many books and tutorials on data mining will at least pay lip service to
this issue (referring to it as {\bf overfitting}, they don't emphasize
it enough.\footnote{Some writers recommend splitting one's data into a
{\bf training set}, which is used to discover relationships, and a {\bf
validation set}, which is used to confirm those relationships.  However,
overfitting can still occur even with this precaution.}

Putting the overfitting problem aside, though, by now the reader's
reaction should be, ``This calls for parallel processing,'' and he/she
is correct.  Here we'll look at parallelizing a particular problem,
called {\bf itemset analysis}, the most famous example of which is the 
{\bf market basket problem}:

\subsection{The Market Basket Problem}

Consider an online bookstore has records of every sale on the store's
site.  Those sales may be represented as a matrix S, whose (i,j)th
element $S_{ij}$ is equal to either 1 or 0, depending on whether the
i$th$ sale included book j, i = 0,1,...,s-1, j = 0,1,...,t-1.  So each
row of S represents one sale, with the 1s in that row showing which
titles were bought.  Each column of S represents one book title, with
the 1s showing which sales transactions included that book.  

Let's denote the entire line of book titles by $T_0,...,T_{b-1}$.  An
{\bf itemset} is just a subset of this.  A {\bf frequent} itemset is one
which appears in many of sales transactions.  But there is more to it
than that.  The store wants to choose some books for special ads, of the
form ``We see you bought books X and Y.  We think you may be interested
in Z.''

Though we are using marketing as a running example here (which is the
typical way that this subject is introduced), we will usually just refer
to ``items'' instead of books, and to ``database records'' rather than
sales transactions.

We have the following terminology:

\begin{itemize}

\item An {\bf association rule} $I \rightarrow J$ is simply an ordered
pair of disjoint itemsets I and J. 

\item The {\bf support} of an an association rule $I \rightarrow J$ is
the proportion of records which include both I and J.  

\item The {\bf confidence} of an association rule $I \rightarrow J$ is
the proportion of records which include J, {\it among those records
which include I}.

\end{itemize}

Note that in probability terms, the support is basically P(I and J)
while the confidence is P(J$|$I).  If the confidenc the book business,
it means that buyers of the books in set I also tend to buy those in J.
But this information is not very useful if the support is low, because
it means that the combination occurs so rarely that it's not worth our
time to deal with it.  

So, the user---let's call him/her the ``data miner''---will first set
thresholds for support and confidence, and then set out to find all
association rules for which support and confidence exceed their
respective thresholds.

\subsection{Serial Algorithms}

Various algorithms have been developed to find frequent itemsets and
association rules.  The most famous one for the former task is the {\bf
Apriori} algorithm.  Even it has many forms.  We will discuss one of the
simplest forms here.

The algorithm is basically a breadth-first tree search.  At the root we
find the frequent 1-item itemsets.  At the second level, we find the
frequent 2-item itemsets, and so on.  After we finish with level i, we
then generate new candidate itemsets of size i+1 from the frequent
itemsets we found of size i, by   

The key point in the latter operation is that if an itemset is not
frequent, i.e. has support less than the threshold, then adding further
items to it will make it even less frequent.  That itemset is then
pruned from the tree, and the branch ends.

Here is the pseudocode:

\begin{tabbing}
set $F_1$ to the set of 1-item itemsets whose support exceeds the
threshold \\
for \=  i = 2 to b \\
  \> $F_{i} = \phi$ \\  
  \> for \= each I in $F_{i-1}$ \\
  \> \> for \= each K in $F_1$ \\
  \> \> \> $Q = I \cup K$ \\
  \> \> \> if \= support(Q) exceeds support threshold \\
  \> \> \> \> add Q to $F_{i}$ \\
  \> if $F_{i}$ is empty break \\
return $\cup_i F_i$
\end{tabbing}

Again, there are many refinements of this, which shave off work to be
done and thus increase speed.  For example, we should avoid checking the
same itemsets twice, e.g. first \{1,2\} then \{2,1\}.  This can be
accomplished by keeping itemsets in lexicographical order.  We will not
pursue any refinements here.

\subsection{Parallelizing the Apriori Algorithm}

Clearly there is lots of opportunity for parallelizing the serial
algorithm above.  Both of the inner {\bf for} loops can be parallelized
in straightforward ways; they are ``embarrassingly parallel.''  There
are of course critical sections to worry about in the shared-memory
setting, and in the message-passing setting one must designate a manager
node in which to store the $F_i$.

However, as more and more refinements are made in the serial algorithm,
then the parallelism in this algorithm become less and less
``embarrassing.''  And things become more challenging if the storage
needs of the $F_i$, and of their associated ``accounting materials''
such as a directory showing the current tree structure (done via
hash trees), become greater than what can be stored in the memory of one
node.

In other words, parallelizing the market basket problem can be very
challenging.  The interested reader is referred to the considerable
literature which has developed on this topic.