%% This LaTeX-file was created by <root> Thu Mar  9 21:54:48 2000
%% LyX 1.0 (C) 1995-1999 by Matthias Ettrich and the LyX Team

%% Do not edit this file unless you know what you are doing.
\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}

\makeatletter


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\providecommand{\LyX}{L\kern-.1667em\lower.25em\hbox{Y}\kern-.125emX\@}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphics}

\makeatletter


\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphics}

\makeatletter


\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphics}

\makeatletter


\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphics}

\makeatletter


\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphics}

\makeatletter


\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0.5in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.0in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.1in}

\usepackage{times}
\usepackage{psfig}
\usepackage{fancyheadings}
\makeatother


\makeatother
\makeatother
\makeatother
\makeatother
\makeatother

\begin{document}


\title{Parallel Implementation of Combinatorial Algorithms}


\author{Norman Matloff}


\date{March 13, 2000}

\maketitle

\section{Overview}

Many applications of parallel processing methods involve \textbf{combinatorial
algorithms}. These involve searching of huge state spaces for an optimal, or
approximately optimal, solution to some problem.


\subsection{The 8 Queens Problem}

A famous example of this is the 8 Queens Problem, in which one wishes to places
eight queens on a standard 8x8 chessboard in such a way that no queen is attacking
any other. (The generalization, of course, involves n queens on an nxn board.)
Suppose our goal is to find all possible solutions.

To start a solution to this problem, we first note in any solution will have
the property that no row will contain more than one queen. This suggests building
up a solution row by row: Suppose we have successfully placed queens so far
in rows 0, 1, ..., k-1 (row 0 being the top row of the board). Where can we
place a queen in row k? Well, since we cannot use any column already occupied
by the preceding k queens, that means we have a choice of 8-k columns. But even
among those k columns, there will be j of them (\( 0 \) \( \leq j \) \( \leq 8-k) \)
that are in the diagonal attack path of some preceding queen. Then we can extend
our tentative k-row solution to8-k-j k+1 row solutions. 

For example, with k = 2 we may have built up a tentative solution as follows:
Define

\begin{verbatim} 
struct TentSoln {
   int RowsSoFar;
   int Cols[8];
   struct TentSoln *Next;
}
\end{verbatim}

The array Cols has the interpretation that if Col{[}I{]} == J means that we have
placed a queen in row I, column J of the board. RowsSoFar is the number of rows
in which we have placed queens so far.  Next points to the next item in
the work pool.

A parallel solution based on this idea would like something like this:

\begin{verbatim} 
while (work pool nonempty or at least one nonidle processor) {
   get a TentSoln struct from the work pool, and point P to it;
   I = P->RowsSoFar;
   for (J = 0; J < 8; J++) {
      if (a queen at row I, column J would not attack the previous queens) {
         Q = malloc(sizeof(struct TentSoln));
         Q->RowsSoFar+1;
         add the struct pointed to by Q to the work pool;
      }
   }
}
\end{verbatim}

There of course would also be code in the case I = 8 to check and see if we
have found a solution, and if so, to report it, etc.

Note that any rotation of a solution---interchanging rows and
columns---is also a solution.  Similarly, any reflection across one of
the two main diagonals of the board is also a solution.  This
information could be used to speed up computation, though at the
expense of additionality complexity of the code. 

\subsection{The 8-Square Puzzle Problem}

This game was invented more than 100 years ago. Here is what a typical board
position looks like:

\vspace{0.3cm}
{\centering \begin{tabular}{|c|c|c|}
\hline 
0&
5&
3\\
\hline 
1&
4&
\\
\hline 
7&
2&
6\\
\hline 
\end{tabular}\par}
\vspace{0.3cm}

(The real puzzle has numbering from 1 to 8, but we use 0 to 7.)

Each number is on a little movable square, which can be moved up, down,
left and right as long as the spot in the given direction is unoccupied.
In the example above, the square 3, for instance, could be moved
downward, producing an empty spot at the top right of the puzzle. The
object of the game is arrange the squares in ascending numerical order,
with square 0 at the upper left of the puzzle (which in this example
happens to be the case already).

We again solve this by setting up a work pool, in this case a pool of board
positions.  Each board position would be implemented in something like
this:

\begin{verbatim}
struct BoardPos   {
   int Row[9];
   int Col[9];
   struct BoardPos *Next; 
}
\end{verbatim}

Here Row[I] and Col[I] would be the position of the square numbered I.
For convenience, we also store the location of the position, in Row[8]
and Col[8]. 

Suppose a processor goes to the work pool and gets the board position
depicted above. In the simplest form of the algorithm, the processor
would check each of the three possible moves (4 right, 3 down, 6 up) to
see if the resulting board position would duplicate one that had already
been checked. All moves that lead to new positions would be added to the
work pool. Each processor would loop around, pulling items from the work
pool, until some processor somewhere finds a solution to the game (in
which case that processor would add termination messages to the work
pool, so that the other processors knew to stop).  An outline of the
algorithm would be as follows:

\begin{verbatim} 
while (work pool nonempty or at least one nonidle processor) {
   get a BoardPos struct from the work pool, and point P to it;
   for (I = 0; I < 8; I++)  {
      for all possible moves of square I do  {
         Q = malloc(sizeof(struct BoardPos));
         fill in *Q according to this move;
         if *Q has not already been checked
            add this board to the work pool;
      }
   }
}
\end{verbatim}

Again, code would need to be included for checking to see if a solution
has been found, whether we have found that no solution exists, and so on.

Note the operation

\begin{verbatim}
         if *Q has not already been checked
            add this board to the work pool;
\end{verbatim}

Clearly this is needed, to avoid endless cycling.  But it is not as
inoccuous as it looks.  If the set of all previously-checked board
positions is to be made available to all processors, this may produce
substantial increases in contention for memory and interprocessor
interconnects.  On the other hand, we could arrange the code such
that only certain processors have to know about certain subsets of
the set of previously-checked board positions, but this makes the
code more complex and may produce load-balancing problems.

A more sophisticated version of the algorithm would use a
\textbf{branch-and-bound} technique. The idea here is to reduce
computation by giving priority in the work pool to those board positions
which appear ``promising'' by some reasonable measure. For example, we
could take as our measure the ``distance'' between a given board
position and the goal board position, as defined by the sum of the
distances from each numbered square to its place in the winning
position.  In the example above, for instance, the square numbered 5 is
a distance of 2 from its ultimate place (2 meaning, one square to the
right, one square down, so 1+1 = 2). The board above is a distance 15
from the winning board.

The idea, then would be that we implement the work pool as an ordered linked
list (or other ordered structure), and when a board position is added to the
work pool, we insert it according to its distance from the winning board. This
way the processors will usually work on the more promising boards, and thus
hopefully reach the solution faster.

\end{document}

