%% This LaTeX-file was created by <root> Wed Feb  9 20:44:00 2000
%% LyX 1.0 (C) 1995-1999 by Matthias Ettrich and the LyX Team

%% Do not edit this file unless you know what you are doing.
\documentclass{article}
\usepackage[T1]{fontenc}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\IfFileExists{url.sty}{\usepackage{url}}
                      {\newcommand{\url}{\texttt}}

\makeatletter


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\providecommand{\LyX}{L\kern-.1667em\lower.25em\hbox{Y}\kern-.125emX\@}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.

\makeatletter



\makeatletter



\makeatletter




\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0.5in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.0in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.05in}



\makeatletter

\makeatother
\makeatother
\makeatother
\makeatother

\begin{document}


\title{Software Distributed Shared Memory Systems}


\author{Norman Matloff}


\date{February 8, 2000}

\maketitle

\section{Introduction}

The general consensus in the parallel processing community is that the shared-memory
programming paradigm is easier to program in than the message-passing paradigm.
Yet shared-memory hardware is quite expensive, costing hundreds of thousands,
or even millions, of dollars. By contrast, message-passing hardware is readily
available in the form of ordinary networks of workstations. The goal of software
distributed shared memory (SDSM) is to get ``the best of both worlds'': the
ease of programming of the shared-memory paradigm and the attractive price-performance
ratio of networks of workstations. The SDSM programmer has the illusion of shared
memory, and thus the clarity, without the expense of real shared-memory hardware.

With shared-memory hardware, a shared variable is truly shared; it represents
a single, physical memory location accessible by all processors.\footnote{%
This is not quite true, since even shared-memory hardware machines usually have
caches at each processor, so there really isn't a single copy after all.
} In an SDSM environment, the sharing is only virtual; each processor ``secretly''
has its own copy of a given shared variable, though this is supposed to be transparent
to the programmer (or, at least, as transparent as possible). Some mechanism
is needed to keep the various copies consistent with each other; this is done
by communication between the nodes, unseen by the application programmer.


\section{Some Infrastructure}


\subsection{Memory Allocation}

Shared-memory systems, whether hardware- or software-based, often allocate memory
by placing all global variables inside a huge \textbf{struct}. As an example,
suppose we wish to have these global (thus shared) variables:

\begin{verbatim} 
int x,y;
int z[1000];
char u,v;
\end{verbatim}

In many SDSMs and even many hardware-based shared-memory machines, we would
probably have to change this to:

\begin{verbatim} 
struct glob {
   int x,y;
   int z[1000];
   char u,v;
} *mem;
...
mem = malloc(sizeof(struct glob));
\end{verbatim}

Then instead of statements like

\begin{verbatim}
x = 12;
\end{verbatim}

we would have

\begin{verbatim}
mem->x = 12;
\end{verbatim}

Of course, one could make the code less cluttered by using a trick like this:

\begin{verbatim}
#define X mem->x
...
X = 12;
\end{verbatim}


\subsection{Cache Coherency}

Just as hardware-based shared-memory machines need to have cache coherency mechanisms,
SDSMs do too. A typical model would be a network of workstations, each machine
running both a copy of the application program and an SDSM server program. Each
global variable would have a copy at one or more of the servers. The application
programs communicate with the servers to read and write shared variables, and
the servers are written to maintain coherency with each other's copies. The
standard cache coherency protocols, e.g. MESI, can be used here.
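As a rough illustration of what each server would have to track, here is a minimal
sketch of the MESI state transitions for one node's copy of a shared variable.
The function and type names are hypothetical and are not taken from any particular
SDSM; the messages between servers and any write-backs are omitted.

\begin{verbatim}
#include <stdbool.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } mesi_event;

/* Next state of this node's copy, given its current state and an event.
   others_have_copy matters only for a local read of an invalid copy. */
mesi_state mesi_next(mesi_state s, mesi_event e, bool others_have_copy)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                    /* a valid copy can simply be read */
    case LOCAL_WRITE:
        return MODIFIED;             /* other copies must be invalidated */
    case REMOTE_READ:
        return (s == INVALID) ? INVALID : SHARED;
    case REMOTE_WRITE:
        return INVALID;              /* our copy is now stale */
    }
    return s;
}
\end{verbatim}

For instance, a copy that is Modified at one node drops to Shared when another
node reads the variable, and to Invalid when another node writes it.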

For our purposes, it is easiest to assume that the application program and the
server at a given node are the same program, say running as separate processes
generated by a call to fork() from a parent program. This is to enable easy
communication and sharing between the application and the server, but other
models are possible too.

The two main types of SDSMs are \textbf{page-based} and \textbf{object-based}.


\section{Page-Based SDSMs}

Page-based SDSMs rely on the hardware's virtual memory system. The best-known
example is probably TreadMarks (\url{http://www.cs.rice.edu/~willy/TreadMarks/overview.html}),
developed at Rice University. Another is JIAJIA (\url{http://www.ict.ac.cn/chpc/dsm/index.html}),
developed at the Chinese Academy of Sciences.


\subsection{View Seen by the Application Programmer}

For the most part, an application program in this kind of SDSM will look just
like one on a shared-memory machine.


\subsection{Internal Implementation}

Here we make use of the mprotect() Unix system call. This function allows the
Unix programmer to declare certain pages of memory in his/her program as having
READ(-only) permission, READ/WRITE permission or permission NONE. If the program
executes a statement violating the permission for a given page in memory, a
SIGSEGV signal is generated, which triggers execution of the signal handler
that the programmer installed in the program.

In general, the mprotect() function is used for safety. If the programmer knows
that he/she wants certain variables to be read-only, then if a bug in the program
causes such a variable to be written to, the error will be caught.
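The safety use can be sketched as follows, assuming a POSIX system that provides
mmap() with MAP\_ANONYMOUS. The function name and the recovery scheme based on
sigsetjmp() are illustrative choices, not part of any SDSM. The code makes one
page read-only, attempts a write to it, and reports whether the write was trapped.

\begin{verbatim}
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* Signal handler: jump back to the point saved by sigsetjmp(). */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if a write to a read-only page was trapped,
   0 if the write went through, -1 if setup failed. */
int write_to_readonly_is_trapped(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile int *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa, old_segv, old_bus;
    volatile int trapped = -1;

    if (p == MAP_FAILED)
        return -1;
    p[0] = 42;                         /* legal: page is still writable */

    sa.sa_handler = on_fault;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some systems raise SIGBUS here */

    if (mprotect((void *)p, page, PROT_READ) == 0) {
        if (sigsetjmp(recover_point, 1) == 0) {
            p[0] = 99;                 /* the "bug": write to read-only data */
            trapped = 0;
        } else {
            trapped = 1;               /* we arrive here from on_fault() */
        }
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap((void *)p, page);
    return trapped;
}
\end{verbatim}

Note that mprotect() works on whole pages, so the protected variables must
occupy pages of their own; this is why the sketch obtains its memory from mmap().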

For an SDSM, we use mprotect() with a different goal. Initially, we set all
pages within the global \textbf{struct} to have permission NONE. This is analogous
to state Invalid in MESI. The first time a program accesses a global variable,
the access causes a page fault, transferring control to the signal handler, which
is the server. The server then provides the application with the contents of
the given page, and sets the MESI state accordingly, again using a call to mprotect().
The servers are also in communication with each other, to maintain coherency.
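A single-process sketch of this mechanism is shown below, again assuming a POSIX
system with MAP\_ANONYMOUS. All names are hypothetical. A second local buffer
stands in for the data that a real server would fetch over the network, and the
handler grants READ/WRITE permission at once rather than tracking full MESI
states. Calling mprotect() inside a signal handler is common practice in such
systems, although POSIX does not list it as async-signal-safe.

\begin{verbatim}
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *shared_area;   /* local copy of the globals, starts PROT_NONE */
static char  *home_copy;     /* stand-in for data held by a remote server */
static size_t page_size, area_size;
static volatile sig_atomic_t fault_count;

/* Plays the role of the server: bring in the faulting page, then return
   so that the faulting instruction is retried. */
static void page_server(int sig, siginfo_t *info, void *context)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)shared_area;
    size_t offset;

    (void)context;
    if (addr < base || addr >= base + area_size) {
        signal(sig, SIG_DFL);          /* a genuine bug: let it crash */
        return;
    }
    offset = (size_t)(addr - base) / page_size * page_size;
    mprotect(shared_area + offset, page_size, PROT_READ | PROT_WRITE);
    memcpy(shared_area + offset, home_copy + offset, page_size);
    fault_count++;
}

/* Map npages of "shared" memory with permission NONE; install the handler. */
int sdsm_init(size_t npages)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    area_size = npages * page_size;
    shared_area = mmap(NULL, area_size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    home_copy = mmap(NULL, area_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared_area == MAP_FAILED || home_copy == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = page_server;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Touch two pages; returns the number of faults taken, or -1 on error. */
int sdsm_demo(void)
{
    volatile char *area, *home;
    char a, a2, b;

    if (sdsm_init(2) != 0)
        return -1;
    area = shared_area;
    home = home_copy;
    home[0] = 'A';
    home[1] = 'a';
    home[page_size] = 'B';

    a  = area[0];            /* first touch of page 0: fault */
    a2 = area[1];            /* same page, now valid: no fault */
    b  = area[page_size];    /* first touch of page 1: fault */
    return (a == 'A' && a2 == 'a' && b == 'B') ? (int)fault_count : -1;
}
\end{verbatim}

Only the first access to each page enters the handler; later accesses to the
same page run at full speed, which is the main attraction of the page-based
approach.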


\section{Object-Based SDSMs}

One of the problems with page-based SDSMs (and, for that matter, also with caches
in shared-memory hardware machines) is that of \textbf{false sharing}. Here
our concern is that two variables with no relation at all to each other have
the misfortune of being in the same page. If one of the two variables becomes
invalid due to having been written to at another node, the other variable becomes
invalid too, even though its data is in fact still perfectly good. Object-based
SDSMs allow MESI states for units of data which are smaller and of varying sizes,
thus eliminating the false-sharing problem. In our example here, though, we
will assume for the sake of simplicity that objects are of only one size, specifically
one word.  
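To make the false-sharing condition concrete, the following sketch checks whether
two members of the earlier \textbf{struct} fall in the same page. It assumes that
the struct starts on a page boundary; the helper names are hypothetical.

\begin{verbatim}
#include <stddef.h>

struct glob {
   int x,y;
   int z[1000];
   char u,v;
};

/* Page index of a byte offset, assuming the struct starts on a page boundary. */
size_t page_of(size_t offset, size_t page_size)
{
    return offset / page_size;
}

/* Nonzero if the two offsets lie in the same page. */
int same_page(size_t off_a, size_t off_b, size_t page_size)
{
    return page_of(off_a, page_size) == page_of(off_b, page_size);
}
\end{verbatim}

On a system with 4-byte int and 4096-byte pages, x is at offset 0 and u is at
offset 4008, so the call

\begin{verbatim}
same_page(offsetof(struct glob, x), offsetof(struct glob, u), 4096)
\end{verbatim}

returns nonzero: a write to u at one node would invalidate x at every other node.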

A number of object-based SDSMs are available, such as the Adsmith system (\url{http://www.hensa.ac.uk/parallel/environments/pvm3/adsmith/})
developed at National Taiwan University.


\subsection{View Seen by the Application Programmer}

The main new feature here is that a programmer must ``ask permission'' before
reading or writing a shared variable. For example,
instead of

\begin{verbatim}
mem->x = 3;
\end{verbatim}

we would have something like

\begin{verbatim}
get_write_permission(&mem->x);
mem->x = 3;
release_write_permission(&mem->x);
\end{verbatim}


\subsection{Internal Implementation}

The permission functions are executed by the servers, thus enabling coherency
operations.
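What the permission functions might do can be sketched with a single-process
simulation in which an array of structs stands in for the nodes. This is only
an illustration under simplifying assumptions: the functions here take a node
number and an object index instead of an address, they use three states rather
than full MESI, and the helpers node\_read() and node\_write() are hypothetical.

\begin{verbatim}
#define NNODES 3     /* number of simulated nodes */
#define NOBJS  4     /* number of one-word shared objects */

typedef enum { OBJ_INVALID, OBJ_SHARED, OBJ_MODIFIED } obj_state;

struct node {
    int       value[NOBJS];   /* this node's private copy of each object */
    obj_state state[NOBJS];   /* coherency state of that copy */
};

/* Static storage is zero-initialized, so every copy starts OBJ_INVALID. */
static struct node nodes[NNODES];

/* Before a read: if our copy is invalid, fetch a valid one from a peer. */
void get_read_permission(int me, int obj)
{
    int n;

    if (nodes[me].state[obj] != OBJ_INVALID)
        return;
    for (n = 0; n < NNODES; n++) {
        if (n != me && nodes[n].state[obj] != OBJ_INVALID) {
            nodes[me].value[obj] = nodes[n].value[obj];
            nodes[n].state[obj] = OBJ_SHARED;   /* a writer is downgraded */
            break;
        }
    }
    nodes[me].state[obj] = OBJ_SHARED;
}

/* Before a write: invalidate every other copy, then own the object. */
void get_write_permission(int me, int obj)
{
    int n;

    for (n = 0; n < NNODES; n++)
        if (n != me)
            nodes[n].state[obj] = OBJ_INVALID;
    nodes[me].state[obj] = OBJ_MODIFIED;
}

/* Placeholder: nothing to undo in this single-process simulation. */
void release_write_permission(int me, int obj)
{
    (void)me;
    (void)obj;
}

/* A write as the application would perform it at node me. */
void node_write(int me, int obj, int v)
{
    get_write_permission(me, obj);
    nodes[me].value[obj] = v;
    release_write_permission(me, obj);
}

/* A read as the application would perform it at node me. */
int node_read(int me, int obj)
{
    get_read_permission(me, obj);
    return nodes[me].value[obj];
}
\end{verbatim}

After node 0 writes an object, a read at node 2 fetches the new value and leaves
both copies shared; a later write at node 1 invalidates the copies at nodes 0 and 2.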

\end{document}

