Norman Matloff's LAM MPI Tutorial

Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616

(Please mail any questions to Norm Matloff.)

(You may wish to also read my general MPI tutorial, which is a full chapter in my open-source book on parallel programming.)

Overview:

LAM has quite a following as an alternative to MPICH. More recently, it has evolved into Open MPI (not to be confused with OpenMP).

Installation:

If you wish to install LAM, say on your Linux box, download it from the LAM home page and run

./configure --prefix=directory
make
make install

where "directory" is the directory in which you want LAM installed. (Omit this option if you wish to take the default.)

Some other options to configure you may wish to use include --without-fc for no FORTRAN support, --with-pic to (try to) generate only position-independent code, and so on. Type

./configure --help

to see all the options.

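You may find that the configure script complains, incorrectly, that your compiler doesn't recognize the bool type. The real problem is that the small test program configure compiles declares argv with the wrong type, which some compilers reject. To fix this, edit configure, changing each line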

int main(int argc, char *argv)

to

int main(int argc, char **argv)

and similarly for

int main(int argc, char* argv)

Environment and path:

Your path must include the bin subdirectory of whatever you set "prefix" to above, so that LAM executables such as lamboot will be found. (Note: If you have both LAM and some other implementation of MPI, say MPICH, installed, make sure your path finds LAM's versions of mpicc and mpirun first, since both implementations supply programs with those names.)

Please note that MPI implementations typically work by using ssh or equivalent to invoke programs on other nodes. Thus the proper path must be set in your shell startup file, not just changed temporarily in your current shell.

The same is true for the library path (LD_LIBRARY_PATH).

Later, when you write LAM application programs, make sure that their executable files are in directories listed within your $PATH.
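Since a program started on a remote node is located by an ordinary path search, it can help to check that search yourself. Here is a minimal C sketch of roughly what a shell does when it looks up a bare program name in $PATH; find_in_path() is a hypothetical helper for illustration, not part of LAM.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: search the colon-separated directories in $PATH
   for an executable regular file called `name`. On success, copy its
   full path into `out` and return 1; otherwise return 0. */
int find_in_path(const char *name, char *out, size_t outlen)
{
    const char *path = getenv("PATH");
    if (path == NULL || name == NULL || *name == '\0')
        return 0;

    const char *start = path;
    for (;;) {
        const char *colon = strchr(start, ':');
        size_t dirlen = colon ? (size_t)(colon - start) : strlen(start);
        struct stat st;
        int n;

        /* An empty PATH entry conventionally means the current directory. */
        if (dirlen == 0)
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, start, name);

        /* Accept only a regular file we are allowed to execute. */
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 1;

        if (colon == NULL)
            break;
        start = colon + 1;
    }
    return 0;
}
```

Compiled into a tiny test program and run on a remote node, a check like this can tell you whether your startup files really put your application's directory on the path there.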

If you are running on several machines having a shared file system, your program's initial working directory will be the one that you invoked mpirun from.

You'll need to have the LAM library accessible to your MPI application's executable file. You can try setting your LD_LIBRARY_PATH environment variable, e.g.

setenv LD_LIBRARY_PATH /usr/local/lam/lib

Or you can have mpirun export it to the remote machines when you run the application, e.g.

mpirun -c 3 -x LD_LIBRARY_PATH prmpp -- 100 0

which tells MPI to set LD_LIBRARY_PATH at the remote machines to whatever it is set to on the local machine.

You may also need to set the environment variable for your remote shell command, e.g.

setenv LAMRSH ssh

or

setenv LAMRSH "ssh -Y"

"Booting" LAM:

The LAM version of MPI requires you to first "boot" LAM, using the lamboot command. Using rsh or ssh, lamboot will start lamd, the LAM message server, at each machine you will be using in your current LAM session. (Of course, you may wish to run several LAM processes on the same machine too.) The advantage of this is that all the connections are now ready, so that if you run many LAM applications from now on, each one avoids the overhead of starting up the connections.

We will assume here that the remote shell command on your systems is ssh. Set up ssh to allow passwordless access to those machines from the one at which you start up your LAM application. Make sure to do an actual passwordless login to those machines before you try to run MPI. To arrange the passwordless login on UCD CSIF machines (or others, actually), see these instructions.

Set up a boot schema file, listing the nodes you will use. Here is an example:

pc8.cs.ucdavis.edu
pc10.cs.ucdavis.edu
pc12.cs.ucdavis.edu

Node 0 will be pc8, node 1 will be pc10, etc.

The machine from which you run lamboot must be one of the machines listed in the boot schema.

If you wish to run more than one LAM process on the same machine, e.g. for multicore machines, use the cpu option, e.g.

pc28.cs.ucdavis.edu cpu=2
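To make the format concrete, here is a minimal C sketch of how such a file could be read: one host per line, node i being the i-th host line, with an optional cpu=N setting that defaults to 1. The names boot_node, parse_schema_line() and read_schema() are hypothetical, treating # as a comment marker is an assumption, and LAM's own parser accepts more options than this.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a simplified boot schema (hypothetical type). */
struct boot_node {
    char host[256];
    int cpus;            /* 1 unless the line says cpu=N */
};

/* Parse one schema line. Returns 1 for a host line, 0 for a blank or
   comment line, -1 for a malformed cpu= value. */
int parse_schema_line(const char *line, struct boot_node *node)
{
    char buf[512];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char *hash = strchr(buf, '#');     /* assumed comment syntax */
    if (hash)
        *hash = '\0';

    char *tok = strtok(buf, " \t\r\n");
    if (tok == NULL)
        return 0;

    snprintf(node->host, sizeof node->host, "%s", tok);
    node->cpus = 1;

    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        if (strncmp(tok, "cpu=", 4) == 0) {
            char *end;
            long n = strtol(tok + 4, &end, 10);
            if (*end != '\0' || n < 1)
                return -1;
            node->cpus = (int)n;
        }
        /* Other key=value options are ignored in this sketch. */
    }
    return 1;
}

/* Read a whole schema into nodes[0..max-1]; the node number is the
   host's position among host lines. Returns the count, or -1 on error. */
int read_schema(FILE *fp, struct boot_node *nodes, int max)
{
    char line[512];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        struct boot_node tmp;
        int r = parse_schema_line(line, &tmp);
        if (r < 0)
            return -1;                 /* malformed line */
        if (r == 1) {
            if (count == max)
                return -1;             /* more hosts than the caller allowed */
            nodes[count++] = tmp;
        }
    }
    return count;
}
```

Applied to the three-host example above, read_schema() would put pc8 in nodes[0], pc10 in nodes[1] and pc12 in nodes[2], matching the node numbering just described.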

Then start up LAM:

lamboot -v your_boot_schema_file_name

LAM is now running, as lamd, the LAM daemon, on all the machines you specified. You can now run LAM programs as many times as you like, without rerunning lamboot, until you use the lamhalt command (or the machine is rebooted).

If lamboot fails: first, if you have only just set up passwordless login, try running lamboot again. Second, make sure you have followed all the path and environment instructions above exactly.

Note that localhost may be used to specify the machine on which you run lamboot.

You can use lamnodes to list the nodes in your current LAM session, and laminfo to display information about your LAM installation.

Compiling a LAM MPI application:

Type

mpicc -g -o binary_file_name source_file.c 

(If you wish to use C++, use mpiCC instead of mpicc.)

As mentioned earlier, make sure that the resulting executable is in a directory listed within your $PATH.

Running a LAM MPI application--basics:

To describe how to run a LAM program, I will assume that it is of the "SPMD" ("single-program, multiple data") type. This means that the same program runs on all nodes, though typically accessing different data. An example is our MPI sample program, which is of SPMD type. (We could convert it to non-SPMD ("MPMD") type by forming a "master" program from all the code which the current version assigns to node 0, and a "slave" program consisting of the rest of the code.)
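As an illustration of the "different data" part, an SPMD program typically uses its rank to decide which piece of the data to work on. Here is a minimal C sketch of one common scheme, block partitioning of an index range; block_range() is a hypothetical helper, not code from the sample program, and in a real MPI program rank and nprocs would come from MPI_Comm_rank() and MPI_Comm_size() on MPI_COMM_WORLD.

```c
/* Hypothetical helper: split the index range [0, n) as evenly as
   possible among nprocs processes. The process with the given rank
   gets the half-open range [*lo, *hi). */
void block_range(long n, int rank, int nprocs, long *lo, long *hi)
{
    long base = n / nprocs;    /* every rank gets at least this many */
    long extra = n % nprocs;   /* the first `extra` ranks get one more */

    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

With n = 10 and three processes, ranks 0, 1 and 2 get [0,4), [4,7) and [7,10): the same code runs everywhere, and only the rank makes the data differ.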

Ordinarily one runs a LAM application program by invoking it from the mpirun command. For example, to run our MPI sample program, prime, we would type

mpirun -c 3 prime -- 100 0

which runs the program prime on three MPI processes. The command-line arguments intended for the application program go at the end of this line, in this case "100" and "0", which will be argv[1] and argv[2].
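Inside the program, those arguments arrive through main()'s argv just as they would without MPI (calling MPI_Init(&argc, &argv) before examining them is the usual practice). Here is a minimal C sketch of reading two integer arguments safely; parse_int_arg() and get_two_args() are hypothetical names, and nothing is assumed about what the two numbers mean to prime.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert one command-line argument to an int, rejecting junk such as
   "10x" or out-of-range values. Returns 1 on success, 0 on failure. */
int parse_int_arg(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}

/* Hypothetical helper: with "mpirun -c 3 prime -- 100 0", each process
   should see argv[1] == "100" and argv[2] == "0". */
int get_two_args(int argc, char **argv, int *first, int *second)
{
    if (argc < 3)
        return 0;
    return parse_int_arg(argv[1], first) && parse_int_arg(argv[2], second);
}
```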

What mpirun will do is contact the first k nodes from your boot schema file, where "-c k" is the portion of your command line which specifies the number of processes you wish to run. Through the lamd daemons that lamboot already started there, it will start your application program, e.g. prime above, on each of those nodes, with the command-line arguments you specify after "--".

Running a LAM MPI application--high performance:

"Standard" LAM uses TCP/IP for sending its messages. However, on some platforms, other methods may be considerably faster, such as if

Of course, that second situation is now quite common. To achieve better performance, select an RPI (Request Progression Interface) module suited to your system. For the multiprocessor case, try (in the prime example) either

mpirun -c 3 -ssi rpi sysv prime -- 100 0

or

mpirun -c 3 -ssi rpi usysv prime -- 100 0

This will exploit shared memory in messages between processes on the same machine, while using TCP for messages between machines.

The default is equivalent to

mpirun -c 3 -ssi tcp usysv prime -- 100 0

This can also be set via an environment variable, e.g.

setenv LAM_MPI_SSI_rpi usysv

Shutting down LAM:

 lamhalt boot_schema_file 

The lamhalt command kills the lamd daemons on all the machines on which you booted LAM. Running lamhalt without any command-line arguments results in killing all your daemons.

Output from LAM:

Note carefully that all the output from printf() calls will be collected by lamd and displayed on the machine from which you ran mpirun. Lines from different nodes may be interleaved, making your output difficult or impossible to read. (Note: This interleaving of printf() output is quite common in parallel systems.)
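One common way to make such output readable is to tag every line with the process's rank and hand each line to the output stream in one piece. Here is a minimal C sketch of that idea; rank_printf() is a hypothetical helper, and it reduces, but does not guarantee against, lines from different nodes being mixed together.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "[rank r] " plus the caller's message into
   one buffer, then emit it with a single fputs() and an fflush(), so the
   line leaves this process in one piece. Returns the number of characters
   written, or -1 on error. */
int rank_printf(FILE *fp, int rank, const char *fmt, ...)
{
    char buf[1024];
    int n = snprintf(buf, sizeof buf, "[rank %d] ", rank);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(buf + n, sizeof buf - (size_t)n, fmt, ap);
    va_end(ap);
    if (m < 0)
        return -1;
    /* Over-long messages are silently truncated to fit buf. */

    if (fputs(buf, fp) == EOF || fflush(fp) == EOF)
        return -1;
    return (int)strlen(buf);
}
```

For example, rank_printf(stdout, 2, "x=%d\n", 42) writes the single line "[rank 2] x=42", so even interleaved output can be sorted out by rank afterward.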