Norman Matloff's LAM MPI Tutorial

Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616

(Please mail any questions to Norm Matloff.)

Installation:

If you wish to install LAM, say on your Linux box, download it from the LAM home page and, from the top directory of the unpacked source, run

./configure --prefix=directory
make
make install

where "directory" is the directory in which you want LAM installed.

Environment and Path:

Your path must include the bin subdirectory of whatever you set "prefix" to above, so that LAM executables such as lamboot will be found. (Note: If you have both LAM and MPICH installed, both packages provide programs named mpicc and mpirun, so you must make sure the LAM versions are the ones found, e.g. by placing the LAM bin directory earlier in your path or by typing full path names.)

Later, when you write LAM application programs, make sure that their executable files are in directories listed within your $PATH.
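As a minimal sketch of the path requirement above, the following Bourne-shell fragment tests whether a directory is listed in $PATH and prepends it if it is not. The install location $HOME/lam, the variable LAM_PREFIX and the helper name path_contains are illustrative assumptions, not part of LAM.

```shell
# Hypothetical install prefix; use whatever you passed to --prefix.
LAM_PREFIX="${LAM_PREFIX:-$HOME/lam}"

# Succeeds if directory $1 is one of the colon-separated entries of $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Prepend the LAM bin directory only if it is not already listed.
if ! path_contains "$LAM_PREFIX/bin"; then
    PATH="$LAM_PREFIX/bin:$PATH"
    export PATH
fi
```

The same helper can be used later to check that the directory holding your own application executables is in your path.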

Set the environment variable for your remote shell command, e.g.

setenv LAMRSH ssh
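The setenv line above is csh/tcsh syntax. If you use a Bourne-type shell (sh, bash), an equivalent would be the following sketch, which sets the same variable named in the text.

```shell
# Bourne-shell equivalent of "setenv LAMRSH ssh".
LAMRSH=ssh
export LAMRSH
```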

"Booting" LAM:

The LAM version of MPI requires you to first "boot" LAM, using the lamboot command. Using rsh or ssh, lamboot will start lamd, the LAM message server, at each machine you will be using in your current LAM session. The advantage of this is that all the network connections are now ready, so that if you run many LAM applications from now on, each one avoids the overhead of starting up the network connections.

Make sure that your .rhosts file is set up to include all machines to be booted (if you are using rsh), or that you have set up ssh to allow passwordless access to those machines from the one at which you start up your LAM application. To arrange the latter on UCD CSIF machines (or others, actually), see these instructions.

Set up a boot schema file, listing the nodes you will use. Here is an example:

pc8.cs.ucdavis.edu
pc10.cs.ucdavis.edu
pc12.cs.ucdavis.edu

Node 0 will be pc8, node 1 will be pc10, etc.

The machine from which you are running lamboot must be among the machines listed in the boot schema file.
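The requirement that the local machine appear in the boot schema can be checked before booting. Here is a rough sketch, assuming the schema has one host name at the start of each line, as in the example above. The helper name schema_has_host and the scratch file name are hypothetical, and the comparison is an exact string match, so a short name reported by uname -n will not match a fully qualified name in the file.

```shell
# Succeeds if host $2 appears as the first word of some line in schema file $1.
schema_has_host() {
    awk -v h="$2" '$1 == h { found = 1 } END { exit !found }' "$1"
}

# Write the example schema to a scratch file (hypothetical file name).
schema="${TMPDIR:-/tmp}/lamhosts.$$"
cat > "$schema" <<'EOF'
pc8.cs.ucdavis.edu
pc10.cs.ucdavis.edu
pc12.cs.ucdavis.edu
EOF

# Compare against this machine's name as reported by uname -n.
if schema_has_host "$schema" "$(uname -n)"; then
    echo "local host is in the boot schema"
else
    echo "local host is NOT in the boot schema"
fi
```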

Then start up LAM:

lamboot -v boot_schema_file

LAM is now running, as lamd, the LAM daemon, on all the machines you specified. You can now run LAM programs as many times as you like, without rerunning lamboot, until you use the lamhalt command (or the machine is rebooted).

Compiling a LAM MPI Application:

Type

mpicc -g -o binary_file_name source_file.c 

(If you wish to use C++, use mpic++ instead of mpicc.)

As mentioned earlier, make sure that the resulting executable is in a directory listed within your $PATH.

Running a LAM MPI Application:

To describe how to run a LAM program, I will assume that it is of the "SPMD" ("single-program, multiple data") type. This means that the same program runs on all nodes, though typically accessing different data. An example is our MPI sample program, which is of SPMD type. (We could convert it to non-SPMD ("MPMD") type by forming a "master" program from all the code which the current version assigns to node 0, and a "slave" program consisting of the rest of the code.)

Ordinarily one runs a LAM application program by invoking it from the mpirun command. For example, to run our MPI sample program, "prime", we would type

mpirun -c 3 prime -- 100 0

which runs the program prime on three MPI processes. The command-line arguments intended for the application program go at the end of this line, in this case "100" and "0", which will be argv[1] and argv[2].

What mpirun will do is contact the first k nodes from your boot schema file, where "-c k" is the portion of your command line which specifies the number of nodes you wish to run on. Using rsh or ssh, it will start your application program, e.g. prime above, on each of the nodes, with the command-line arguments you specify after "--".
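To see which machines a given "-c k" would select under the description above, one can list the first k entries of the boot schema together with their node numbers. This sketch only mirrors the rule stated in the text; it does not query LAM itself. The helper name first_nodes and the scratch file name are hypothetical.

```shell
# Print the first k hosts of schema file $1 (k = $2), numbered from node 0.
# Blank lines are skipped; one host name per line is assumed.
first_nodes() {
    awk -v k="$2" 'NF > 0 && n < k { printf "node %d: %s\n", n, $1; n++ }' "$1"
}

# Example schema in a scratch file (hypothetical file name).
nodes_file="${TMPDIR:-/tmp}/lamnodes.$$"
printf '%s\n' pc8.cs.ucdavis.edu pc10.cs.ucdavis.edu pc12.cs.ucdavis.edu > "$nodes_file"

first_nodes "$nodes_file" 2
# prints:
#   node 0: pc8.cs.ucdavis.edu
#   node 1: pc10.cs.ucdavis.edu
```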

Shutting Down LAM:

When you are done using LAM, type

 lamhalt boot_schema_file 

The lamhalt command, again using rsh or ssh, kills the lamd daemons at all the machines on which you booted LAM. Running lamhalt without any command-line arguments results in killing all your daemons.
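Putting the steps of this tutorial together, a whole session might look like the sketch below. The function names run and lam_session and the DRY_RUN switch are hypothetical conveniences, not part of LAM. By default the commands are only printed, so the sequence can be inspected on a machine without LAM; set DRY_RUN=0 to actually execute them.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch, default 1).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Boot LAM, run one SPMD program on k nodes, then shut LAM down.
# $1 = boot schema file, $2 = k, $3 = program, remaining = program arguments.
lam_session() {
    schema=$1; k=$2; prog=$3
    shift 3
    run lamboot -v "$schema" || return 1
    run mpirun -c "$k" "$prog" -- "$@"
    status=$?
    run lamhalt            # halt the daemons even if the program failed
    return "$status"
}

lam_session boot_schema_file 3 prime 100 0
# in dry-run mode this prints:
#   lamboot -v boot_schema_file
#   mpirun -c 3 prime -- 100 0
#   lamhalt
```

If you plan to run many programs, you would of course boot once and halt once rather than wrapping each run this way, since avoiding repeated startup is the point of lamboot.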

Output from LAM:

Note carefully that all the output from printf() calls will be collected by lamd at the machine from which you started the program. Output from different nodes may be interleaved, making it difficult or impossible to read. (Note: This interleaving of printf() output is quite common in parallel systems.)
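One way to cope with interleaved output is to regroup it by node afterward. The sketch below assumes a convention that is not part of LAM: your program starts every output line with its own node number followed by a colon (for instance the rank obtained from MPI_Comm_rank()). The helper name group_by_node is hypothetical.

```shell
# Regroup lines of the form "RANK: message" by rank, keeping each node's
# lines in their original order: number the lines, sort by rank and then
# by original position, and strip the line numbers again.
group_by_node() {
    awk '{ print NR ":" $0 }' | sort -t: -k2,2n -k1,1n | cut -d: -f2-
}

printf '%s\n' '1: checking 9' '0: checking 3' '1: checking 11' '0: checking 5' |
    group_by_node
# prints:
#   0: checking 3
#   0: checking 5
#   1: checking 9
#   1: checking 11
```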