Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616
(Please mail any questions to Norm Matloff.)
(You may wish to also read my general MPI tutorial, which is a full chapter in my open-source book on parallel programming.)
If you wish to install MPICH/MPICH2 yourself, say on your Linux box, download the source code from the MPICH/MPICH2 home page. Just unpack the software and type
./configure
make
make install
If you don't like the default installation directory, add a --prefix option to the configure command too.
Make sure your shell search path includes the directory containing mpicc, as well as the directory in which your MPICH/MPICH2 application executable will reside.
Please note that MPI implementations typically work by using ssh or equivalent to invoke programs on other nodes. Thus the proper path must be set up by your shell startup file, not just via a temporary resetting of path.
The same is true for the library path.
If you later get "missing library" messages, set the environment variable LD_LIBRARY_PATH to the lib/ directory of your MPICH/MPICH2 installation.
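As a rough sketch of the path setup described above, lines like the following could go in a Bourne-style shell startup file. The install prefix $HOME/mpich and the variable name MPICH_HOME are only assumptions for illustration; substitute your own installation directory. (csh users would use setenv in .cshrc instead.)

```shell
# Hypothetical install prefix; substitute the directory you installed into.
MPICH_HOME="${MPICH_HOME:-$HOME/mpich}"

# Prepend a directory to a colon-separated path value only if it is absent.
# $1 = current value (may be empty), $2 = directory to add
prepend_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${2}${1:+:$1}" ;;
    esac
}

PATH=$(prepend_path "$PATH" "$MPICH_HOME/bin")
LD_LIBRARY_PATH=$(prepend_path "${LD_LIBRARY_PATH:-}" "$MPICH_HOME/lib")
export PATH LD_LIBRARY_PATH
```

Because the function skips directories that are already present, the startup file can be sourced repeatedly without the paths growing.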
Type
mpicc -g -o binary_file_name source_file.c
(If you wish to use C++, use mpicxx instead of mpicc.)
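If the compile command is not found, the search path is the usual culprit. Here is a small sketch of a check you could run first; the function name require_tool is my own invention, not part of MPICH.

```shell
# Succeed only if the named command can be found on the current search path.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; check that its directory is in your path" >&2
    return 1
}

# Usage, with the file names from the compile command above:
# require_tool mpicc && mpicc -g -o binary_file_name source_file.c
```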
(If you are using MPICH2, the directions for running are given later in this document.)
To describe how to run an MPICH program, I will assume that it is of the "SPMD" ("single program, multiple data") type. This means that the same program runs on all nodes, though typically accessing different data. An example is our MPI sample program, which is of SPMD type. (We could convert it to non-SPMD ("MPMD") type by forming a "master" program from all the code which the current version assigns to node 0, and a "slave" program consisting of the rest of the code.)
To run our MPI sample program, prime, we would set up a procgroup ("process group") file, listing the nodes we will use and program(s) we will run, say the following file My.pg:
pc8.cs.ucdavis.edu 0 /home/matloff/tmp/prime
pc10.cs.ucdavis.edu 1 /home/matloff/tmp/prime
pc12.cs.ucdavis.edu 1 /home/matloff/tmp/prime
(Use 0 for the first machine, and 1 for all others.)
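For longer machine lists, typing the procgroup file by hand gets tedious. The following is a minimal sketch that generates one using the 0-then-1 convention just described; make_procgroup is a hypothetical helper name, and the hosts and program path are simply those of the example.

```shell
# Write a procgroup file to stdout: the first host gets 0, all others get 1.
# $1 = full path of the program, remaining arguments = host names
make_procgroup() {
    prog=$1; shift
    count=0
    for host in "$@"; do
        if [ "$count" -eq 0 ]; then n=0; else n=1; fi
        printf '%s %s %s\n' "$host" "$n" "$prog"
        count=$((count + 1))
    done
}

make_procgroup /home/matloff/tmp/prime \
    pc8.cs.ucdavis.edu pc10.cs.ucdavis.edu pc12.cs.ucdavis.edu > My.pg
```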
If your machines are using rsh, make absolutely sure that your $HOME/.rhosts files on these machines include the names of these machines. If the machines are using ssh, make sure that you've set things up for passwordless remote execution. Make sure to do an actual passwordless login to those machines before you try to run MPI. To arrange the passwordless login on UCD CSIF machines (or others, actually), see these instructions.
Also make sure there are no undefined variables in your .cshrc startup ($TERM may be one). Node 0 will be pc8, node 1 will be pc10, etc.
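For the ssh case, the passwordless-login requirement can be checked mechanically before you try MPI. This is only a sketch: check_passwordless is a hypothetical helper, and it relies on ssh's BatchMode option, which makes ssh fail rather than prompt for a password.

```shell
# SSH is overridable so the check can be tried with a stand-in command.
SSH="${SSH:-ssh}"

# Report, for each host given, whether a non-interactive login succeeds.
# Returns 1 if any host failed, 0 otherwise.
check_passwordless() {
    failed=0
    for host in "$@"; do
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED (would prompt for a password, or is unreachable)"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, with the machines from the My.pg example:
# check_passwordless pc8.cs.ucdavis.edu pc10.cs.ucdavis.edu pc12.cs.ucdavis.edu
```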
Then, at node 0 (in this case pc8), type
mpirun -p4pg My.pg prime 100 0
(Make sure you do this at node 0.)
The key to running an MPICH2 application is the mpd daemon, one of which must run on each machine to be used by your program.
Make sure you have a file .mpd.conf in your home directory, with a line like
secretword=
with your favorite word following.
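Note that mpd expects this file to be readable only by you. A minimal sketch for creating it with suitable permissions follows; write_mpd_conf is a hypothetical helper name, and the word in the usage line is a placeholder you should replace.

```shell
# Create an mpd configuration file readable and writable only by its owner.
# $1 = path of the file (normally $HOME/.mpd.conf), $2 = the secret word
write_mpd_conf() {
    ( umask 077 && printf 'secretword=%s\n' "$2" > "$1" ) && chmod 600 "$1"
}

# Usage:
# write_mpd_conf "$HOME/.mpd.conf" my-secret-word
```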
The primitive way to launch the daemons is to simply type
mpd &
in a terminal window at each machine. If you want to run k daemons on a machine, type
mpd --ncpus=k &
But for multiple machines, it is more convenient to use mpdboot. To do this, first set up a file mpd.hosts in some directory, with the names of the machines on which you want daemons to be running; list the network names of the machines, one line per machine, e.g.
pc29.cs.ucdavis.edu
pc30.cs.ucdavis.edu
Then type
mpdboot --totalnum=2 --verbose
(replace the 2 by the number of mpd daemons you wish to start, normally one per machine).
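To keep the --totalnum value in step with the hosts file, you could derive it from the file itself. The sketch below assumes, as in the example above, that the value should equal the number of machines listed; count_hosts and mpdboot_command are hypothetical helper names, and the command is printed rather than run so you can inspect it first.

```shell
# Count the non-blank lines (one machine per line) in a hosts file.
count_hosts() {
    grep -c '[^[:space:]]' "$1"
}

# Print, rather than run, the mpdboot command for that many machines.
mpdboot_command() {
    printf 'mpdboot --totalnum=%s --verbose\n' "$(count_hosts "$1")"
}

# Usage:
# mpdboot_command mpd.hosts
```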
If this fails: First, if you just changed to a passwordless login, try running mpdboot again. Second, make sure you've followed all the path and environment instructions above exactly. MPICH2 includes some good troubleshooting programs, mpdcheck and mpdtrace; see the MPICH2 installation guide for step-by-step instructions.
If you want to have more than one mpd process on a given machine, you need to type
mpdboot --totalnum=-1 --ncpus=k
where k is the total number of MPI processes you wish to run.
To run your program, the above command for MPICH
mpirun -p4pg My.pg prime 100 0
becomes under MPICH2
mpiexec -l -n 3 prime 100 0
(There are also other ways.)
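If you already have a procgroup file from MPICH, the translation above can be sketched mechanically. This assumes one MPI process per line of the procgroup file, as in My.pg; mpiexec_command is a hypothetical helper name, and it prints the command instead of running it.

```shell
# Print the mpiexec command corresponding to a procgroup file.
# $1 = procgroup file, remaining arguments = program name and its arguments
mpiexec_command() {
    pg=$1; shift
    n=$(grep -c '[^[:space:]]' "$pg")   # one process per non-blank line
    printf 'mpiexec -l -n %s %s\n' "$n" "$*"
}

# Usage:
# mpiexec_command My.pg prime 100 0
```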
You can run mpdallexit to shut down the daemons.
Note carefully that all the output from printf() and the like will be collected and printed at whichever machine you started the program from. Output from different nodes may be interspersed, making it difficult or impossible to read. (Note: This interspersing of printf() outputs is quite common in parallel systems.)
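One way to make interspersed output more readable under MPICH2 is to group it by node after the fact. The sketch below assumes that the -l option of mpiexec labels each output line with the rank followed by a colon (e.g. "0: "); group_by_rank is a hypothetical helper name, and the -s (stable) flag of sort is an extension found in GNU and BSD sort.

```shell
# Reorder labeled output so each rank's lines are together, keeping the
# original order of lines within a rank.
group_by_rank() {
    sort -s -t: -k1,1n
}

# Usage:
# mpiexec -l -n 3 prime 100 0 | group_by_rank
```

Grouping loses the relative timing between nodes, so this helps for reading results but not for diagnosing ordering problems.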
Don't use printf() calls for most of your debugging. Your debugging will be much easier and faster if you use a debugging tool, such as gdb. To use gdb with MPICH/MPICH2, follow the directions given in my parallel debugging guide, at http://heather.cs.ucdavis.edu/~matloff/pardebug.html.