Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616
(Please mail any questions to Norm Matloff.)
If you wish to install MPICH/MPICH2 yourself, say on your Linux box, download the source code from the MPICH/MPICH2 home page. Just unpack the software and type
./configure
make
make install
If you don't like the default installation directory, add a --prefix=directory option to the configure step.
Make sure your shell search path includes the directory containing mpicc, as well as the directory in which your MPICH/MPICH2 application executable will reside.
Make sure that mpicc is in your search path. (Also, if you have other versions of MPI installed, make sure the one you want comes first.) Then compile with a command such as
mpicc -g -o binary_file_name source_file.c
(If you wish to use C++, use mpiCC instead of mpicc.)
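One way to check the search-path advice above is to list every copy of mpicc that the shell can see, in the order it would find them. The following is only a minimal sketch; the helper name list_on_path is made up for illustration and is not part of MPICH.

```shell
# list_on_path NAME: print every executable called NAME found on $PATH,
# in search order, one per line. The first line printed is the copy the
# shell would actually run. (list_on_path is a hypothetical helper name.)
# The body runs in a subshell so the IFS and globbing changes do not leak.
list_on_path() (
    name=$1
    IFS=:
    set -f                      # do not glob-expand PATH entries
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

Running list_on_path mpicc prints nothing if no mpicc is on the path. If it prints more than one line, the first line is the version you will get, so reorder your path if that is not the one you want.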
(If you are using MPICH2, skip ahead to the directions for running MPICH2 programs, later in this document.)
To describe how to run an MPICH program, I will assume that it is of the "SPMD" ("single-program, multiple data") type. This means that the same program runs on all nodes, though typically accessing different data. An example is our MPI sample program, which is of SPMD type. (We could convert it to non-SPMD ("MPMD") type by forming a "master" program from all the code which the current version assigns to node 0, and a "slave" program consisting of the rest of the code.)

To run our MPI sample program, prime, we would set up a procgroup ("process group") file, listing the nodes we will use and program(s) we will run, say the following file My.pg:
pc8.cs.ucdavis.edu 0 /home/matloff/tmp/prime
pc10.cs.ucdavis.edu 1 /home/matloff/tmp/prime
pc12.cs.ucdavis.edu 1 /home/matloff/tmp/prime
(Use 0 for the first machine, and 1 for all others.)
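If you change machines often, a procgroup file in this layout can be generated rather than typed by hand. The sketch below assumes an SPMD program installed at the same path on every machine; the function name make_procgroup is hypothetical.

```shell
# make_procgroup PROGRAM HOST...: print a procgroup file on stdout.
# The first host gets 0 in the second column and every later host gets 1,
# following the rule stated above. (make_procgroup is a made-up name.)
make_procgroup() {
    prog=$1
    shift
    flag=0
    for host in "$@"; do
        printf '%s %s %s\n' "$host" "$flag" "$prog"
        flag=1
    done
}
```

For example, the file My.pg shown above could be produced with

make_procgroup /home/matloff/tmp/prime pc8.cs.ucdavis.edu pc10.cs.ucdavis.edu pc12.cs.ucdavis.edu > My.pg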
If your machines are using rsh, make absolutely sure that your $HOME/.rhosts files on these machines include the names of these machines. If the machines are using ssh, make sure that you've set things up for passwordless remote execution. On UCD CSIF machines, see these instructions.
Also make sure there are no undefined variables in your .cshrc startup file ($TERM may be one). With the procgroup file above, node 0 will be pc8, node 1 will be pc10, and so on.
Then type
mpirun -p4pg My.pg prime 100 0
at node 0, in this case pc8. (Make sure you do this at node 0.)
The key to running an MPICH2 application is the mpd daemon, one of which must run on each machine to be used by your program.
Make sure you have a file .mpd.conf in your home directory, with a line like
secretword=
with your favorite word following.
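The file can be created from the shell, for example as in the sketch below. One point not stated above, taken from the MPICH2 documentation rather than from this tutorial: mpd rejects a .mpd.conf that other users can access, so the sketch sets the file's mode to 600. The function name write_mpd_conf is hypothetical.

```shell
# write_mpd_conf WORD: create $HOME/.mpd.conf containing secretword=WORD.
# (write_mpd_conf is a made-up helper name, not an MPICH2 command.)
write_mpd_conf() {
    conf="$HOME/.mpd.conf"
    # Create the file with owner-only permissions from the start.
    ( umask 077 && printf 'secretword=%s\n' "$1" > "$conf" )
    # Force mode 600 in case the file already existed with looser
    # permissions; mpd will not accept a file others can access.
    chmod 600 "$conf"
}
```

Note that calling write_mpd_conf overwrites any existing .mpd.conf, so use it only when setting the file up for the first time.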
The primitive way to launch the daemons is to simply type
mpd &
in a terminal window at each machine.
But it is more convenient to use mpdboot. To do this, first set up a file mpd.hosts in some directory, with the names of the machines on which you want daemons to be running; list the network names of the machines, one line per machine, e.g.
pc29.cs.ucdavis.edu
pc30.cs.ucdavis.edu
Then type
mpdboot
If you want to have more than one mpd process on a given machine, you need to type
mpdboot --totalnum=-1 --ncpus=k
where k is the total number of MPI processes you wish to run. (You can later run mpdallexit to shut down the daemons.)
To run your program, the above command
mpirun -p4pg My.pg prime 100 0
becomes (for instance)
mpiexec -l -n 3 prime 100 0
Note carefully that all the output from printf() will be collected and printed at the machine from which you started the program. Output from different nodes may be interleaved, making it difficult or impossible to read. (Note: This interleaving of printf() outputs is quite common in parallel systems.)
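When the output is labeled by rank, as the -l option in the mpiexec command above requests, one workaround is to regroup the lines by rank after the run. The sketch below assumes each output line begins with the rank number followed by a colon (for example "0: ..."); that label format is an assumption here, so check it against your own output. The function name group_by_rank is hypothetical.

```shell
# group_by_rank: read rank-labeled output on stdin and print it grouped
# by rank, lowest rank first. Assumes each line starts with "<rank>:".
# -s requests a stable sort, so lines from the same rank keep their
# original relative order. (group_by_rank is a made-up helper name.)
group_by_rank() {
    sort -s -t: -k1,1n
}
```

It would be used as a filter, for instance

mpiexec -l -n 3 prime 100 0 | group_by_rank

This only tidies the output after the program has finished; the relative timing of lines from different nodes is lost.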
Don't use printf() calls for most of your debugging. Your debugging will be much easier and faster if you use a debugging tool, such as gdb. To use gdb with MPICH/MPICH2, follow the directions given in my parallel debugging guide, at http://heather.cs.ucdavis.edu/~matloff/pardebug.html.