UCTuplets: A Linda Programming Library and More

Professor Norm Matloff
Dept. of Computer Science
UC Davis
530-752-1953
matloff@cs.ucdavis.edu

The UCTuplets package is an implementation of the Linda parallel programming paradigm, featuring functionality on two levels:

The base version, which is a clean (though not necessarily fast) implementation of the Linda paradigm for parallel processing.
The research version, for experimentation with the concept of intelligent tuples (I-tuples).

Both packages run on networks of Unix workstations of a given architecture (e.g. a network of SPARCstations). The user writes in C or C++ and makes library calls to the UCTuplets system.

Platforms:

The base version should run on almost any Unix machines connected through network (TCP/IP) access. Note, however, that no provision is made for translating data between little- and big-endian machines. Thus the machines must be either all little-endian or all big-endian. Also, the package can be run by invoking one's UCTuplets application program several times on the same machine, say for development purposes.

The research version.has been tested successfully on Linux, SPARC, HP and SGI platforms. It must be run on a homogeneous set of workstations, e.g. all Linux PCs or all SPARCs, using the same compiler on all machines, as it depends on functions within the application program having the same (virtual) address in each instance of the program.

Download the software:

Just type "make" to build either package. On SPARCs, first do an include of in UCTFunc.c.

How to write an UCTuplets application program:

include files and UCT variables
program format
tuple format and operations

Include files and UCT variables:

Your program must include a line

#include <UCTInclude.h>

That line will make the following variables available to your program:

char UCTTMName[MAX_TM_LENGTH];// tuple manager node name
int  UCTPortNum,              // port number for tuple manager
     UCTNodeNum,              // number of this node
     UCTNWorkNodes;           // total number of work nodes
int  UCTArgc;                 // replacement for argc
char **UCTArgv;               // replacement for argv[]

Of these, UCTNodeNum, UCTWorkNodes (total number of machines minus the 1 tuple manager node), UCTArgc, and UCTArgv are the ones of main use. Note that the latter two pertain only to the arguments which are for the application itself. Thus for example argv[5] for the program as a whole is identical to UCTArgv[1].

Program format:

Programs are written in SPMD (single program, multiple data) form, meaning that all nodes other than node 0 execute the same program. Even node 0, which is the tuple manager and thus does not do computation, runs the same program. Say the application source code file is x.c and the executable is x, and we have 5 nodes. Then x runs on all 5 nodes, the only difference being that on node 0 x calls the tuple manager function, while at the other nodes x calls Worker(), which does the computation of the application.

The Worker() function, written by the user, is required. This is where the work of the application is performed. It should be of type int, with no arguments.

The body of the user's main() function should consist of just one line:

UCTInit(argc,argv);

UCTInit() will call the tuple manager in the case of node 0, and call Worker() in the other cases.

Tuple format and operations:

The usual Linda in(), out() and rd() functions are available. Tuple components can be of type string, int, float, pointer or int/float array.

The first component of any tuple specification in calling these functions, which is required to be of type string, will specify the types of the other components, using this scheme:

s:  string
i:  int
f:  float
p:  pointer to int or float
I:  int array
F:  float array

In a call to in() or rd(), components of a tuple specification are either matching or nonmatching, i.e. are used in specifying criteria for finding a matching tuple in the tuple space. The p, I and F types are used to capture the values of nonmatching components (instead of using the `?' operator of classical Linda). For example, the call

in("sip","abc",X,&N);

would direct the tuple manager to find a 3-component tuple whose first component is "abc", and whose second component is equal to the current value of X. The value of the third component would be copied to N. In a call to out(), the p type would not be used.

An I or F component must be preceded by an int component in the case of a call to out(), the int component being used to store the length of the array, and be preceded by a pointer component in the case of a call to in() or rd(), with the length of the array being placed in the location specified by the pointer upon return from the call. Note that for in() and rd(), an I or F field is considered to be a pointer and thus nonmatching; matching is not allowed for arrays.

For example:

int Y[1000],N;

in("spI","def",&N,Y);  // get array and its length
// do some work on Y
out("siI","def",N,Y); // put array back in the tuple space

Note: Apparently the array Y above must be local, not global. (This seems to be due to some problem with use of varargs.)

How to write an intelligent-tuples program:

The rules here are the same as for the base version, except that the first code in the type-specifying component of a tuple must be either `r' ("remote") or `l' ("local"). The remote operation simply means the base system, while the local operation indicates intelligent-tuple operation, as follows:

The main idea of intelligent tuples is for worker nodes to offload some of their work to the tuple server. This is not motivated by situations in which the node running the tuple server is especially fast; instead, the point is to avoid communications costs.

To illustrate this idea, consider a simple example in which we have a counter which must be incremented every time a worker node reaches a certain spot in the program. In other implementations of Linda, including the base version of UCTuplets, one would do the incrementing with code something like:

in("sp","count",&I);
out("si","count",I);
I++;

The problem with this is that it requires two trips through the network, thus double the communication overhead. It would be much nicer if a worker node could simply ask the tuple manager to do the incrementing of the "count" tuple, returning the old value to the worker node. The idea of intelligent tuples is to allow such an operation, using a call to the UCTuplets library function tmexec() ("tuple manager execute"). In the first argument in that call, the user would specify a user-written function which the tuple manager is to execute. The user-written function will consist mainly of calls to in(), out() and rd(), all with the `l' prefix, i.e. local.

The function tmexec() itself has the following call structure (the following line is part of UCTInclude.h):

void tmexec (void*, char*, int, char*, int*);

The first argument is the address of the user-specified function; the second argument is an array of input arguments for that user function; the third is the length of that array in bytes; the fourth is an array of output arguments for the user function; the fifth is a pointer which will, after the call to tmexec(), point to the length of the output array.

Here is how our example above could be written. First, our user-specified function would be something like:

void FetchAndAdd(int * Ins, int InsL, int * Outs, int * OutsL)

{  int I;

    in("lsp","count",&I);
    memcpy(Outs,I,sizeof(int));
    *OutLengthPtr = sizeof(int);
    out("lsi","count",++I);
}

To command the tuple manager to execute this function, we call tmexec():

tmexec(&FetchAndAdd,Ins,InsL,Outs,&OutsL);
memcpy(I,Outs,*OutsL);

The key point is that the in() and out() calls in FetchAndAdd() will be done locally by the tuple manager, thus saving network communication.

How to compile and run an application program:

Use the Unix set command to set UCTDIR to the UCTuplets directory (which has bin, lib and include as subdirectories). (It is assumed here that either the base version or the research version of UCTuplets is installed here, not both.) To compile a base-version application, say x.c, type:

gcc -g -o x x.c -I$UCTDIR/include -L$UCTDIR/lib -luct

On SPARCs, add -lsocket to the compile command line.

For a research-version application, use g++ instead of gcc.

The format for running an UCTuplets application "by hand" (as opposed to via the UCTRun script, described below) is

app_name t_mgr_node port_num node_num tot_nodes app_args

where:

t_mgr_node is the name of the machine which will serve as tuple manager
port_num is any TCP port number above 1024 (see more detail below)
node_num is the index of the given machine, starting with 0 for the tuple server
tot_nodes is the total number of machines, including the tuple server
app_args are the command-line arguments intended for the application program

Say x has arguments 10 and 4. Then to run x, say at machines u, v and w, on port 1991, we would type:

x u 1991 0 3 10 4

on machine u, and then type the same thing at v and w except that the 0 would be replaced by 1 and 2, respectively.

Some more detail on the port number: You do not have to know TCP/IP or know what a port number is in order to run UCTuplets. Just choose a number above 1024, and run the netstat command, say as "netstat --inet" if that option exists, and scan through it to make sure the port number you wish to use does not appear anywhere in the output. Don't worry about what the numbers mean.

Don't reuse the same port number within a 3-4 minute period.

UCTRun, in the bin subdirectory, can be used to avoid having to manually start the UCTuplets application at each machine. Its command-line format is

UCTRun machine_file port_number app_name app_args

where machine_file is a text file listing the machines to be used (u, v, and w in the example above), each on a separate line. You would run the script from u. Make sure that you have set up your .rhosts file for this. In the case of the research version of UCTuplets, you may need to edit the first line of UCTRun to indicate the full path for perl on your system.

Debugging:

One can attach gdb to running UCTuplets processes, in the manner described in my page on parallel debugging for MPI and similar packages. However, a nice aspect of UCTuplets is that it is easy to simply start each UCTuplets manually via gdb to begin with.

Authors:

Norman Matloff conceived the functionality and structure of the base and research versions of UCTuplets. The software was implemented by Michael Foster, David Standring and Robert Sweeney, who also designed and performed the initial empirical investigations.

UCTuplets: A Linda Programming Library and More

Contents: