Adsmith User Interface
William W. Y. Liang
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
E-mail: wyliang@orchid.ee.ntu.edu.tw
Chung-Ta King
Department of Computer Science
National Tsing Hua University, HsinChu, Taiwan
E-mail: king@cs.nthu.edu.tw
Feipei Lai
Department of Electrical Engineering and Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
E-mail: king@cs.nthu.edu.tw
1 Introduction
Adsmith [1] is an object-based DSM [2] built completely on top of PVM [3]. In Adsmith, memory is viewed as a collection of objects, which can be shared among multiple processes on different processors. Adsmith provides primitives to create and allocate shared objects, to access shared objects, and to synchronize processes.
Adsmith is presented to users as a user-level library implemented in C++. It can be viewed as a DSM layer added on top of PVM, and both the PVM message-passing library and the Adsmith shared-memory library can be used at the same time. This document describes the primary functions provided by Adsmith and its programming conventions. Advanced primitives for performance optimization are also introduced.
An example Adsmith program is shown in Appendix A, and the complete Adsmith interface is listed in Appendix B. More information on Adsmith can be found on the Adsmith homepage:
http://archiwww.ee.ntu.edu.tw/~wyliang/adsmith
2 Prologue
Before getting started, a few things should be noted: process and host identification, the memory model, and the access style.
2.1 Process and Host Identification
Adsmith uses PVM to manage processes. Traditionally, PVM users identify a process by its task id. Under the DSM model, processes generally do not need to communicate with each other directly, but it is sometimes useful to identify hosts or processes, for example, to allocate a shared object on a specified host. Host and process identification in Adsmith is simple: each host is assigned a unique integer from 0 to the number of hosts minus one, and, similarly, each process is assigned a unique integer from 0 to the number of processes minus one.
2.2 Memory Model
Adsmith implements release consistency (RC) [5]. Shared accesses in RC are classified as competing accesses (special accesses) and non-competing accesses (ordinary accesses). Two accesses compete if they may refer to the same shared memory location at the same time and at least one of them is a write. Special accesses are further categorized as synchronization accesses and non-synchronization accesses; non-synchronization accesses are competing accesses that are not used for synchronization purposes. Synchronization accesses are further divided into acquire accesses and release accesses. Adsmith provides all of these access operations (see Section 6), and programmers are responsible for writing properly-labeled programs [5] using them.
2.3 Access Style
Adsmith adopts load/store-style memory accesses. When an object is allocated, an address is returned that points to a buffer in local memory. The time between executing an acquire and the release that immediately follows it in the same process is called an interval. Before an object is first read during an interval, a load operation must be called to copy its up-to-date data from the DSM system into the local buffer. At the end of the interval, if the object was modified, a store operation must be called to write its data back to the DSM system. All other accesses to the object during the interval are performed locally through the buffer, so most references in an interval are local. Section 5 gives more details on accessing a shared object.
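The load/store discipline of an interval can be illustrated with a small single-process model. This is a sketch, not Adsmith code: `Home`, `Buffer` and `add_in_interval` are hypothetical names, `refresh()` and `flush()` only mimic the roles of the load and store operations, and no messages, synchronization, or coherence actions are modeled.

```cpp
#include <cassert>

// Hypothetical model (not Adsmith code): Home<T> stands in for the copy
// held by the DSM system, Buffer<T> for a process's local buffer whose
// address an allocation call would return.
template <typename T>
struct Home {
    T value{};
};

template <typename T>
class Buffer {
public:
    explicit Buffer(Home<T> &home) : home_(home) {}
    void refresh() { local_ = home_.value; }  // load: DSM -> local buffer
    void flush() { home_.value = local_; }    // store: local buffer -> DSM
    T *ptr() { return &local_; }              // every other access uses this
private:
    Home<T> &home_;
    T local_{};
};

// One interval: a single refresh before the first read, any number of
// purely local reads and writes, and a single flush after the last write.
template <typename T>
T add_in_interval(Buffer<T> &b, T delta, int times) {
    b.refresh();
    T *x = b.ptr();
    for (int i = 0; i < times; ++i) *x += delta;  // local accesses only
    b.flush();
    return *x;
}
```

The second buffer in the usage below keeps its stale value until it performs its own refresh, which is why the load must come before the first read of every interval.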
3 System Initialization
With Adsmith, programmers do not need to perform any initialization or termination explicitly. This is done automatically through the object initialization facilities of C++: before a program begins, the constructors of all global objects are invoked, and their destructors are invoked when the program ends. The Adsmith library contains a system object that is responsible for initializing and terminating the whole system.
4 Process Creation
Two forms of process creation are provided:
int adsm_spawn( char *file, int count );
int adsm_spawn( char *file, char **argv = 0, int flags = 0, char *where = 0,
int count, int *tids = 0 );
In the above declarations, file is the program name and count is the number of processes to create. The second form takes the same set of parameters as pvm_spawn() [4]. Here is an example that creates 10 processes from the program "sample":
adsm_spawn("sample",10);
Although PVM provides the function pvm_spawn(), child processes in Adsmith must be created through adsm_spawn(), so that the proper system information is collected at process creation time.
Adsmith maintains a tree-like parent-child relationship among processes. The parent process will not terminate until all its children have terminated. There are functions in Adsmith which allow a parent process to explicitly wait for its children to terminate. There are three forms of such wait functions:
int adsm_wait( );
int adsm_wait( int tid );
int adsm_wait( int *tids, int num );
The first form causes the caller, i.e., the parent, to wait for all of its children to terminate. The second specifies a particular child to wait for; tid is the integer PVM task id returned by the second form of adsm_spawn(). The last form specifies a list of processes to wait for.
5 Shared Object Allocation and Basic Manipulation
In Adsmith, shared objects must be allocated before they are used. The allocation functions are used like the malloc() function in C. Three forms of the allocation function are supported:
void *adsm_malloc( char *identifier, int size, int hint = AdsmDataDefault );
void *adsm_malloc( char *identifier, int size, void *init, int hint = AdsmDataDefault );
void *adsm_malloc( char *identifier, int size, void *initdata, short home,
short hint = AdsmDataDefault );
In these declarations, size is the size of the shared object in bytes and identifier is the string name used to refer to the shared object. The parameter init (initdata in the third form) sets the initial value of the shared object; specifying it as NULL leaves the shared object uninitialized. In the third form, the parameter home lets the user specify the host on which the shared object is to be allocated. This parameter is only a hint, however: depending on the actual situation, the object may be allocated on a different host. The range of home is from 0 to the number of hosts minus one.
Several options can be set through the hint parameter to affect the access behavior of a shared object. Currently, hint may be AdsmDataCache, AdsmDataLocal, AdsmDataUpdate, or AdsmDataMWriter. AdsmDataCache means that the shared data will be cached in application processes and kept consistent by the coherence protocol. AdsmDataLocal means that the shared object will be allocated on the local host, which is the same as specifying the local host id through the third form; when most accesses to the object are performed by local processes, AdsmDataLocal can be used to increase locality. AdsmDataUpdate means that write-update will be used as the coherence protocol for the declared object; by default, write-invalidate is used.
The last value, AdsmDataMWriter, tells the system to apply the multiple-writer protocol [6] to the declared shared object. This protocol allows several processes to write to different parts of the shared object at the same time. This eliminates the false-sharing problem and avoids the critical sections that would otherwise be needed to serialize accesses to the same object by multiple processes. The only restriction is that the user must ensure that no part of the shared object is ever written by more than one process simultaneously.
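To make the multiple-writer idea concrete, here is a minimal sketch of the twin-and-diff technique used by TreadMarks [6]: each writer compares its modified copy with a twin saved before writing and ships only the words it changed, which the home then merges. The text does not state that Adsmith uses this exact mechanism, and `Object`, `Diff`, `make_diff` and `apply_diff` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Twin-and-diff sketch of a multiple-writer protocol (TreadMarks-style);
// a shared object is modeled as a vector of words.
using Object = std::vector<int>;
using Diff = std::vector<std::pair<std::size_t, int>>;  // (index, new value)

// Compare a writer's modified copy with the twin saved before it started
// writing, and record only the words that this writer changed.
Diff make_diff(const Object &twin, const Object &modified) {
    Diff d;
    for (std::size_t i = 0; i < twin.size(); ++i)
        if (twin[i] != modified[i]) d.emplace_back(i, modified[i]);
    return d;
}

// The home applies each writer's diff. Because writers touch disjoint
// parts of the object, the order in which diffs are applied is irrelevant.
void apply_diff(Object &home, const Diff &d) {
    for (const auto &entry : d) home[entry.first] = entry.second;
}
```

The restriction stated above is exactly what makes the merge safe: if two writers changed the same word, the later diff would silently overwrite the earlier one.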
These values can be combined with the bitwise OR operator (|) of C. Like the home parameter, hint serves only as a suggestion. The actual value is set by the first declaration of an object and cannot be changed afterwards.
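Combining hints with OR works because each hint occupies its own bit. The sketch below assumes that layout: the constants `kHintCache` through `kHintMWriter` are hypothetical stand-ins, since this document names AdsmDataCache, AdsmDataLocal, AdsmDataUpdate and AdsmDataMWriter but does not give their numeric values.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Adsmith hint constants; the real numeric
// values are not given in this document, one bit per hint is assumed.
enum HintSketch : int {
    kHintDefault = 0,
    kHintCache   = 1 << 0,
    kHintLocal   = 1 << 1,
    kHintUpdate  = 1 << 2,
    kHintMWriter = 1 << 3,
};

// True if `flag` is set in a combined hint value.
constexpr bool has_hint(int hint, int flag) { return (hint & flag) != 0; }

// A cached object kept coherent by write-update, built with bitwise OR.
constexpr int cached_update_hint = kHintCache | kHintUpdate;
```

With the real constants, the combined value would simply be passed as the hint argument of adsm_malloc(); whatever the first allocation of an identifier specifies is what the object keeps.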
After a shared object has been allocated, an address is returned that points to the buffer space of the shared object. Note that "buffer" and "cache" are different concepts in Adsmith. Every shared object has buffer space in each process that has allocated it. According to the definition of release consistency, data in a buffer is guaranteed to be up-to-date only after an acquire access is performed. Without caching, every load operation must be forwarded to the home node to fetch the data. Caching avoids this communication when up-to-date data already exist in the buffer, although it may induce extra coherence messages.
Adsmith uses load/store-style memory accesses. As described in Section 2.3, a load operation should be performed before the first read reference in an interval, and a store operation should be performed after the last write reference. Other accesses to the shared object in the interval can be performed locally through the buffer. These load and store operations are performed by the adsm_refresh() and adsm_flush() functions, respectively; their detailed usage is described in Section 6.1.
When a shared object is no longer referenced, it can be freed by the following function:
adsm_free( void *ptr );
This function discards the local buffer space. For better memory utilization, programmers are encouraged to free shared objects that will not be used in the near future. A freed object can be reused by allocating it again. Here is a simple example of the allocation and basic access functions.
float *pi=(float*)adsm_malloc("pi",sizeof(float));
... other accesses ...
... lock ...
adsm_refresh(pi);   // load (read) the content of pi
*pi=*pi+100;        // add 100 to pi
adsm_flush(pi);     // store (write) the content of pi
... unlock ...
adsm_free(pi);      // free pi
6 Accesses of Shared Objects
In the release consistency model, memory operations are classified as ordinary accesses, synchronization accesses, and non-synchronization accesses. This section describes how Adsmith supports each of them.
6.1 Ordinary Accesses
The following two functions are for ordinary accesses:
void adsm_refresh( void *ptr );
void adsm_flush( void *ptr );
Since Adsmith uses no hardware or operating-system facilities to support shared-memory accesses, the programmer must explicitly refresh (load) data from the shared memory before accessing them, and flush (store) them back to the shared memory after modifying them. Under RC, adsm_refresh() is used for ordinary loads and adsm_flush() for ordinary stores. The refreshed value is guaranteed to be as recent as the value at the time of the last acquire (see Section 6.2). Under load/store-style memory accesses, the refresh operation is invoked only once, before the first load after an acquire, and the flush operation only once, after the last store before a release (see Section 2.3).
6.2 Synchronization Accesses
Ordinary accesses require the programmer to use enough synchronization accesses to ensure program correctness. Adsmith provides three classes of synchronization operations: semaphore, mutex, and barrier. Their class prototypes are listed below with their main public methods.
class AdsmSemaphore {
public:
    AdsmSemaphore( char *identifier, int init = 1 );
    void wait();
    void signal();
    void set( int value );
    int get();
};
class AdsmMutex {
public:
    AdsmMutex( char *identifier );
    void lock();
    void unlock();
};
class AdsmBarrier {
public:
    AdsmBarrier( char *identifier );
    void barrier( int count );
};
Since all three class instances are shared between processes, the identifier should also be specified. Semaphore is the most basic synchronization operation. It is implemented as a counting semaphore in Adsmith. Mutex is designed specifically for mutual exclusion, but there is a restriction on its usage: mutex lock and unlock operations must appear in pairs. Barrier supports barrier synchronization, and the number of processes participating in the synchronization must be specified.
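The counting-semaphore semantics behind wait(), signal(), set() and get() can be sketched with standard C++ threading primitives. This is a thread-level analogy only: `CountingSemaphore` is a hypothetical local class, whereas an AdsmSemaphore is named by an identifier and shared between processes through messages.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Thread-level analogue of the AdsmSemaphore interface (hypothetical class);
// it illustrates counting-semaphore semantics, not Adsmith's implementation.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int init = 1) : count_(init) {}

    // acquire: block until the count is positive, then decrement it
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
    // release: increment the count and wake one waiter
    void signal() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    // overwrite the count; waiters re-check their condition
    void set(int value) {
        {
            std::lock_guard<std::mutex> lk(m_);
            count_ = value;
        }
        cv_.notify_all();
    }
    int get() {
        std::lock_guard<std::mutex> lk(m_);
        return count_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};
```

With the default initial value of 1 the class behaves like a lock, which matches the default init = 1 in the AdsmSemaphore constructor above.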
Among the synchronization functions, semaphore wait and mutex lock are acquire accesses, semaphore signal and mutex unlock are release accesses, and barrier acts as both an acquire and a release. An acquire is needed to gain the access right to a set of data, and a release is used to grant that access right. The RC model guarantees that ordinary accesses after an acquire obtain the data available at the time of the acquire. Examples of synchronization accesses can be found in the example program in Appendix A.
In addition to the above three synchronization classes, process creation and termination are also a kind of synchronization access: release accesses are performed before child processes are spawned and before a process terminates, and acquire accesses are performed at process initialization time.
6.3 Non-synchronization Accesses
Synchronization accesses are competing accesses, but not all competing accesses are used for synchronization. Competing accesses that are not used for synchronization are called non-synchronization accesses. Adsmith has two basic non-synchronization functions:
void adsm_refresh_now( void *ptr );
void adsm_flush_now( void *ptr );
Although most ordinary accesses can be performed locally, non-synchronization accesses are always sent directly to the home nodes. The suffix "_now" in the function names indicates that these two accesses are performed without waiting for previous ordinary accesses. That is, a write through adsm_flush_now() is seen immediately by all subsequent loads through adsm_refresh_now(), even when they are invoked by other processes. Since Adsmith currently supports RCsc [5], competing accesses, including both synchronization and non-synchronization accesses, are sequentially consistent.
7 Advanced Functions
Portable software DSM systems usually suffer from efficiency problems, and one primary source of inefficiency is excessive messages. Two methods are generally used to address this problem: overlapping communication with computation, and reducing the number of messages. Adsmith implements both strategies and provides several advanced functions to improve performance: prefetch, bulk transfer, atomic accesses, and collect accesses.
7.1 Prefetch
Adsmith supports nonblocking loads, i.e., data prefetching, through the following function:
void adsm_prefresh( void *ptr );
This function corresponds to the load function adsm_refresh(). The programmer should insert the prefetch call as far ahead of the load as possible. The sequence of shared-data accesses in Adsmith can be depicted as follows:
Acquire → Prefresh → Refresh → Local Accesses → Flush → Release
where Refresh and Flush are the ordinary accesses discussed above. Nonblocking loads and stores are performed between the prefresh-refresh and flush-release pairs, respectively. The code segment below is a typical example of computation on a shared object within a critical section:
typeA *A=(typeA*)adsm_malloc("name of A",sizeof(typeA));
AdsmMutex mutex("mutex name");   // Mutex is a synchronization class
mutex.lock();
adsm_prefresh(A);
... prologue of computation ...
adsm_refresh(A);
... computations with local access on A ...
adsm_flush(A);
... epilogue of computation ...
mutex.unlock();
Note that adsm_refresh() is still required for the load access, to ensure that the requested data have arrived. Prefetch can be used most effectively by compilers, because compilers know the best places to insert these accesses.
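Why the prefetch should be issued as early as possible can be shown with a toy timing model. The sketch below is hypothetical: `PrefetchModel` and its methods are illustrative names, a fetch is assumed to take a fixed `latency`, and the blocking refresh waits only for the part of the latency that the prologue did not hide.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timing model (not Adsmith code). A fetch takes `latency`
// time units; times are measured from the acquire.
struct PrefetchModel {
    int latency;  // round-trip time of one fetch

    // prologue, then a blocking refresh that pays the full latency,
    // then the computation on the local buffer
    int without_prefetch(int prologue, int work) const {
        return prologue + latency + work;
    }

    // prefresh issued at time 0, so the fetch overlaps the prologue;
    // the refresh only waits until the data has arrived
    int with_prefetch(int prologue, int work) const {
        int arrival = latency;
        int refresh_done = std::max(prologue, arrival);
        return refresh_done + work;
    }
};
```

Once the prologue is at least as long as the fetch latency, the refresh no longer blocks at all, which is the case the text aims for by placing adsm_prefresh() as early as possible.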
7.2 Bulk Transfer
Consecutive accesses can be aggregated to reduce the number of messages sent. Adsmith provides bulk-transfer versions of several functions for this purpose; in the current implementation, these are the allocation, prefetch, refresh, and flush functions. The mechanism is to place the following functions, in pairs, at the beginning and the end of the corresponding group of functions or accesses:
void adsm_malloc( AdsmBulkType *type );
void adsm_prefresh( AdsmBulkType *type );
void adsm_refresh( AdsmBulkType *type );
void adsm_flush( AdsmBulkType *type );
AdsmBulkType is defined as follows:
enum AdsmBulkType {
    AdsmBulkBegin, AdsmBulkEnd
};
For example, to aggregate the refresh to a group of shared objects, one may use the following code:
// assume int *group[N] points to N integer shared objects
adsm_refresh(AdsmBulkBegin);
for (int i=0; i<N; i++) adsm_refresh(group[i]);
adsm_refresh(AdsmBulkEnd);
By default, bulk transfer is always performed for adsm_flush() at release time, so it is not necessary to specify bulk transfer for adsm_flush() in most cases. Note that no other operations on the objects specified between AdsmBulkBegin and AdsmBulkEnd are allowed before AdsmBulkEnd, since the bulk transfer is not actually performed until then. To reduce the number of function calls when accessing a group of related shared objects with bulk transfer, Adsmith also provides an adsm_*_array() form of each of the adsm_*() functions above. Please refer to Appendix B for the interface.
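The effect of the AdsmBulkBegin/AdsmBulkEnd bracket on the message count can be sketched as a queue that is drained in one message at the end. `BulkChannel` and its methods are hypothetical names; only request messages are counted, and replies are assumed to be aggregated in the same way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of bulk transfer: requests issued between begin()
// and end() are queued and sent together as one message at end().
class BulkChannel {
public:
    void begin() { bulk_ = true; }

    void request(int object_id) {
        if (bulk_)
            pending_.push_back(object_id);  // deferred until end()
        else
            ++messages_;                    // one message per request
    }

    void end() {
        if (!pending_.empty()) ++messages_;  // a single aggregated message
        pending_.clear();
        bulk_ = false;
    }

    int messages() const { return messages_; }

private:
    bool bulk_ = false;
    std::vector<int> pending_;
    int messages_ = 0;
};
```

The queue also shows why other operations on those objects are not allowed inside the bracket: until end() runs, none of the queued requests has actually been sent.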
7.3 Atomic Accesses
Suppose we want to control access to a shared object through a critical section. As described in Section 6.1, the conventional sequence for an ordinary access is: acquire → refresh → local accesses → flush → release. Assume that the synchronization arbitrator for the critical section and the home of the shared object are located on different machines. Then, if the shared object was invalidated by other processes before the acquire and was modified during the local accesses, accessing it may require seven messages (two for the acquire, two for the refresh, two for the flush, and one for the release). It is therefore desirable to reduce the number of messages for this kind of access. Note that the same situation arises in most DSMs that support a similar consistency model.
The problem can be solved by allocating the synchronization arbitrator to the home node of the shared object and combining these two operations. During an acquire, the requested data can be piggy-backed on the lock grant message. After the computation, the modified data can also be sent back with the release message. In this way, the required number of messages will be reduced to only four (two for acquire and refresh, and two for flush and release).
Adsmith provides atomic accesses to support this kind of access. Two functions are supported:
void adsm_atomic_begin( void *ptr, int type = AdsmAtomicWrite );
void adsm_atomic_end( void *ptr );
The function adsm_atomic_begin() can be viewed as a combination of acquire and refresh, and adsm_atomic_end() as a combination of flush and release. Note that atomic accesses are currently categorized as non-synchronization accesses (see footnote 1): adsm_atomic_begin() performs only pure lock and refresh operations on the target object, and adsm_atomic_end() performs only flush and pure unlock operations on the target object. No interaction happens between atomic accesses and ordinary accesses. Atomic accesses are also sequentially consistent with all other competing accesses.
Let the program segment between adsm_atomic_begin() and adsm_atomic_end() be called an atomic section. Two types of atomic operation can be specified in the type parameter of adsm_atomic_begin(): AdsmAtomicWrite and AdsmAtomicRead. The former indicates that the atomic section contains both read and write accesses; the latter indicates that it contains only read accesses. Adsmith implements a single-writer/multiple-readers protocol. A writer can complete adsm_atomic_begin() only when there is neither a reader nor a writer in the atomic section; a reader can complete it only when there is no writer in the atomic section. For fairness, Adsmith implements a writer-first policy: when a writer is waiting to enter the atomic section, readers that arrive after the writer are blocked until the writer has finished its atomic section, while readers that arrived before the writer may proceed until they all exit their atomic sections.
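The admission rules of the writer-first, single-writer/multiple-readers protocol can be written as a small non-blocking state machine. This is a sketch under stated assumptions: `AtomicSectionArbiter` and its methods are hypothetical, a writer is assumed to call `announce_write()` once before retrying `try_enter_write()`, and a real arbiter would queue blocked requests instead of returning false.

```cpp
#include <cassert>

// Hypothetical arbiter for an atomic section with writer priority.
// try_* calls return false where a real implementation would block.
class AtomicSectionArbiter {
public:
    // A reader may enter only if no writer is inside and none is waiting.
    bool try_enter_read() {
        if (writer_inside_ || writers_waiting_ > 0) return false;
        ++readers_inside_;
        return true;
    }
    void exit_read() { --readers_inside_; }

    // A writer registers first, so readers arriving later are held back.
    void announce_write() { ++writers_waiting_; }

    // A registered writer may enter only when the section is empty.
    bool try_enter_write() {
        if (writer_inside_ || readers_inside_ > 0) return false;
        --writers_waiting_;
        writer_inside_ = true;
        return true;
    }
    void exit_write() { writer_inside_ = false; }

private:
    int readers_inside_ = 0;
    int writers_waiting_ = 0;
    bool writer_inside_ = false;
};
```

The usage in the test follows the scenario in the text: a reader already inside finishes, a later reader waits behind the announced writer, and it proceeds only after the writer leaves.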
Here is an example of atomic accesses, modified from the example in Section 7.1.
typeA *A=(typeA*)adsm_malloc("name of A",sizeof(typeA));
adsm_atomic_begin(A);
... computation with local access on A ...
adsm_atomic_end(A);
There is another form of atomic access which may be useful.
void adsm_atomic( void *ptr, char *expr );
where expr is of the form
[ type ] expression
The parameter type describes the type of the target object pointed to by ptr; currently only the C basic types are supported. The parameter expression is a C expression whose only variable is the target object, represented by the symbol "@". The shared object is updated atomically according to expression. This kind of access is also called an active access in Adsmith, since the expression is executed at the object's home. Here is an example of the function.
int *A=(int*)adsm_malloc("A",sizeof(int));
adsm_atomic(A,"[int] @=@+100");
Although this function currently does not support complex data types, it can be very useful for reducing the number of messages: each adsm_atomic() call involves only two messages.
Footnote 1: Although atomic accesses will be categorized as synchronization accesses in the future, programs written for the current version will run correctly in later versions.
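What "executed at its home" means can be sketched as a home-side evaluator: the home receives the expression string, applies it to its own copy, and replies, giving the two messages mentioned above. The parser below is hypothetical and deliberately tiny; it accepts only "[int] @ op= N" and "[int] @=@ op N" with op in + - * /, whereas the text allows a general C expression.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical home-side evaluator for a small subset of the adsm_atomic()
// expression syntax; only the "[type]" prefix and "@" symbol come from the text.
int apply_int(int target, char op, int operand) {
    switch (op) {
        case '+': return target + operand;
        case '-': return target - operand;
        case '*': return target * operand;
        case '/': return target / operand;  // no divide-by-zero check
    }
    throw std::invalid_argument("unsupported operator");
}

// Apply `expr` to the home copy `obj` and return the new value.
int eval_atomic(int &obj, const std::string &expr) {
    const std::string prefix = "[int]";
    if (expr.compare(0, prefix.size(), prefix) != 0)
        throw std::invalid_argument("only [int] is supported in this sketch");
    std::string e;
    for (char c : expr.substr(prefix.size()))
        if (c != ' ') e += c;  // drop spaces
    char op;
    std::string num;
    if (e.size() > 3 && e[0] == '@' && e[2] == '=') {          // @+=N form
        op = e[1];
        num = e.substr(3);
    } else if (e.size() > 4 && e.compare(0, 3, "@=@") == 0) {  // @=@+N form
        op = e[3];
        num = e.substr(4);
    } else {
        throw std::invalid_argument("unsupported expression");
    }
    obj = apply_int(obj, op, std::stoi(num));
    return obj;
}
```

Because the whole read-modify-write happens at the home, no lock or separate refresh/flush round trip is needed, which is where the message savings come from.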
7.4 Collect Access
Sometimes a shared object is used as an accumulator. For example, the following code sequence calculates a total sum by accumulating partial sums.
AdsmBarrier bsum("bsum");
int *sum=(int*)adsm_malloc("sum",sizeof(int));
int partial_sum=....;                     // calculate the partial sum
char expr[32];
sprintf(expr,"[int] @+=%d",partial_sum);  // prepare the expression
adsm_atomic(sum,expr);                    // add partial sum
bsum.barrier(nproc);                      // barrier on nproc processes
adsm_refresh(sum);                        // retrieve the result
The code from adsm_atomic() to adsm_refresh() generates nproc*6 messages. Adsmith provides collect accesses to optimize this kind of case. The functions are as follows:
void adsm_collect_begin( void *ptr, int num );
void adsm_collect_end( void *ptr );
num is the number of processes participating in the access. adsm_collect_begin() and adsm_collect_end() must appear in pairs, and their usage is similar to that of adsm_atomic_begin() and adsm_atomic_end(). The effect is as if adsm_atomic_begin(), adsm_atomic_end(), barrier(), and adsm_refresh() were called in that order, but far fewer messages are generated. The above example can now be rewritten as follows.
int partial_sum=....;             // calculate the partial sum
adsm_collect_begin(sum,nproc);
*sum+=partial_sum;                // add partial sum
adsm_collect_end(sum);            // total sum is returned
The number of messages involved is now reduced to nproc*4.
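The semantics of a collect access can be modeled in a single process: each of the `num` participants adds its contribution at the home, and the combined value becomes visible only after the last contribution arrives. `Collector` and the two message-count helpers are hypothetical names; the counts simply restate the figures given in the text.

```cpp
#include <cassert>

// Hypothetical single-process model of collect-access semantics: the
// combined effect of atomic begin/end, barrier, and refresh.
class Collector {
public:
    explicit Collector(int num) : expected_(num) {}

    // One participant's atomic update; returns true for the last one,
    // at which point the total can be returned to every participant.
    bool contribute(int partial) {
        total_ += partial;
        return ++arrived_ == expected_;
    }
    bool complete() const { return arrived_ == expected_; }
    int result() const { return total_; }

private:
    int expected_;
    int arrived_ = 0;
    int total_ = 0;
};

// Message counts quoted in the text for nproc participants.
int messages_atomic_barrier_refresh(int nproc) { return nproc * 6; }
int messages_collect(int nproc) { return nproc * 4; }
```

For 8 processes, for instance, the text's figures give 48 messages for the atomic/barrier/refresh sequence and 32 for the collect version.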
8 Other Functions
This section describes other functions provided in Adsmith.
8.1 Pointer
Pointers to shared memory are supported in Adsmith, but with some restrictions, because the address of a shared object in one process may differ from its address in another process. Programmers are therefore required to translate the local address of a shared object into a globally recognizable address before passing it to other processes through shared memory. The functions for pointer manipulation are as follows:
int adsm_gid( void *ptr );
void *adsm_attach( int gid );
The function adsm_gid() translates the local address of a shared object into its global address, which is represented by an integer. Conversely, adsm_attach() translates a global address back into a local address for the requesting process; if the target object of the pointer is not yet present in that process, it behaves like adsm_malloc().
For example, if the programmer wants to pass the address of shared object T from process P1 to process P2 through the shared pointer object S, the following two sections of code can be used for the two processes.
// process P1: set the pointer
AdsmBarrier Bpointer("barrier for this code");
sometype *T=(sometype*)adsm_malloc("target data",sizeof(sometype));
int *S=(int*)adsm_malloc("point to T",sizeof(int));
*S=adsm_gid(T);        // get global address of T
adsm_flush_now(S);     // flush immediately
Bpointer.barrier(2);   // done

// process P2: get the pointer
AdsmBarrier Bpointer("barrier for this code");
int *S=(int*)adsm_malloc("point to T",sizeof(int));
Bpointer.barrier(2);                      // wait until P1 is done
adsm_refresh_now(S);                      // get the pointer value
sometype *T=(sometype*)adsm_attach(*S);   // attach the pointer
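The gid/attach pair can be pictured as two tables: a system-wide one that gives every shared object an integer id, and a per-process one that maps that id to the process's own buffer. The sketch below is a hypothetical model, not Adsmith's implementation; `GlobalTable`, `Process`, `alloc`, `gid` and `attach` are illustrative names, and the object size needed by attach is assumed to be available from the global table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// System-wide table: identifier -> gid, and gid -> object size.
struct GlobalTable {
    std::map<std::string, int> gid_of;
    std::vector<std::size_t> size_of;

    int lookup_or_create(const std::string &name, std::size_t size) {
        auto it = gid_of.find(name);
        if (it != gid_of.end()) return it->second;
        size_of.push_back(size);
        int g = static_cast<int>(size_of.size()) - 1;
        gid_of[name] = g;
        return g;
    }
};

// Per-process view: each process has its own buffer for every object.
class Process {
public:
    explicit Process(GlobalTable &table) : table_(table) {}

    // like adsm_malloc(): find or create the object, return a local buffer
    void *alloc(const std::string &name, std::size_t size) {
        return attach(table_.lookup_or_create(name, size));
    }
    // like adsm_gid(): local buffer address -> global id (-1 if unknown)
    int gid(const void *ptr) const {
        for (const auto &entry : buffers_)
            if (entry.second.data() == ptr) return entry.first;
        return -1;
    }
    // like adsm_attach(): global id -> local buffer, allocated on first use
    void *attach(int g) {
        std::vector<char> &buf = buffers_[g];
        if (buf.empty()) buf.resize(table_.size_of.at(g));
        return buf.data();
    }

private:
    GlobalTable &table_;
    std::map<int, std::vector<char>> buffers_;  // gid -> this process's buffer
};
```

In the usage below, the two processes end up with different local addresses for the same object, which is exactly why a raw pointer cannot be passed through shared memory but a gid can.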
8.2 Supporting Shared Memory and Message Passing
Adsmith allows programmers to use PVM message-passing primitives together with its shared-memory primitives. However, care must be taken when using PVM message-passing primitives. As mentioned in Section 4, processes must be spawned with adsm_spawn() instead of pvm_spawn(). Another point to note is the range of message tags that may be used. In PVM, message tags are integers used in send and receive primitives to distinguish message channels. Adsmith currently occupies seven channels, from MAXINT down to MAXINT-6, where MAXINT is the largest integer defined in the system; user programs should not use these tags.
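A program that mixes PVM messages with Adsmith can guard its own tags with a simple range check. This is a hypothetical helper, not part of either library, and it assumes that the system's MAXINT is the C++ `INT_MAX` from `<climits>`.

```cpp
#include <cassert>
#include <climits>

// Adsmith occupies the seven tags MAXINT-6 .. MAXINT; INT_MAX is assumed
// to be the system's MAXINT. Hypothetical helper for user code.
constexpr int kAdsmReservedLow = INT_MAX - 6;

// True if `tag` lies outside the range reserved by Adsmith.
constexpr bool tag_is_safe_for_user(int tag) {
    return tag < kAdsmReservedLow;
}
```

A check like this could be placed in a wrapper around the program's own send calls, so that an accidental collision with Adsmith's internal channels is caught early rather than appearing as a lost or misrouted message.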
8.3 Information Retrieval
Section 2.1 discussed how hosts and processes are identified. Adsmith supports the following functions to retrieve these identifiers and to translate them to and from the corresponding PVM task ids.
int adsm_hostno( int procno = -1 );
int adsm_procno( );
int adsm_procno2tid( int procno );
int adsm_tid2procno( int tid );
The function adsm_hostno() returns the id of the host on which the process with process number procno resides. If procno is omitted, the host id of the calling process is returned. The host id is in the range of zero to the number of hosts minus one. The function adsm_procno() returns the process id of the calling process, in the range of zero to the number of processes minus one. The function adsm_procno2tid() translates a process id into the corresponding PVM task id, while adsm_tid2procno() translates a PVM task id back into the corresponding Adsmith process id.
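Because process numbers are dense integers starting at 0, the translations above can be pictured as lookups in a table indexed by process number. The sketch is hypothetical: `ProcTable` and its methods are illustrative names, the task ids in the usage are made-up values, and returning -1 for an unknown task id is an assumption, since the text does not say how Adsmith reports that case.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical bookkeeping behind procno <-> PVM task id translation:
// process numbers are dense indices 0..nproc-1 into two parallel vectors.
class ProcTable {
public:
    // Register a process; returns its newly assigned process number.
    int add(int tid, int host) {
        tids_.push_back(tid);
        hosts_.push_back(host);
        return static_cast<int>(tids_.size()) - 1;
    }
    int procno2tid(int procno) const { return tids_.at(procno); }
    // Reverse lookup; -1 for an unknown task id (an assumption).
    int tid2procno(int tid) const {
        auto it = std::find(tids_.begin(), tids_.end(), tid);
        return it == tids_.end() ? -1 : static_cast<int>(it - tids_.begin());
    }
    int hostno(int procno) const { return hosts_.at(procno); }

private:
    std::vector<int> tids_;   // indexed by process number
    std::vector<int> hosts_;  // host id of each process
};
```

The forward direction is a constant-time index, while the reverse direction searches the table; a real system with many processes might keep a hash map for the reverse lookup instead.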
9 Summary
Writing parallel programs is easier in Adsmith than in PVM. Programmers of Adsmith need to keep only two things in mind. First, synchronization operations must be inserted properly to ensure the correctness of ordinary accesses (see Section 6.2); this effort is needed for any shared-memory parallel programming. Second, the load (refresh) operation must be used before a shared object is referenced, and the store (flush) operation must be used after the object is modified (see Section 6.1).
Since the programmer's knowledge about a program is helpful for optimizing performance, Adsmith exploits this knowledge by providing users and/or compilers with several useful options. These include property settings, data distribution, the multiple-writer protocol (Section 5), prefetch, bulk transfer, atomic access, and collect access (Section 7). Our initial experimental results show that programs written in Adsmith have performance comparable to those written in PVM. However, for performance reasons, Adsmith currently supports only homogeneous environments, to avoid data conversion between different architectures (e.g., the XDR encoding used in PVM).
Adsmith is an on-going project. Several new features are currently under investigation and will be implemented into the system in the near future. Performance of Adsmith is also being evaluated more extensively. New algorithms or techniques will be used to eliminate possible performance bottlenecks and to make the system more robust. With further refinements, we expect Adsmith to be a really useful environment for shared-memory programming on networks of workstations.
References
[1] Wen-Yew Liang, Chung-Ta King, Feipei Lai, "Adsmith: An Efficient Object-Based DSM Environment on PVM," In Proceedings of the 1996 International Symposium on Parallel Architecture, Algorithms and Networks, Beijing, China, pp. 173-179, June 1996.
[2] B. Nitzberg and V. Lo, "Distributed Shared Memory: A Survey of Issues and Algorithms," IEEE Computer, Vol. 24, No. 8, pp. 52-60, Aug. 1991.
[3] V. S. Sunderam, "PVM: A Framework for Parallel Distributed Computing," Concurrency: Practice and Experience, Vol. 2, No. 4, Dec. 1990.
[4] A. Geist, et al., "PVM 3.0 User's Guide and Reference Manual," Oak Ridge National Laboratory, 1993.
[5] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," In Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 15-26, May 1990.
[6] Pete Keleher, Alan L. Cox, Sandhya Dwarkadas and Willy Zwaenepoel, "TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems," In Proceedings of the 1994 Winter Usenix Conference, pp. 115-131, Jan. 1994.
A An Example of Adsmith
This section lists an example Adsmith program. More examples can be found in the package.
////////////////////////////////////////////////////////////// // Sample program for Adsmith // Purpose: Calculate the summation of values in a vector // Paradigm: SPMD //// Algorithm:
// Shared Objects: // the vector and the summation// Procedure:
// 0. Declare initialize the shared objects. // 1. Spawn a process on each host. // 2. Every process calculates a partial sum. // 3. Cumulate the partial sum into the global sum //// Usage: sample [size]
//// Author: William W. Y. Liang
//////////////////////////////////////////////////////////////
#include !stdio.h?#include !stdlib.h? #include !time.h?#include !iostream.h? #include "adsm.h" // standard libray of Adsmith #include "adsmutil.h" // non-standard utilities of Adsmith#include "adsmtime.h" // a timing class
void slave(); // system parametersint seqno;
int nhost;AdsmBarrier Bsample("sample"); // for barrier synchronization
// initial valuesint zero=0; int deflen=100000; // default length // shared variableint *len; // length of the vector
int *vec; // source of the vectorint *sum; // result of the sum
int main(int argc, char *argv[])
{
    // get my sequence no. and host number
    seqno = get_seqno();    // get_seqno() and get_nhost() are
    nhost = get_nhost();    // both in adsmutil.h

    // initialize content of the vector and spawn child processes
    if (seqno == 0) {       // the 1st process
        // determine the size of vector
        if (argc > 1) deflen = atoi(argv[1]);
        cout << nhost << " host(s) detected" << endl;
        cout << "Length of vector = " << deflen << endl;

        // initialize the vector (heap-allocated: deflen is not a constant)
        int *initvec = new int[deflen];
        for (int i = 0; i < deflen; i++) initvec[i] = i;

        // allocate shared objects
        adsm_malloc(AdsmBulkBegin);     // allocate in aggregate mode
        len = (int*)adsm_malloc("len", sizeof(int), &deflen);
        sum = (int*)adsm_malloc("sum", sizeof(int), &zero);
        vec = (int*)adsm_malloc("vec", deflen*sizeof(int), initvec);
        adsm_malloc(AdsmBulkEnd);

        // spawn processes on each host
        adsm_spawn(execname(argv[0]), nhost);   // extract program name

        // timing
        Timing Tsample("sample");
        Bsample.barrier(nhost+1);
        Tsample.start();        // timing start
        Bsample.barrier(nhost+1);
        Tsample.stop();         // timing end

        // obtain the final result
        adsm_refresh(sum);
        cout << "Sum = " << (*sum) << endl;
        delete[] initvec;       // release the initial data
    } else {                // the workers
        // allocate shared objects
        adsm_malloc(AdsmBulkBegin);     // allocate in aggregate mode
        len = (int*)adsm_malloc("len", sizeof(int));
        sum = (int*)adsm_malloc("sum", sizeof(int));
        adsm_malloc(AdsmBulkEnd);

        // read the parameters
        adsm_refresh(len);

        // allocate shared vector and read the value
        vec = (int*)adsm_malloc("vec", (*len)*sizeof(int));
        adsm_refresh(vec);
        slave();
    }
    adsm_free(len);
    adsm_free(sum);
    adsm_free(vec);
    return 0;
}
void slave()
{
    // calculate the range to be computed by this process
    int size = ((*len)+nhost-1)/nhost;
    int begin = size*(seqno-1);
    int end = begin+size;
    if (end > (*len)) end = (*len);

    // ready to compute
    Bsample.barrier(nhost+1);

    // calculate partial sum
    int partial = 0;
    for (int i = begin; i < end; i++) partial += vec[i];

    // add partial sum to the total sum

    // Using Mutex
    // AdsmMutex mutex("mutex");
    // mutex.lock();           // lock the mutex
    // adsm_refresh(sum);      // load the content of sum
    // *sum += partial;        // add partial sum to total sum
    // adsm_flush(sum);        // store the content of sum
    // mutex.unlock();         // unlock the mutex

    // Using Atomic Begin-End
    // atomic access is faster than the above mutex/refresh method
    // adsm_atomic_begin(sum); // begin of an atomic access on `sum'
    // *sum += partial;        // add partial sum to total sum
    // adsm_atomic_end(sum);   // end of the atomic access

    // Using Atomic Expression
    char expr[32];          // large enough for "[int] @+=" plus any int
    sprintf(expr, "[int] @+=%d", partial);
    adsm_atomic(sum, expr);

    // wait for all processes done
    Bsample.barrier(nhost+1);
}
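The range computation at the top of slave() is a block distribution: worker seqno (1 to nhost) handles the half-open range [size*(seqno-1), min(size*seqno, *len)), where size is *len/nhost rounded up. The standalone sketch below uses no Adsmith calls; the helpers block_range() and covers_exactly_once() are hypothetical and only check that this rule assigns every element to exactly one worker, including when *len is not a multiple of nhost.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper mirroring the range computation in slave():
// worker `seqno` (1..nhost) gets the half-open range [begin, end).
std::pair<int, int> block_range(int len, int nhost, int seqno) {
    int size = (len + nhost - 1) / nhost;   // ceiling of len / nhost
    int begin = size * (seqno - 1);
    int end = begin + size;
    if (end > len) end = len;               // the last block may be shorter
    return std::make_pair(begin, end);      // begin > end means "no work"
}

// True if the ranges of workers 1..nhost cover [0, len) exactly once.
bool covers_exactly_once(int len, int nhost) {
    std::vector<int> hits(len, 0);
    for (int seqno = 1; seqno <= nhost; ++seqno) {
        std::pair<int, int> r = block_range(len, nhost, seqno);
        for (int i = r.first; i < r.second; ++i) ++hits[i];
    }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

For instance, with *len = 10 and nhost = 4 the workers get [0,3), [3,6), [6,9) and [9,10); when there are more workers than blocks, the extra workers get an empty range and their loop simply does not run.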
B The Manual Pages
This section contains an alphabetical listing of all the Adsmith routines.
Name
AdsmAtomicType - atomic type (Advanced)
Synopsis
enum AdsmAtomicType {
    AdsmAtomicWrite, AdsmAtomicRead };
DESCRIPTION
Atomic types are used in adsm_atomic_begin(), described below. The atomic section is defined as the code region between adsm_atomic_begin() and adsm_atomic_end(). AdsmAtomicWrite means that there are write accesses in the atomic section. AdsmAtomicRead means that there are only read accesses in the atomic section. Atomic sections follow the single-writer/multiple-reader protocol: at any time there is at most one writer in the atomic section, while multiple readers may be in the atomic section at the same time.
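The single-writer/multiple-reader rule can be illustrated inside one process with C++17's std::shared_mutex, mapping AdsmAtomicRead to a shared lock and AdsmAtomicWrite to an exclusive lock. This is only an analogy, not how Adsmith implements atomic sections across hosts; SketchAtomicType and AtomicSectionSketch are hypothetical stand-ins, and unlike adsm_atomic_end(), the sketch's end() must be told which mode was used.

```cpp
#include <shared_mutex>
#include <thread>
#include <vector>

// Local stand-in for AdsmAtomicType (the real enum is declared in adsm.h).
enum SketchAtomicType { SketchAtomicWrite, SketchAtomicRead };

// Hypothetical model of an atomic section guarding one object: any number
// of readers may be inside at once, but a writer is always alone.
class AtomicSectionSketch {
public:
    void begin(SketchAtomicType rw) {
        if (rw == SketchAtomicRead) m_.lock_shared();
        else m_.lock();
    }
    void end(SketchAtomicType rw) {     // must match the mode used in begin()
        if (rw == SketchAtomicRead) m_.unlock_shared();
        else m_.unlock();
    }
private:
    std::shared_mutex m_;
};

// Several threads increment a counter inside write sections.
int run_writers(int nthreads, int iters) {
    AtomicSectionSketch section;
    int counter = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                section.begin(SketchAtomicWrite);
                ++counter;              // exclusive access: no lost updates
                section.end(SketchAtomicWrite);
            }
        });
    for (auto &th : ts) th.join();
    return counter;
}
```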
See Also
adsm_atomic, adsm_atomic_begin, adsm_atomic_end
Name
AdsmBarrier - the class of shared objects used as barriers (Primary Function)
Synopsis
class AdsmBarrier {
public:
    AdsmBarrier( char *id );
    ~AdsmBarrier();
    int barrier( int count );
private:
    int *tag;
};
DESCRIPTION
AdsmBarrier is used for barrier synchronization. Like AdsmSemaphore, it requires an identifier id. barrier() is its only method. The number of processes participating in the barrier operation is given by the count parameter, and it must be the same for all participating processes. barrier() blocks the caller until all participating processes have arrived at the barrier point. It implies both an acquire and a release access in the RC model.
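The blocking behaviour of barrier(count) can be sketched within a single process using a mutex and a condition variable. BarrierSketch is a hypothetical stand-in, not Adsmith's implementation; its generation counter lets one object be reused for successive barriers, as the sample program does with Bsample, and barrier_demo() checks that writes made before the barrier are visible to every thread after it.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of barrier(count): the caller blocks until `count`
// participants have arrived; the last arrival releases all of them.
class BarrierSketch {
public:
    void barrier(int count) {
        std::unique_lock<std::mutex> lk(m_);
        int gen = generation_;
        if (++arrived_ == count) {
            arrived_ = 0;
            ++generation_;              // start a new round for reuse
            cv_.notify_all();
        } else {
            cv_.wait(lk, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int arrived_ = 0;
    int generation_ = 0;
};

// Each thread sets its own slot, waits at the barrier, then counts all
// slots; returns true if every thread saw every slot already set.
bool barrier_demo(int n) {
    BarrierSketch b;
    std::vector<int> slots(n, 0), ok(n, 0);
    std::vector<std::thread> ts;
    for (int t = 0; t < n; ++t)
        ts.emplace_back([&, t] {
            slots[t] = 1;
            b.barrier(n);
            int seen = 0;
            for (int s : slots) seen += s;
            ok[t] = (seen == n);
        });
    for (auto &th : ts) th.join();
    for (int v : ok)
        if (!v) return false;
    return true;
}
```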
See Also
AdsmSemaphore, AdsmMutex
Name
AdsmBulk - bulk transfer delimiter functions for adsm_malloc(), adsm_refresh(), adsm_flush(), and adsm_prefresh() (Advanced Functions)
Synopsis
void adsm_malloc( AdsmBulkType type, int num = 0 );
void adsm_refresh( AdsmBulkType type );
void adsm_flush( AdsmBulkType type );
void adsm_prefresh( AdsmBulkType type );
DESCRIPTION
Each of these functions must be called in pairs, first with type AdsmBulkBegin and then with AdsmBulkEnd; the corresponding calls issued between the pair are aggregated into a bulk transfer. The num parameter of adsm_malloc() is the approximate number of objects included in the bulk allocation; it is used only with AdsmBulkBegin and may be omitted. Since adsm_flush() is in effect performed at release time anyway, the bulk form of adsm_flush() is rarely needed.
See Also
AdsmBulkType, adsm_malloc, adsm_malloc_array, adsm_refresh, adsm_refresh_array, adsm_flush, adsm_flush_array, adsm_prefresh, adsm_prefresh_array
Name
AdsmBulkType - bulk transfer type (Advanced Function)
Synopsis
enum AdsmBulkType {
    AdsmBulkBegin, AdsmBulkEnd };
DESCRIPTION
This type is used by the bulk transfer delimiter functions described under AdsmBulk.
See Also
AdsmBulk
Name
AdsmHint - the properties of the shared objects (Primary)
Synopsis
AdsmDataLocal, AdsmDataCache, AdsmDataUpdate, AdsmDataMWriter, AdsmDataMovable, AdsmDataDefault
DESCRIPTION
These are used as the hint parameter of the allocation functions adsm_malloc() and adsm_malloc_array(). AdsmDataLocal allocates the object on the caller's host. AdsmDataCache lets the object use caching; by default, caching is not used. If AdsmDataCache is set, AdsmDataUpdate selects write-update as the coherence protocol; otherwise write-invalidate is used. AdsmDataMWriter enables the multiple-writer protocol for accesses to the object. AdsmDataMovable makes the object's home movable, but it is not implemented yet. These properties can be OR'ed together.
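Because the hints are combined with OR, they behave as independent bit flags. The sketch below shows how a combined hint could be decoded into the caching protocol described above. The numeric values of the Sketch* constants are hypothetical (the real AdsmData* constants live in adsm.h and may differ), and protocol_for() encodes our reading of the rule that AdsmDataUpdate matters only when AdsmDataCache is set.

```cpp
// Hypothetical bit values; the real AdsmData* constants may differ.
const short SketchDataDefault = 0x00;
const short SketchDataLocal   = 0x01;
const short SketchDataCache   = 0x02;
const short SketchDataUpdate  = 0x04;
const short SketchDataMWriter = 0x08;

enum Protocol { NoCaching, WriteInvalidate, WriteUpdate };

// No caching unless Cache is set; with Cache, Update selects write-update,
// otherwise write-invalidate is used.
Protocol protocol_for(short hint) {
    if (!(hint & SketchDataCache)) return NoCaching;
    return (hint & SketchDataUpdate) ? WriteUpdate : WriteInvalidate;
}
```

Other flags such as the local-allocation or multiple-writer bits can be OR'ed in without affecting this decision.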
See Also
adsm_malloc, adsm_malloc_array
Name
AdsmMutex - the class of shared objects used as mutexes (Primary Function)
Synopsis
class AdsmMutex {
public:
    AdsmMutex( char *id );
    ~AdsmMutex();
    void lock();
    void unlock();
private:
    int *tag;
};
DESCRIPTION
AdsmMutex is used to protect critical sections. Like AdsmSemaphore, it requires an identifier id. lock() blocks the caller if the mutex has been locked by another process. unlock() unlocks the mutex. lock() and unlock() are acquire and release accesses, respectively, in the RC model.
See Also
AdsmSemaphore, AdsmBarrier
Name
AdsmSemaphore - the class of shared objects used as semaphores (Primary Function)
Synopsis
class AdsmSemaphore {
public:
    AdsmSemaphore( char *id, int val = 1 );
    ~AdsmSemaphore();
    int get();
    void set( int val );
    void wait();
    void signal();
private:
    int *value;
};
DESCRIPTION
A semaphore object is an integer counting semaphore, which is itself a shared object. It requires an identifier id. The identifier may coincide with one used in adsm_malloc(), since semaphore names are distinguished internally from the names of ordinary shared objects. An initial value greater than zero can be set through the val parameter; the default value is one. Operations on semaphores must be performed through the methods provided by the class. get() and set() retrieve and set the semaphore value, respectively. wait() blocks the caller if the semaphore value is less than one; otherwise it decreases the value by one. signal() wakes up a process if there are waiting processes; otherwise it increases the value by one. wait() and signal() are acquire and release accesses, respectively, in the RC model.
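The wait()/signal() semantics can be modelled within one process with a mutex and a condition variable. SemaphoreSketch is a hypothetical stand-in for AdsmSemaphore, not its implementation. Its signal() always increments the value and wakes one waiter, which then decrements it again; the net effect matches the description above (a waiter is released, or the value grows by one).

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical single-process model of the AdsmSemaphore operations.
class SemaphoreSketch {
public:
    explicit SemaphoreSketch(int val = 1) : value_(val) {}
    int get() { std::lock_guard<std::mutex> lk(m_); return value_; }
    void set(int val) {
        std::lock_guard<std::mutex> lk(m_);
        value_ = val;
        cv_.notify_all();
    }
    // Block while the value is less than one, then decrement it.
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return value_ >= 1; });
        --value_;
    }
    // Increment the value and wake one waiter, which will consume it.
    void signal() {
        std::lock_guard<std::mutex> lk(m_);
        ++value_;
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int value_;
};

// A consumer blocks in wait() until the main thread calls signal().
int handoff_demo() {
    SemaphoreSketch s;          // default value 1
    s.wait();                   // take the unit: the value is now 0
    int data = 0;
    std::thread consumer([&] { s.wait(); data += 1; });
    data = 41;                  // written before signal(), seen after wait()
    s.signal();
    consumer.join();
    return data;
}
```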
See Also
AdsmMutex, AdsmBarrier
Name
adsm_atomic - atomic expression function (Advanced Function)
Synopsis
void adsm_atomic( void *data, char *expr );
DESCRIPTION
Perform the expression expr atomically on the shared object pointed to by data. expr has the format "[<type>] <expression>", where <type> is one of most of C's base types, namely char, short, int, long, uchar, ushort, uint, ulong, float, and double, and <expression> follows most of the C expression syntax. The only variable allowed in <expression> is '@', which stands for the shared object; it is evaluated as type <type> in the expression. For example, "[int] @+=5" adds 5 to an int object. Atomic accesses are treated as non-synchronization accesses in the RC model.
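To make the expression format concrete, the sketch below evaluates a tiny subset of it ("[int] @+=N", "@-=N", "@*=N" and "@=N") on a local int. atomic_int_expr() and expr_mutex are hypothetical: the real adsm_atomic() supports many more types and C expressions and runs the update on the shared object, whereas here a process-local mutex merely stands in for whatever makes the update atomic.

```cpp
#include <cstdio>
#include <mutex>

// Hypothetical lock standing in for Adsmith's atomicity guarantee.
std::mutex expr_mutex;

// Evaluate a small subset of "[int] <expression>" on *obj.
// Returns false for anything outside the supported subset.
bool atomic_int_expr(int *obj, const char *expr) {
    int operand = 0;
    char op = 0;
    std::lock_guard<std::mutex> lk(expr_mutex);
    if (std::sscanf(expr, " [int] @=%d", &operand) == 1) {   // plain assignment
        *obj = operand;
        return true;
    }
    if (std::sscanf(expr, " [int] @%c=%d", &op, &operand) != 2) return false;
    switch (op) {
        case '+': *obj += operand; return true;
        case '-': *obj -= operand; return true;
        case '*': *obj *= operand; return true;
        default:  return false;         // operator not in this sketch
    }
}
```

As in the sample program, the expression string is typically built at run time, for example with snprintf(expr, sizeof expr, "[int] @+=%d", partial).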
See Also
adsm_atomic_begin, adsm_atomic_end
Name
adsm_atomic_begin, adsm_atomic_end - atomic section delimiter functions (Advanced Functions)
Synopsis
void adsm_atomic_begin( void *data, int rw = AdsmAtomicWrite );
void adsm_atomic_end( void *data );
DESCRIPTION
adsm_atomic_begin() marks the beginning of an atomic section; it acts like a lock plus a refresh operation. The rw parameter, of type AdsmAtomicType, states whether the section contains write accesses. adsm_atomic_end() marks the end of an atomic section; it acts like a flush plus an unlock operation. Unlike a real lock and unlock, however, these two functions do not perform acquire and release operations in the RC model: atomic accesses are treated as non-synchronization accesses.
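The "lock plus refresh" and "flush plus unlock" description can be modelled with a master copy of one int guarded by a mutex and a per-thread local buffer. HomeObjectSketch, atomic_begin_sketch() and atomic_end_sketch() are hypothetical names; the accumulate() helper mirrors the commented-out Atomic Begin-End variant of the sample program, with threads in place of Adsmith processes.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of one shared int: `home` is the master copy,
// `lock` serializes atomic sections on it.
struct HomeObjectSketch {
    std::mutex lock;
    int home = 0;
};

// begin: lock + refresh (copy the master copy into the local buffer).
void atomic_begin_sketch(HomeObjectSketch &obj, int &buffer) {
    obj.lock.lock();
    buffer = obj.home;
}

// end: flush (copy the local buffer back to the master copy) + unlock.
void atomic_end_sketch(HomeObjectSketch &obj, const int &buffer) {
    obj.home = buffer;
    obj.lock.unlock();
}

// Each thread adds its partial value through its own local buffer.
int accumulate(const std::vector<int> &partials) {
    HomeObjectSketch sum;
    std::vector<std::thread> ts;
    for (int p : partials)
        ts.emplace_back([&sum, p] {
            int local;                  // this thread's buffer for `sum`
            atomic_begin_sketch(sum, local);
            local += p;
            atomic_end_sketch(sum, local);
        });
    for (auto &th : ts) th.join();
    return sum.home;
}
```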
See Also
adsm_atomic, adsm_collect_begin, adsm_collect_end
Name
adsm_attach - attach a global address to the local process (Primary Function)
Synopsis
void* adsm_attach( int gaddr );
DESCRIPTION
Attach the global address gaddr to the local process. It acts like the adsm_malloc() function.
Return Values
The address of the local buffer corresponding to the global address is returned.
See Also
adsm_gid, adsm_malloc
Name
adsm_collect_begin, adsm_collect_end - collect access delimiter functions (Advanced Functions)
Synopsis
void adsm_collect_begin( void *data, int num );
void adsm_collect_end( void *data );
DESCRIPTION
Collect accesses can be used for accumulation operations. They are similar to the atomic section functions; the difference is that a barrier operation of num processes is performed in adsm_collect_end(). adsm_collect_begin() acts like a lock plus a refresh operation. adsm_collect_end() acts like a flush plus an unlock, followed by a barrier operation. Collect accesses are treated as non-synchronization accesses in the RC model.
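A collect access is an atomic section followed by a barrier, so after adsm_collect_end() returns, every participant's contribution is already in place. The single-process sketch below illustrates this with threads; CollectSketch and collect_demo() are hypothetical, and unlike the real interface the sketch passes num to end() rather than to begin().

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of collect accesses on one shared int:
// begin = lock + refresh; end = flush + unlock, then a barrier of `num`.
class CollectSketch {
public:
    void begin(int &buffer) {
        lock_.lock();
        buffer = home_;
    }
    void end(const int &buffer, int num) {
        home_ = buffer;
        lock_.unlock();
        std::unique_lock<std::mutex> lk(bm_);   // the barrier part
        int gen = generation_;
        if (++arrived_ == num) {
            arrived_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lk, [&] { return gen != generation_; });
        }
    }
    int read() {                    // a plain refresh of the master copy
        std::lock_guard<std::mutex> lk(lock_);
        return home_;
    }
private:
    std::mutex lock_, bm_;
    std::condition_variable cv_;
    int home_ = 0, arrived_ = 0, generation_ = 0;
};

// Returns the total that each thread observes right after its collect end.
std::vector<int> collect_demo(const std::vector<int> &partials) {
    CollectSketch sum;
    int n = (int)partials.size();
    std::vector<int> seen(n, 0);
    std::vector<std::thread> ts;
    for (int t = 0; t < n; ++t)
        ts.emplace_back([&, t] {
            int local;
            sum.begin(local);
            local += partials[t];
            sum.end(local, n);
            seen[t] = sum.read();   // every contribution is in by now
        });
    for (auto &th : ts) th.join();
    return seen;
}
```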
See Also
adsm_atomic_begin, adsm_atomic_end
Name
adsm_enroll - explicitly enroll into Adsmith
Synopsis
void adsm_enroll();
DESCRIPTION
If none of the library functions is used in a program, adsm_enroll() must be called to enroll the program in the Adsmith system explicitly, so that it can interact correctly with the other programs in the application.
Name
adsm_flush - ordinary store access, buffer flush function (Primary Function)
Synopsis
void adsm_flush( void *data );
DESCRIPTION
Flush the buffer of the shared object pointed to by data to the shared memory. The actual transfer is delayed internally until the next release.
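The difference between the delayed adsm_flush() and the immediate adsm_flush_now() can be sketched as a process-local write buffer. ProcessSketch and its methods are hypothetical: a std::map plays the role of the shared memory, flush() only marks an object as dirty, and release() (standing for any release access such as unlock() or barrier()) delivers the pending values.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model of release-consistent stores in one process.
class ProcessSketch {
public:
    explicit ProcessSketch(std::map<std::string, int> &shared) : shared_(shared) {}
    int &buffer(const std::string &id) { return local_[id]; }
    // Like adsm_flush(): only remember that the object must be written back.
    void flush(const std::string &id) { dirty_.insert(id); }
    // Like adsm_flush_now(): write the buffer to shared memory immediately.
    void flush_now(const std::string &id) { shared_[id] = local_[id]; }
    // A release access delivers all delayed flushes.
    void release() {
        for (const std::string &id : dirty_) shared_[id] = local_[id];
        dirty_.clear();
    }
private:
    std::map<std::string, int> &shared_;
    std::map<std::string, int> local_;
    std::set<std::string> dirty_;
};
```

Under this model, another process reading the object between flush() and the release would still see the old value, which is exactly what release consistency permits.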
See Also
AdsmBulk, adsm_flush_array, adsm_refresh, adsm_flush_now
Name
adsm_flush_array - aggregated form of adsm_flush() (Advanced Function)
Synopsis
void adsm_flush_array( void *data, int len );
DESCRIPTION
Flush, in aggregate, an array of len shared objects pointed to by data to the shared memory. Although data is of type void*, it is interpreted internally as void**, i.e. as an array of len object addresses.
See Also
AdsmBulk, adsm_flush
Name
adsm_flush_now - non-synchronization store access, buffer flush function (Primary Function)
Synopsis
void adsm_flush_now( void *data );
DESCRIPTION
Flush the buffer of the shared object pointed to by data to the shared memory immediately.
See Also
adsm_flush
Name
adsm_free - free the shared object locally (Primary Function)
Synopsis
void adsm_free( void *data );
DESCRIPTION
The shared object is freed from the calling process.
See Also
adsm_malloc
Name
adsm_gid - global address retrieval function (Primary Function)
Synopsis
int adsm_gid( void *data );
DESCRIPTION
adsm_gid() translates the local address data to its corresponding global address, represented by an integer.
Return Values
The global address is returned.
See Also
adsm_attach
Name
adsm_hostno - get the host number (Primary Function)
Synopsis
int adsm_hostno( int procno = -1 );
DESCRIPTION
This function returns the host number of the process whose process number is procno. The host number can be used to control where a shared object is allocated. It ranges from zero to the number of hosts in the system minus one (i.e. 0 to nhost-1).
Return Values
Without the procno parameter, the calling process's host number will be returned. Otherwise, the specified process's host number will be returned.
See Also
adsm_procno
Name
adsm_malloc - allocate a shared object (Primary Function)
Synopsis
void *adsm_malloc( char *id, int size,
        short hint = AdsmDataDefault );
void *adsm_malloc( char *id, int size, void *initdata,
        short hint = AdsmDataDefault );
void *adsm_malloc( char *id, int size, void *initdata,
        int home, short hint = AdsmDataDefault );
DESCRIPTION
To allocate a shared object, a string identifier id must be given. size is the object size in bytes. initdata points to a datum of size bytes, which is used as the initial value of the shared object if the object is declared in the system for the first time. home is the host on which the object will be allocated; it ranges from zero to the number of hosts in the system minus one. The properties of the object can be set in hint.
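The rule that initdata is honoured only on the first declaration is what lets the sample program pass &deflen in the master while the workers allocate "len" without initial data. The sketch below models that rule with a name-keyed registry; RegistrySketch and malloc_sketch() are hypothetical, the home and hint parameters are ignored, and zero-filling an object declared without initdata is our own assumption.

```cpp
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry: the first declaration of an id creates the
// object (from initdata if given); later declarations of the same id
// receive a buffer holding the existing value and ignore their initdata.
class RegistrySketch {
public:
    std::vector<char> malloc_sketch(const std::string &id, int size,
                                    const void *initdata = 0) {
        auto it = objects_.find(id);
        if (it == objects_.end()) {
            std::vector<char> v(size);              // zero-filled (assumption)
            if (initdata) std::memcpy(v.data(), initdata, size);
            it = objects_.emplace(id, v).first;
        }
        return it->second;                          // a fresh local buffer
    }
private:
    std::map<std::string, std::vector<char>> objects_;
};
```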
Return Values
An address is returned, pointing to a local buffer of size bytes. This address is also passed as the data parameter to most of the Adsmith access functions to interact with the shared memory.
See Also
AdsmHint, AdsmBulk, adsm_malloc_array, adsm_free
Name
adsm_malloc_array - aggregated form of adsm_malloc() (Advanced Function)
Synopsis
void adsm_malloc_array( char *id, int size, int num,
        void *arrptr, short hint = AdsmDataDefault );
void adsm_malloc_array( char *id, int size, int num,
        void *arrptr, int *distarr, short hint = AdsmDataDefault );
void adsm_malloc_array( char *id, int size, int begin,
        int num, void *arrptr, void *init = 0, int *distarr = 0,
        short hint = AdsmDataDefault );
DESCRIPTION
These functions are shortcuts for bulk allocations. They allocate an array of num related shared objects, each of size bytes. The caller must provide an array arrptr with room for num pointers, which receives the addresses of the allocated objects. Although arrptr is of type void*, it is interpreted internally as void**. id itself is not used as the identifier of any object; instead, the name of each object is formed by appending "[i]" to id, where i is the index starting from 0 (or from begin, in the third form). The distribution of the objects can be set with the array distarr, which contains num host numbers (i.e. the homes) on which the corresponding objects are allocated. init points to a datum of size bytes, used to initialize every shared object. If hint is provided, all objects have the same properties.
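The naming rule and the distarr parameter can be illustrated with two small helpers. element_names() follows the "id[i]" rule stated above; cyclic_distarr() fills distarr with a round-robin distribution over nhost hosts, which is just one possible choice and not something Adsmith prescribes. Both helpers are hypothetical; a std::vector<int> built this way could be passed as distarr through its data() pointer.

```cpp
#include <string>
#include <vector>

// Names of the objects created by adsm_malloc_array(): id[begin] ...
std::vector<std::string> element_names(const std::string &id, int begin, int num) {
    std::vector<std::string> names;
    for (int i = begin; i < begin + num; ++i)
        names.push_back(id + "[" + std::to_string(i) + "]");
    return names;
}

// A round-robin distarr: object i is homed on host i % nhost.
std::vector<int> cyclic_distarr(int num, int nhost) {
    std::vector<int> distarr(num);
    for (int i = 0; i < num; ++i)
        distarr[i] = i % nhost;         // host numbers 0..nhost-1
    return distarr;
}
```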
See Also
AdsmHint, AdsmBulk, adsm_malloc
Name
adsm_prefresh - prefetch access, buffer contents prefetch function (Advanced Function)
Synopsis
void adsm_prefresh( void *data );
DESCRIPTION
Prefetch the buffer contents of the shared object pointed to by data from the shared memory.
See Also
AdsmBulk, adsm_prefresh_array, adsm_refresh
Name
adsm_prefresh_array - aggregated form of adsm_prefresh() (Advanced Function)
Synopsis
void adsm_prefresh_array( void *data, int len );
DESCRIPTION
Prefetch, in aggregate, an array of len shared objects pointed to by data from the shared memory. Although data is of type void*, it is interpreted internally as void**, i.e. as an array of len object addresses.
See Also
AdsmBulk, adsm_prefresh
Name
adsm_procno - get the process number (Primary Function)
Synopsis
int adsm_procno();
DESCRIPTION
The Adsmith process number is different from the PVM task id. It ranges from zero to the number of processes in the system minus one (i.e. 0 to nproc-1). It is generally used together with adsm_hostno(), adsm_procno2tid(), and adsm_tid2procno() for tasks such as data distribution or communicating with other processes through PVM message-passing primitives.
Return Values
The calling process's process number is returned.
See Also
adsm_hostno, adsm_procno2tid, adsm_tid2procno
Name
adsm_procno2tid, adsm_tid2procno - translate between Adsmith process numbers and PVM task ids (Primary Functions)
Synopsis
int adsm_procno2tid( int procno );
int adsm_tid2procno( int tid );
DESCRIPTION
adsm_procno2tid() translates the Adsmith process number procno to the corresponding PVM task id. adsm_tid2procno() translates the PVM task id tid to the corresponding Adsmith process number.
Return Values
The translated value is returned, as described above.
See Also
adsm_procno, adsm_procno2tid, adsm_tid2procno
Name
adsm_refresh - ordinary load access, buffer refresh function (Primary Function)
Synopsis
void adsm_refresh( void *data );
DESCRIPTION
Refresh the buffer of the shared object pointed to by data from the shared memory. The refreshed value is guaranteed to be at least as recent as the value at the time of the last acquire.
See Also
AdsmBulk, adsm_refresh_array, adsm_flush, adsm_refresh_now
Name
adsm_refresh_array - aggregated form of adsm_refresh() (Advanced Function)
Synopsis
void adsm_refresh_array( void *data, int len );
DESCRIPTION
Refresh, in aggregate, an array of len shared objects pointed to by data from the shared memory. Although data is of type void*, it is interpreted internally as void**, i.e. as an array of len object addresses.
See Also
AdsmBulk, adsm_refresh
Name
adsm_refresh_now - non-synchronization load access, buffer refresh function (Primary Function)
Synopsis
void adsm_refresh_now( void *data );
DESCRIPTION
Refresh the buffer of the shared object pointed to by data with the up-to-date value from the shared memory. Typically, adsm_refresh_now() is used to obtain the data stored by the most recent adsm_flush_now().
See Also
adsm_refresh, adsm_flush_now
Name
adsm_shutdown - terminate the parallel application (Primary Function)
Synopsis
void adsm_shutdown();
DESCRIPTION
Kill all the processes in the application immediately.
Name
adsm_spawn - process spawn function (Primary Function)
Synopsis
int adsm_spawn( char *prog, char **argv = 0, int flags = 0,
        char *where = 0, int num = 1, int *tids = 0 );
int adsm_spawn( char *prog, int num );
DESCRIPTION
The first form has the same specification as pvm_spawn(). The second is a short form. prog is the name of the program to be spawned, and num is the number of processes to spawn. Note that the returned tids are PVM task ids; to find the corresponding Adsmith process numbers, use adsm_tid2procno(). adsm_spawn() is also a synchronization operation: it involves a release access in the RC model.
Return Values
As with pvm_spawn(), the number of processes successfully spawned is returned. In the first form, tids will contain the PVM task ids of the spawned processes.
See Also
adsm_wait, adsm_tid2procno
Name
adsm_version - get the Adsmith version (Primary Function)
Synopsis
const char *adsm_version();
DESCRIPTION
Get the Adsmith version number.
Return Values
The version is returned as a string.
Name
adsm_wait - wait for a process or a group of processes to finish (Primary Function)
Synopsis
int adsm_wait();
int adsm_wait( int tid );
int adsm_wait( int *tids, int num );
DESCRIPTION
The first form waits for all child processes to finish. The second waits for the process with PVM task id tid to finish. The last waits for an array of num processes to finish. Note that tids are PVM task ids; to find the corresponding Adsmith process numbers, use adsm_tid2procno(). adsm_wait() is also a synchronization operation: it involves an acquire access in the RC model.
Return Values
The number of processes that finished is returned.
See Also
adsm_spawn, adsm_tid2procno