Index of /~matloff/R/Rdsm

      Name                    Last modified       Size  Description

[DIR] Parent Directory        22-Nov-2009 18:33      -
[DIR] MakingPackages/         13-Nov-2009 14:59      -
[DIR] Rdsm/                   15-Nov-2009 16:09      -
[DIR] Timing/                 17-Nov-2009 16:56      -
[DIR] misc/                   08-Jul-2009 13:50      -
[DIR] old/                    04-Nov-2009 23:03      -
[   ] rdsm070709.tar.bz2      07-Jul-2009 23:16     8k
[   ] rdsm070709.zip          07-Jul-2009 23:18    12k

Rdsm: an R Package for Distributed Shared Memory

Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616

Overview:

Rdsm implements a distributed shared-memory system for R, presenting the programmer with a shared-memory view of data shared by multiple R processes running on the same or separate machines.

The performance should be similar to that of existing message-passing systems such as Rmpi, snow and ParallelR. In other words, except on clusters using special networks such as Myrinet or InfiniBand, communication costs make the software suitable mainly for coarse-grained parallel applications. Possible future performance improvements are discussed below.

The advantage of Rdsm is its shared-memory view, which many in the parallel processing community prefer to message passing because they consider it to make for clearer and cleaner code. Shared memory, either with threads or with the highly popular OpenMP, is the standard way of programming today's multicore machines.

It is easy to port sequential R code to Rdsm. To see this, compare the files KNN.r and KNNSeq.r in the examples/ directory.

Comments and suggestions are highly welcome: send them to Norm Matloff, matloff@cs.ucdavis.edu.

Quick start:

Read and run the code in examples/MatMult1.r, which explains not only the core Rdsm features but also how to run the program. You may find it useful to run the test programs in tests/ as well.

How it works:

Rdsm operates by creating special classes for vectors and matrices (and in future versions, lists), and redefining "[" etc. on those classes. Each access triggers a socket transaction with the server, where the shared data is actually stored. This is basically transparent to the application programmer. Rdsm application code looks identical to normal R code except for calls to newdsm() to create shared variables, and except for the use of 0s as subscript "wild cards."

Of course, for correctness and efficiency, the programmer must be aware of what is going on behind the scenes in Rdsm, but in composing his/her code, the programmer can act as if there is a true shared-memory system in place.
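
To make the mechanism above concrete, here is a minimal sketch of how redefining "[" on a special class can route every access to data stored elsewhere. It is not Rdsm's code and does not use Rdsm's API: the class name toyshared, the constructor new_toy_shared() and the environment server_store are all hypothetical, and an in-process environment stands in for the server and its socket transactions.

```r
# Toy stand-in for the server: an environment holding the "shared" data.
# (In Rdsm the data is held by a separate server reached through sockets.)
server_store <- new.env()

# Hypothetical constructor; deliberately not named newdsm().
# The returned object is only a handle: it carries a name, not the data.
new_toy_shared <- function(name, nrow, ncol) {
  assign(name, matrix(0, nrow = nrow, ncol = ncol), envir = server_store)
  structure(list(name = name), class = "toyshared")
}

# Read access: x[i, j] fetches from the store instead of local memory.
"[.toyshared" <- function(x, i, j) {
  m <- get(x$name, envir = server_store)
  m[i, j]
}

# Write access: x[i, j] <- value updates the store; the handle is unchanged.
"[<-.toyshared" <- function(x, i, j, value) {
  m <- get(x$name, envir = server_store)
  m[i, j] <- value
  assign(x$name, m, envir = server_store)
  x
}

a <- new_toy_shared("a", 2, 2)
a[1, 2] <- 5      # looks like ordinary R indexing
b <- a            # a second handle to the same stored matrix
b[2, 1] <- 7      # a write through b ...
print(a[1, 2])
print(a[2, 1])    # ... is visible through a
```

Because the handle holds only a name, copying it does not copy the data, so both handles see the same matrix. That is the property a shared-memory view depends on; in a real system each of these reads and writes would cost a network round trip, which is why the programmer still needs to know what happens behind the scenes.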

Available client functions:

(See examples/MatMult1.r for usage examples.)

Future improvements: