Rth: Parallel R through Thrust

Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616

Authors:
Norm Matloff, UC Davis
Drew Schmidt, Oak Ridge National Labs

Contents:

What are Thrust and Rth?

What is currently available in Rth:

We are slowly adding to the library. Requests are welcome. Contributions even more welcome.

The two main functions for calling C/C++ code from R are .C() and .Call(). Although the latter is more complex, it tends to produce faster code, and there is an excellent tool for using it, Rcpp. On the other hand, there appears to be a subtle incompatibility between Rcpp and Thrust in the case of GPU backends.

So, although the same Thrust code can be developed for different backends, calling it from R introduces a slight distinction. Both call forms are provided: .C() for GPU backends and .Call() for multicore systems. The naming convention, currently in transition, is that the .C() forms contain dotc in the file names. In the case of sort, for instance, use rthsort() for multicore and rthsortdotc() for GPU.
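To make the distinction concrete, here is a rough sketch of the two C-side interface styles, loosely modeled on the sort routine; the names and signatures below are illustrative, not the exact Rth code. The two forms would live in separate source files, since the GPU file is compiled by nvcc and does not include Rcpp.

   // .Call()/Rcpp form, for the multicore backends
   #include <Rcpp.h>
   #include <thrust/device_vector.h>
   #include <thrust/sort.h>
   #include <thrust/copy.h>

   extern "C" SEXP rthsort(SEXP x_)
   {
      Rcpp::NumericVector x(x_);               // wrap the R vector
      thrust::device_vector<double> dx(x.begin(), x.end());
      thrust::sort(dx.begin(), dx.end());
      Rcpp::NumericVector out(x.size());
      thrust::copy(dx.begin(), dx.end(), out.begin());
      return out;                              // new, sorted R vector
   }

   // .C() form, for the GPU backend; R passes pointers to copies of the
   // data, so the routine sorts in place and R reads the copy back
   extern "C" void rthsortdotc(double *x, int *nx)
   {
      thrust::device_vector<double> dx(x, x + *nx);
      thrust::sort(dx.begin(), dx.end());
      thrust::copy(dx.begin(), dx.end(), x);
   }

On the R side, the first would be reached via .Call(), typically through a small wrapper function, and the second via .C(), with the usual as.double()/as.integer() coercion of the arguments.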

Current routines:

Platform requirements:

I'm assuming you have R, of course, and the gcc compiler. If you have a CUDA-capable NVIDIA GPU, I assume you have the nvcc compiler and CUDA development package. (All the above software tools are free.)

For the OpenMP case without nvcc, you'll need to download Thrust from here.

In any case, be sure to use at least version 1.5 of Thrust.

Building the Rth routines:

Note: Sorry for the Linux/Mac-centric treatment here. If you are familiar with compiling and linking issues, it will be simple for you to convert to Windows.

Also, sorry that none of this is automated yet.

Normally, C/C++ code to be linked to R is built by running R CMD SHLIB. This can still be done with Rth, but with some adjustments, described here.

Here are the steps, say for rthsort.cpp:

Usage:

Each Rth routine has its own directory here, such as the one for the sort routine. There is a file named Usage in each one, showing usage and a small example, in some cases along with timing comparisons.

Copy the .so and .R files to the directory from which you will use them.

Note that in the OpenMP case, you can set the number of threads in an environment variable, e.g.

export OMP_NUM_THREADS=4

Absent that variable, OpenMP will likely run with the full thread capacity of your machine, which will probably be a reasonable value to use.
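If you are not sure what that default will be on your machine, a tiny standalone check like the following (not part of Rth; compile with g++ -fopenmp) prints the value the OpenMP runtime will use:

   #include <omp.h>
   #include <cstdio>

   int main()
   {
      // reflects OMP_NUM_THREADS if set, otherwise typically the number
      // of hardware threads on the machine
      printf("OpenMP will use up to %d threads\n", omp_get_max_threads());
      return 0;
   }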

Writing your own Rth code:

You can use the routines in the Rth library directly, without writing your own code. But if you have experience writing in C++, or at least C, you are encouraged to write your own Rth routines, and the code here has been written to serve as examples which will facilitate that.

At first, don't get distracted by all the "housekeeping" code. Most of this is just constructors to set up needed vectors. The heart of the code for each routine here is described in the HowItWorks file in the given directory.

For instance, here is code from the Kendall's Tau routine:

   thrust::counting_iterator<int> seqa(0);
   thrust::counting_iterator<int> seqb = seqa + n - 1;
   floublevec dx(x,x+n);    // make Thrust vectors from the R data
   floublevec dy(y,y+n);
   intvec tmp(n-1);         // one output slot per index i = 0,...,n-2
   // apply the functor calcgti() to each index in [0, n-1), writing to tmp
   thrust::transform(seqa,seqb,tmp.begin(),calcgti(dx,dy,n));

The first five lines set up vectors: the two counting iterators play the role of a vector holding the indices 0 through n-2, and the next three lines build Thrust vectors for the data and the results. The sixth line is the one that does the actual computation. You do have to learn how to set up the vectors, but this is easy.

The one C++/Thrust construct new to many programmers will be functors, which are simply structs that can be called like functions (they overload operator()). Reading an example or two will be enough for you to write your own functors.
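As a purely illustrative example, not taken from the Rth sources, here is a complete program with a functor that squares each element of a vector; Thrust calls its operator() once per element:

   #include <thrust/device_vector.h>
   #include <thrust/transform.h>
   #include <cstdio>

   // __host__ __device__ is needed for the CUDA backend and is defined
   // away (harmless) when compiling for OpenMP
   struct square
   {
      __host__ __device__
      double operator()(double x) const { return x * x; }
   };

   int main()
   {
      double a[4] = {1, 2, 3, 4};
      thrust::device_vector<double> dx(a, a+4), dout(4);
      thrust::transform(dx.begin(), dx.end(), dout.begin(), square());
      for (int i = 0; i < 4; i++) printf("%f\n", (double) dout[i]);
      return 0;
   }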

See my open source book on parallel programming for details on writing Thrust code. The Thrust chapter can be read without referring to any other material in the book.

Older GPUs can handle only single-precision numbers, and even the newer ones may have speed issues with double-precision. R, on the other hand, does not accommodate single-precision code well. Accordingly, the code uses typedef to choose double or float, based on whether the GPU compiler flag is set.
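The idea is along these lines (a sketch only; the actual flag and type names in the Rth sources may differ somewhat):

   #include <thrust/device_vector.h>

   #ifdef GPU                // hypothetical compiler flag, e.g. -DGPU
   typedef float flouble;    // older GPUs: single precision only
   #else
   typedef double flouble;   // multicore backends: match R's double type
   #endif

   typedef thrust::device_vector<flouble> floublevec;
   typedef thrust::device_vector<int> intvec;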

You can of course view Thrust as a black box. But if you wish to learn its innards, I recommend writing little test programs, compiling them for a multicore backend, and then stepping through the code with a debugging tool. One advantage of Thrust's being entirely header files is that the debugger will automatically show you the library's source lines.
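For instance, a toy program like the one below, built for a multicore backend with -g and optimization turned off, lets you step into thrust::sort() with gdb and watch the dispatch machinery at work:

   #include <thrust/device_vector.h>
   #include <thrust/sort.h>
   #include <cstdio>

   int main()
   {
      double a[5] = {3.1, 0.2, 4.7, 1.5, 2.9};
      thrust::device_vector<double> dx(a, a+5);
      thrust::sort(dx.begin(), dx.end());   // set a breakpoint here and step in
      for (int i = 0; i < 5; i++) printf("%f\n", (double) dx[i]);
      return 0;
   }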