Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616
Authors:
Norm Matloff, UC Davis
Drew Schmidt, Oak Ridge National Labs
The library is in the early stages of development, still with some rough edges, but definitely usable.
I am slowly adding to the library. Requests are welcome. Contributions even more welcome.
Current routines:
I'm assuming you have R, of course, and the gcc compiler. If you have a CUDA-capable NVIDIA GPU, I assume you have the nvcc compiler. (All the above software tools are free.)
For the OpenMP case without nvcc, you'll need to download Thrust, from here.
In any case, be sure to use at least version 1.5 of Thrust.
Note: Sorry for the Linux-centric treatment here. If you are familiar with compiling and linking issues, it will be simple for you to convert to Windows or Mac steps. I'll add the former when I get a chance.
Also, sorry that none of this is automated yet.
Normally, C/C++ code to be linked to R is built by running R CMD SHLIB. This can still be done with Rth, but with some adjustments, described here.
Here are the steps, say for rthsort.cpp:
We need to tell Thrust which backend to use, and tell gcc to enable OpenMP. Both can be done by setting environment variables before running R CMD SHLIB:
setenv PKG_CPPFLAGS "-I/home/matloff/Thrust -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -g"
setenv PKG_LIBS "-lgomp"
R CMD SHLIB rthsort.cpp
Adjust for your Thrust location and shell. For bash, for example, the second environment-setting command above would be
export PKG_LIBS="-lgomp"
This produces rthsort.so, ready to use from R.
For the GPU case, the build has two stages. In the first stage, one runs a straight nvcc compile (since I've used .cpp suffixes throughout, you need to tell nvcc these are really CUDA files, via -x cu):
nvcc -c rthsort.cpp -x cu -DGPU
(Add -Xcompiler "-fpic" if the compiler complains about PIC.)
This produces rthsort.o, which you then pass to R CMD SHLIB in the second stage. Before doing so, you must indicate where the CUDA libraries are:
setenv PKG_LIBS "-L/usr/local/cuda/lib -lcudart"
R CMD SHLIB rthsort.o -o rthsortgpu.so
(Replace lib by lib64 if you are on a 64-bit machine.)
This produces rthsortgpu.so, the GPU counterpart of the OpenMP build's rthsort.so.
Each Rth routine has its own directory here, such as the one for the sort routine. There is a file named Usage in each one, showing usage and a small example, in some cases along with timing comparisons.
Copy the .so and .R files to the directory from which you will use them.
Note that in the OpenMP case, you can set the number of threads in an environment variable, e.g.
setenv OMP_NUM_THREADS 4
Absent that variable, OpenMP will likely run with the full thread capacity of your machine, which will probably be a reasonable value to use.
You can use the routines in the Rth library directly, without writing your own code. But if you have experience writing in C++, or at least C, you are encouraged to write your own Rth routines, and the code here has been written to serve as examples which will facilitate that.
At first, don't get distracted by all the "housekeeping" code. Most of this is just constructors to set up needed vectors. The heart of the code for each routine here is described in the HowItWorks file in the given directory.
For instance, here is code from the Kendall's Tau routine:
thrust::counting_iterator<int> seqa(0);
thrust::counting_iterator<int> seqb = seqa + n - 1;
floublevec dx(x,x+n);
floublevec dy(y,y+n);
intvec tmp(n-1);
thrust::transform(seqa,seqb,tmp.begin(),calcgti(dx,dy,n));
The first five lines set up vectors. The sixth line is the one that does the actual computation. You do have to learn how to set up the vectors, but this is easy.
The one C++/Thrust construct new to many programmers will be functors, which are simply callable versions of C structs. Reading an example or two will be enough for you to write your own functors.
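To make the functor idea concrete, here is a minimal sketch in plain standard C++, with no Thrust required. It is not Rth's code: countConcordant and totalConcordant are hypothetical names made up for illustration, and std::transform over an index vector stands in for thrust::transform over counting iterators. The pattern is the same, though: a struct carries the data it needs, and its operator() is called once per index.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical functor (NOT Rth's calcgti): for index i, counts the
// indices j > i whose pair (x[j], y[j]) is concordant with (x[i], y[i]).
struct countConcordant {
    const double *x;
    const double *y;
    int n;
    countConcordant(const std::vector<double> &xv,
                    const std::vector<double> &yv)
        : x(xv.data()), y(yv.data()), n(static_cast<int>(xv.size())) {}
    // operator() is what makes the struct callable like a function
    int operator()(int i) const {
        int count = 0;
        for (int j = i + 1; j < n; j++)
            if ((x[i] - x[j]) * (y[i] - y[j]) > 0) count++;
        return count;
    }
};

// Applies the functor to the indices 0, 1, ..., n-2 and sums the results.
int totalConcordant(const std::vector<double> &x,
                    const std::vector<double> &y) {
    int n = static_cast<int>(x.size());
    if (n < 2) return 0;
    std::vector<int> idx(n - 1);
    std::iota(idx.begin(), idx.end(), 0);  // idx = 0, 1, ..., n-2
    std::vector<int> tmp(n - 1);
    std::transform(idx.begin(), idx.end(), tmp.begin(),
                   countConcordant(x, y));
    return std::accumulate(tmp.begin(), tmp.end(), 0);
}
```

Calling totalConcordant(x, y) on two equal-length vectors returns the number of concordant pairs. In a Thrust version, the index vector would likely be replaced by counting iterators and the data by device vectors, but the functor itself would look much the same.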
See my open source book on parallel programming for details on writing Thrust code. The Thrust chapter can be read without referring to any other material in the book.
Older GPUs can handle only single-precision numbers, and even the newer ones may have speed issues with double-precision. R, on the other hand, does not accommodate single-precision code well. Accordingly, the code uses typedef to choose double or float, based on whether the GPU compiler flag is set.
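The typedef switch can be sketched as follows. This is an illustration under assumptions rather than the library's actual header: it assumes the GPU build defines the macro GPU (as the -DGPU option in the nvcc command does), and the type name flouble and the helpers toWorking and sumWorking are chosen here for the example.

```cpp
#include <cstddef>
#include <vector>

// Assumption: GPU builds are compiled with -DGPU. The name "flouble"
// is illustrative.
#ifdef GPU
typedef float flouble;   // single precision for GPU builds
#else
typedef double flouble;  // double precision otherwise
#endif

// R hands over doubles; convert to the working type on the way in.
// (Hypothetical helper.)
std::vector<flouble> toWorking(const double *x, std::size_t n) {
    return std::vector<flouble>(x, x + n);
}

// Sum in the working precision, returning a double for R.
// (Hypothetical helper.)
double sumWorking(const std::vector<flouble> &v) {
    flouble s = 0;
    for (std::size_t i = 0; i < v.size(); i++) s += v[i];
    return static_cast<double>(s);
}
```

The rest of the code refers only to flouble, so the same source compiles for either precision, and the conversion to and from R's doubles happens at the boundary.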
You can of course view Thrust as a black box. But if you wish to learn its innards, I recommend writing little test programs, compiling them for a multicore backend, and then stepping through the code with a debugging tool. One advantage of Thrust's consisting only of "include" files is that the debugger will automatically show source lines.