Quick Tutorial on RPy Package for R/Python Interface

Professor Norm Matloff
Dept. of Computer Science
University of California at Davis
Davis, CA 95616

What is RPy?

RPy is a simple, easy-to-use interface to R from Python. It enables one to enjoy the elegance of Python programming while having access to the rich graphical and statistical capabilities of R.

In its simplest form, shown here, one includes in one's Python code the statement

from rpy2.robjects import r

This starts an instance of R that communicates with the calling Python program. The object r, a Python class instance, provides functions for executing R commands remotely, including commands that operate on data produced by the Python program.

IMPORTANT NOTE: The material here concerns RPy2, not the original RPy.

Installing RPy:

Download RPy from the RPy home page. Unpack it, open a shell/command window in the top directory created by the package, and run

python setup.py install

If you are on a multiuser system and do not have root privileges, you can specify a nondefault root directory. For example, on the UC Davis Computer Science Department's instructional machines, I typed

R RHOME  
setenv RHOME /usr/lib/R 
python setup.py install --root /home/matloff/Pub/rpy2

The first command ran R with a request to report where R was installed on the system, which turned out to be /usr/lib/R. The second command set the corresponding shell environment variable (C shell in my case). The third command specified a nondefault installation directory.
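
The lookup done by the first two commands can also be sketched from Python. This is only an illustration, not part of RPy: the helper name find_r_home is hypothetical, and the sketch assumes Python 3.7 or later and uses only the standard library. It returns None when R cannot be located.

```python
import os
import shutil
import subprocess


def find_r_home():
    """Return R's home directory as a string, or None if it cannot be determined."""
    # Prefer an explicit RHOME setting, mirroring the setenv step above.
    env_home = os.environ.get("RHOME")
    if env_home:
        return env_home
    # Otherwise ask R itself, as "R RHOME" does at the shell prompt.
    r_exe = shutil.which("R")
    if r_exe is None:
        return None  # R is not on the search path
    try:
        result = subprocess.run(
            [r_exe, "RHOME"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


r_home = find_r_home()
print("R home:", r_home)
```

On a machine like the one described above, one would expect this to report /usr/lib/R, but the result depends entirely on the local installation.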

Introduction to using RPy:

First, make sure the RPy module is in your Python path. In the above context, I typed

setenv PYTHONPATH /home/matloff/Pub/rpy2/usr/lib/python2.5/site-packages/
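
To confirm that the path setting took effect, one can ask Python where it would find a module without actually importing it. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper name describe_module is hypothetical and is not part of RPy.

```python
import importlib.util


def describe_module(name):
    """Report whether a top-level module is visible on the current Python path."""
    spec = importlib.util.find_spec(name)  # searches sys.path; does not import
    if spec is None:
        return "%s: not found on sys.path" % name
    return "%s: found at %s" % (name, spec.origin)


# "rpy2" is the module name used in the import statement earlier.
print(describe_module("rpy2"))
# A standard-library module, shown for comparison.
print(describe_module("json"))
```

If the first line reports "not found", the PYTHONPATH setting above is the first thing to recheck.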

Now, let's generate vectors x and y in R, do a scatter plot, fit a least-squares line, etc.:

>>> from rpy2.robjects import r
>>> r('x <- rnorm(100)')  # generate x at R
>>> r('y <- x + rnorm(100,sd=0.5)')  # generate y at R
>>> r('plot(x,y)')  # have R plot them
>>> r('lmout <- lm(y~x)')  # run the regression
>>> r('print(lmout)')  # print from R
>>> loclmout = r('lmout') # download lmout from R to Python
>>> print(loclmout)  # print locally
>>> print(loclmout.r['coefficients'])  # print one component

Now let's apply some R operations to some Python variables:

>>> u = list(range(10))  # set up another scatter plot, this one local
>>> e = 5*[0.25,-0.25]
>>> v = u[:]
>>> for i in range(10): v[i] += e[i]
>>> r.plot(u,v)
>>> r.assign('remoteu',u)  # ship local u to R
>>> r.assign('remotev',v)  # ship local v to R
>>> r('plot(remoteu,remotev)')  # plot there
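
To see exactly what data the session above ships to R, the Python-only part can be run on its own, with no R involved. This sketch simply repeats the construction of u and v from the session, written so that it also runs under Python 3.

```python
u = list(range(10))       # 0, 1, ..., 9
e = 5 * [0.25, -0.25]     # list repetition: 0.25, -0.25, 0.25, ... (10 entries)
v = u[:]                  # copy of u, so that u itself is left unchanged
for i in range(10):
    v[i] += e[i]          # shift each point up or down by 0.25

print(v)
# prints [0.25, 0.75, 2.25, 2.75, 4.25, 4.75, 6.25, 6.75, 8.25, 8.75]
```

So the points plotted are the line v = u, with alternating offsets of +0.25 and -0.25.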

There are many more functions. See the RPy documentation for details.