CRAN Projects

Overview

The Comprehensive R Archive Network (CRAN) is a repository of thousands of R packages contributed by volunteers, and it is extremely helpful.

For example, when I needed an R package to do efficient nearest-neighbor computation, I plugged "CRAN nearest neighbor" into Google, and found several good packages, including the package I wound up using, FNN.

You will be writing your own CRAN-quality R packages in this course. I will specify the core requirements of each package, but it will be up to you to flesh out my ideas: propose extra features of your own, and implement everything in a way that is efficient, clear and, above all, user-friendly.

The first project will be a package for the analysis of Markov chains. It will find stationary distributions, expected hitting times and so on, for both discrete- and continuous-time models. Note carefully that in this and the other CRAN projects, there is major math content, which will be just as important as the coding.
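To give a feel for the math involved, here is a minimal sketch in base R for a discrete-time chain. It is not a specification of the package you will write. The function names findpi and hittimes and the three-state transition matrix are made up for illustration, and the code assumes the chain is irreducible, so that the stationary distribution is unique.

```r
# Illustrative sketch only; findpi() and hittimes() are hypothetical names.

# Stationary distribution of a discrete-time chain with transition matrix p.
# Solves pi p = pi together with sum(pi) = 1.
findpi <- function(p) {
  n <- nrow(p)
  a <- diag(n) - t(p)   # (I - p') pi = 0
  a[n, ] <- 1           # replace one redundant equation by sum(pi) = 1
  b <- c(rep(0, n - 1), 1)
  solve(a, b)
}

# Expected number of steps to first reach state j, from each other state.
# With q = p minus row and column j, solves (I - q) h = 1.
hittimes <- function(p, j) {
  q <- p[-j, -j, drop = FALSE]
  solve(diag(nrow(q)) - q, rep(1, nrow(q)))
}

# made-up three-state example
p <- matrix(c(0.50, 0.50, 0.00,
              0.25, 0.50, 0.25,
              0.00, 0.50, 0.50), nrow = 3, byrow = TRUE)
pivec <- findpi(p)      # 0.25 0.50 0.25
h3 <- hittimes(p, 3)    # 8 6, starting from states 1 and 2
```

Both functions reduce the problem to a single call to solve(), which is one reason the linear algebra deserves as much attention as the code.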

Your package will be required to adhere to CRAN requirements, but you will not actually submit to CRAN. If a team does an especially good job, I may propose that the team and I submit the project to CRAN. (If you wish to submit to CRAN without me, that of course is fine too, but we should work out issues of credit.)

Introduction to R Packages: Usage

Our textbook, Appendix C, introduces usage of R libraries, in this case the popular graphics package, ggplot2. Section C.2 shows how to install the package, and how to load the library. Section C.4 then gives an extremely simple example.

Install the package on your machine, and try the simple example.
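The commands involved look roughly like the following. This is a sketch, not the textbook's example: the scatter plot uses R's built-in mtcars data, and the installation step is left as a comment because it needs a network connection.

```r
# One-time installation (needs a network connection):
#   install.packages("ggplot2")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # scatter plot of fuel economy against weight, from the built-in mtcars data
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
  print(p)
} else {
  message("ggplot2 is not installed yet; run install.packages(\"ggplot2\")")
}
```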

Introduction to R Packages: Development

How does one create an R package? As a running example, we will use the package freqparcoord, which I wrote with Yingkang Xie (who, by the way, was the TA for this course two years ago). For now, just download the source code from that site.

Unpack the source, which will produce a directory freqparcoord/, and then enter that directory to see what's there. For instance, on a Unix-family system (Linux or Mac), do this:

% gunzip -c freqparcoord_1.0.0.tar.gz | tar xpf -
% cd freqparcoord
% ls
data/  DESCRIPTION  inst/  man/  MD5  NAMESPACE  R/

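Here is an overview of some of those subdirectories and files:

R/: the R source code for the package's functions
man/: the help pages, one .Rd file per documented topic
data/: data sets shipped with the package
DESCRIPTION: package metadata, such as name, version, authors and dependencies
NAMESPACE: declarations of which functions the package exports and imports
inst/: extra files that are copied as-is into the installed package
MD5: checksums of the package files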

Now install the package from source:

% cd ..
% R CMD INSTALL -l ~/R freqparcoord
...
...

This installs the package into the R subdirectory of your home directory, which must already exist. You can now run the package. Start R, and type

> library(freqparcoord)
> ?freqparcoord 
...
...
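If library() reports that there is no package called freqparcoord, a likely cause is that ~/R is not among the directories R searches for packages. A sketch of one remedy, assuming the package was installed to ~/R as above:

```r
mylib <- path.expand("~/R")   # the directory given to -l at install time

# put that directory at the front of R's library search path
if (dir.exists(mylib)) {
  .libPaths(c(mylib, .libPaths()))
}

.libPaths()   # lists the directories that library() will search

# Alternative for a one-off load, without changing the search path:
#   library(freqparcoord, lib.loc = mylib)
```

The change made by .libPaths() lasts only for the current R session.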

Run the first of the examples printed out.

> data(mlb)
> freqparcoord(mlb,5,4:6,7,method="maxdens")

Now you can browse around the freqparcoord source files, and begin to see how to create a package. Some extra comments are in order for the man/ subdirectory:

The file names there have an .Rd suffix, and are processed by

% R CMD Rd2pdf x.Rd

for, say, a file x.Rd. If there are no errors, the resulting help page will be displayed on your screen as a PDF. By the way, if you are familiar with LaTeX, you will find the ".Rd language" to be similar, but it is easy to learn whether or not you have that background.
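For a feel of what goes into such a file, here is a sketch that writes a small .Rd file from within R and renders it as plain text, using the tools package that ships with R. The function addone and its documentation are invented for this illustration, and Rd2txt is used in place of Rd2pdf so that no LaTeX installation is needed.

```r
# Hypothetical help page for a made-up function addone().
rdtext <- c(
  "\\name{addone}",
  "\\alias{addone}",
  "\\title{Add One to a Number}",
  "\\description{Returns its argument plus one.}",
  "\\usage{addone(x)}",
  "\\arguments{",
  "  \\item{x}{a numeric vector}",
  "}",
  "\\value{A numeric vector of the same length as \\code{x}.}",
  "\\examples{",
  "addone(3)",
  "}"
)

rdfile <- file.path(tempdir(), "addone.Rd")
writeLines(rdtext, rdfile)

# look for problems in the file, then render it as a plain-text help page
chk <- tools::checkRd(rdfile)
outfile <- file.path(tempdir(), "addone.txt")
tools::Rd2txt(rdfile, out = outfile)
txt <- readLines(outfile)
cat(txt, sep = "\n")
```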

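To check a package in the directory xyz for CRAN compatibility, you must first build it, then run the resulting tarball through the CRAN check. The tarball's name includes the version number from the DESCRIPTION file, say 1.0.0 here:

% R CMD build xyz  # produces xyz_1.0.0.tar.gz
% R CMD check --as-cran -l ~/R xyz_1.0.0.tar.gz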

Be prepared! The check may at first have a lot of complaints. For instance, if you list a function in your NAMESPACE file but don't have documentation for it in man/, a warning about the undocumented function will appear.
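One way to start a package with documentation stubs already in place is the function package.skeleton() from R's utils package, which writes a DESCRIPTION file, a NAMESPACE file, an R/ directory and a man/ directory with a stub .Rd file for each object. A sketch, with a made-up function addone and package name toypkg, written to a temporary directory:

```r
# Hypothetical one-function package, created in a scratch directory.
env <- new.env()
assign("addone", function(x) x + 1, envir = env)

pkgroot <- tempfile("skel")
dir.create(pkgroot)

# write the skeleton; suppressMessages() hides the progress chatter
suppressMessages(
  package.skeleton(name = "toypkg", list = "addone",
                   environment = env, path = pkgroot)
)

pkgdir <- file.path(pkgroot, "toypkg")
list.files(pkgdir)                      # top-level files and directories
list.files(file.path(pkgdir, "man"))    # the generated .Rd stubs
```

The stubs are only templates, and need to be filled in by hand before the package is built and checked.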

The best way to learn this material is to simply "compare input to output." For instance, for .Rd files, first run

> ?freqparcoord

and then compare the output to the file man/freqparcoord.Rd. For more information, start with the official CRAN documentation, and also plug "writing cran packages" into Google.