\documentclass[11pt]{article}  

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.0in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.1in}

\usepackage{times}
\usepackage{fancyvrb}  
\usepackage{relsize}  
\usepackage{hyperref}

\usepackage{amsmath}

\usepackage{graphicx}

\begin{document}

\title{R for Programmers} 

\author{Norman Matloff\\
 University of California, Davis\\
        \copyright{}2007-8, N. Matloff }

\date{December 4, 2008}   

\maketitle

{\bf \Large IMPORTANT NOTICE:}  This document is no longer being
maintained.  My book on R programming, {\it The Art of R Programming},
is due out in August 2011.  You are welcome to use an early draft at
\url{http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf}; it was
about 50\% complete and contains bugs, but should be useful.

{\bf Licensing:}

This work, dated December 4, 2008, is licensed under a Creative Commons
Attribution-No Derivative Works 3.0 United States License.  Subsequent
works by the author that make use of part or all of this material will
not be covered by the license.  Copyright is retained by N. Matloff in
all non-U.S. jurisdictions. 

\newpage

\tableofcontents{}
\newpage

\section*{Prerequisites}
\addcontentsline{toc}{section}{Prerequisites}

The only real prerequisite is that you have some programming experience;
you need not be an expert programmer, though experts should find the
material suitable for their level too. 

Occasionally there will be some remarks aimed at professional
programmers, say about object-oriented programming or Python, but these
remarks will not make the treatment inaccessible to those having only a
moderate background in programming.

\newpage

\section{What Is R?}

R is a scripting language for statistical data manipulation and
analysis.  It was inspired by, and is mostly compatible with, the
statistical language S developed by AT\&T.  The name S, obviously
standing for statistics, was an allusion to another programming language
developed at AT\&T with a one-letter name, C. S was later sold to a
small firm, which added a GUI and named the result S-Plus.

R has become more popular than S/S-Plus, both because it's free and
because more people are contributing to it.  R is sometimes called ``GNU
S.''

\section{Why Use R for Your Statistical Work?}
\label{why}

Why use anything else? As the Cantonese say, {\it yauh peng, yauh
leng}---``both inexpensive and beautiful.''

Its virtues:

\begin{itemize}

\item a free, open-source implementation of the highly regarded S
statistical language; R/S is the de facto standard among professional
statisticians

\item comparable, and often superior, in power to commercial products in
most senses

\item available for Windows, Macs, Linux

\item in  addition  to  enabling  statistical operations, it's a general
programming language, so that you can automate your analyses and create
new functions

\item object-oriented and functional programming structure

\item your data sets are saved between sessions, so you don't have to
reload each time

\item open-software nature means it's easy to get help from the
user community, and lots of new functions get contributed by users, many
of whom are prominent statisticians

\end{itemize}

I should warn you that one submits commands to R via text, rather than
mouse clicks in a Graphical User Interface (GUI). If you can't live
without GUIs, you should consider using one of the free GUIs that have
been developed for R, e.g. R Commander or JGR.  (See Section \ref{gui}
below.) Note that R definitely does have graphics---tons of it. But the
graphics are for the output, e.g. plots, not for the input.

Though the terms {\it object-oriented} and {\it functional programming}
may pique the interest of computer scientists, they are actually quite
relevant to anyone who uses R.

The  term {\it object-oriented} can be explained by example, say
statistical regression. When you perform a regression analysis with
other statistical packages, say SAS or SPSS, you get a mountain of
output. By contrast, if you call the {\bf lm()} regression function in
R, the function returns an object containing all the results---estimated
coefficients, their standard errors, residuals, etc. You then pick and
choose which parts of that object to extract, as you wish.
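
As a rough sketch of this idea, the code below fits a regression on {\bf
cars}, one of R's built-in data sets (chosen here only for
illustration), and then extracts a few pieces of the returned object.
The variable name {\bf fit} is arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
fit <- lm(dist ~ speed, data=cars)  # regress stopping distance on speed
class(fit)                  # the returned object is of class "lm"
coef(fit)                   # the estimated coefficients
head(residuals(fit))        # the first few residuals
summary(fit)$coefficients   # coefficients together with their standard errors
\end{Verbatim}

Nothing is printed by the first line; output appears only for the
pieces we ask for.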

R is {\it polymorphic}, which means that the same function can be
applied to different types of objects, with results tailored to the
different object types. Such a function is called a {\it generic
function}.\footnote{In C++, this is called a {\it virtual function}.}
Consider for instance the {\bf plot()} function. If you apply it to a
simple list of numbers, you get a simple plot of them, but if you apply
it to the output of a regression analysis, you get a set of plots of
various aspects of the regression output. This is nice, since it means
that you, as a user, have fewer commands to remember! For instance, you
know that you can use the {\bf plot()} function on just about any object
produced by R.

The fact that R is a programming language rather than a collection of
discrete commands means that you can combine several commands, each one
using the output of the last, with the resulting combination being quite
powerful and extremely flexible. (Linux users will recognize the
similarity to shell pipe commands.)

For example, consider this (compound) command

\fvset{fontsize=\relsize{-2}}
\begin{Verbatim}
nrow(subset(x03,z==1))
\end{Verbatim}

First the {\bf subset()} function would take the data frame {\bf x03},
and cull out all those records for which the variable {\bf z} has the
value 1. The resulting new frame would be fed into {\bf nrow()}, the
function that counts the number of rows in a frame. The net effect
would be to report the number of records having {\bf z} = 1 in the
original frame.

A common theme in R programming is the avoidance of writing explicit
loops.  Instead, one exploits R's functional programming and other
features, which do the loops internally.  These internal loops are
typically much more efficient, which can make a huge timing difference
when running R on large data sets.
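
Here is a small illustration of the difference, computing a sum of
squares both ways.  The variable names are made up for this sketch, and
on a vector this short the timing difference would not be noticeable.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- 1:10
total <- 0
for (v in x) total <- total + v^2   # explicit loop over the elements
total                               # 385
sum(x^2)                            # 385 again; the looping is done internally
\end{Verbatim}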

\section{How to Run R}

R has two modes, {\it interactive} and {\it batch}.  The former is the
typical one used.  

\subsection{Interactive Mode}

You start R by typing ``R'' on the command line in Linux or on a Mac, or
in a Windows Run window.  You'll get a greeting, and then the R prompt,
$>$.

You can then execute R commands, as you'll see in the quick sample
session discussed in Section \ref{first}.  Or, you may have your own R
code which you want to execute, say in a file {\bf z.r}.  You could
issue the command

\begin{Verbatim}
> source("z.r") 
\end{Verbatim}

which would execute the contents of that file.  Note by the way that the
contents of that file may well just be a function you've written.  In
that case, ``executing'' the file would mean simply that the R
interpreter reads in the function and stores the function's definition
in memory.  You could then execute the function itself by calling it
from the R command line, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> f(12)
\end{Verbatim}

\subsection{Running R in Batch Mode}

Sometimes it's preferable to automate the process of running R. For example,
we may wish to run an R script that generates a graph output file, and not
have  to  bother with manually running R. Here's how it could be done.
Consider the file {\bf z.r}, which produces a histogram and saves it to
a PDF file:

\begin{Verbatim}[fontsize=\relsize{-2}]
pdf("xh.pdf")  # set graphical output file
hist(rnorm(100))  # generate 100 N(0,1) variates and plot their histogram
dev.off()  # close the file
\end{Verbatim}

Don't worry about the details; the information in the comments (marked
with \#) suffices here.

We could run it automatically by simply typing

\begin{Verbatim}[fontsize=\relsize{-2}]
R CMD BATCH --vanilla z.r
\end{Verbatim}

The {\bf -{}-vanilla} option tells R not to load any startup file
information, and not to save any.

\section{A First R Example Session (5 Minutes)}
\label{first}

We start R from our shell command line, and get the greeting message and
the $>$ prompt:

\begin{Verbatim}[fontsize=\relsize{-2}]

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.1  (2005-06-20), ISBN 3-900051-07-0
...
Type `q()' to quit R.

>
\end{Verbatim}

Now let's make a simple data set, a {\it vector} in R parlance,
consisting of the numbers 1, 2 and 4, and name it {\bf x}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
\end{Verbatim}

The standard assignment operator in R is $<$-.  However, there are also
-$>$, = and even the {\bf assign()} function.
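
As a quick sketch, the following lines all store the value 3, each
using a different form of assignment.  The variable names are arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
a <- 3            # the standard form
3 -> b            # rightward assignment
d = 3             # = also works at the top level
assign("e", 3)    # here the variable name is given as a string
c(a, b, d, e)     # all four variables now hold 3
\end{Verbatim}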

The ``c'' stands for ``concatenate.''   Here we are concatenating the
numbers 1, 2 and 4.  Or more precisely, we are concatenating three
one-element vectors consisting of those numbers.  This is because any
individual number is considered a one-element vector.

Thus we can also do, for instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
> q <- c(x,x,8)
\end{Verbatim}

which would set {\bf q} to (1,2,4,1,2,4,8).

Since ``seeing is believing,'' go ahead and confirm that the data is
really in {\bf x}; to print the vector to the screen, simply type its
name.  If you type any variable name, or more generally an expression,
while in interactive mode, R will print out the value of that variable
or expression.  (Python programmers will find this feature familiar.)
For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
> x
[1] 1 2 4
\end{Verbatim}

Yep, sure enough, {\bf x} consists of the numbers 1, 2 and 4.

The ``[1]'' here means that in this row of output, the first item is
item 1 of that output.  If there were, say, two rows of output with six
items per row, the second row would be labeled [7].  Our output in this case
consists of only one row, but this notation helps users read voluminous
output consisting of many rows. 

Again, in interactive mode, one can always print an object in R by simply
typing its name, so let's print out the third element of {\bf x}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x[3]
[1] 4
\end{Verbatim}

We might as well find the mean and standard deviation:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mean(x)
[1] 2.333333
> sd(x)
[1] 1.527525
\end{Verbatim}

If we had wanted to save the mean in a variable instead of just printing it
to the screen, we could do, say,

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- mean(x)
\end{Verbatim}

Again, since you are learning, let's confirm that {\bf y} really does
contain the mean of {\bf x}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y
[1] 2.333333
\end{Verbatim}

As noted earlier, we use \# to write comments.  

\begin{Verbatim}[fontsize=\relsize{-2}]
> y  # print out y
[1] 2.333333
\end{Verbatim}

These of course are especially useful when writing programs, but they
are useful for interactive use too, since R does record your commands
(see Section \ref{session}). The comments  then help you remember what
you were doing when you later read that record.

As the last example in this quick introduction to R, let's work with one
of R's internal datasets, which it uses for demos. You can get a list of
these datasets by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> data()
\end{Verbatim}

One of the datasets is {\bf Nile}, containing data on the flow of the
Nile River. Let's again find the mean and standard deviation,

\begin{Verbatim}[fontsize=\relsize{-2}]
> mean(Nile)
[1] 919.35
> sd(Nile)
[1] 169.2275
\end{Verbatim}

and also plot a histogram of the data:

\begin{Verbatim}[fontsize=\relsize{-2}]
> hist(Nile)
\end{Verbatim}

A window pops up with the histogram in it, as seen in Figure
\ref{nilehist}. This one is bare-bones simple, but R has all kinds of
bells and whistles you can use optionally. For instance, you can change
the number of bins by specifying the {\bf breaks} argument; {\bf
hist(z,breaks=12)} would draw a histogram of the data {\bf z} with about
12 bins (R treats the number as a suggestion). You can make nicer
labels, etc. When you become more familiar with R, you'll be able to
construct complex color graphics of striking beauty.

\begin{figure}[tb]
\centerline{
\includegraphics[width=4.0in]{NileHist.pdf}
}
\caption{Nile data}
\label{nilehist}
\end{figure}
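
As a sketch of a few such options, the call below asks for about 12
bins and sets a title, an axis label and a bar color.  The title and
label text are just our own choices here.

\begin{Verbatim}[fontsize=\relsize{-2}]
hist(Nile, breaks=12,                  # ask for about 12 bins
     main="Annual flow of the Nile",   # plot title
     xlab="flow", col="gray")          # x-axis label and bar color
\end{Verbatim}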


Well,  that's the end of this first 5-minute introduction. We leave by
calling the quit function (or optionally by hitting ctrl-d in Linux):

\begin{Verbatim}[fontsize=\relsize{-2}]
> q()
Save workspace image? [y/n/c]: n
\end{Verbatim}

That last question asked whether we want to save our variables, etc., so
that we can resume work later on. If we answer y, then the next time we
run R, all those objects will automatically be loaded. This is a very
important feature, especially when working with large or numerous
datasets; see more in Section \ref{session}.

\section{Functions: a Short Programming Example}
\label{writefun}

In the following example, we define a function {\bf oddcount()} while in
R's interactive mode, and then call the function on a couple of test
cases. The function is supposed to count the number of odd numbers in
its argument vector.

\begin{Verbatim}[fontsize=\relsize{-2}]
# comment:  counts the number of odd integers in x
> oddcount <- function(x)  {
+ k <- 0
+ for (n in x)  {
+    if (n %% 2 == 1) k <- k+1
+ }
+ return(k)
+ }
> oddcount(c(1,3,5))
[1] 3
> oddcount(c(1,2,3,7,9))
[1] 4
\end{Verbatim}

Here is what happened: We first told R that we would define a function
{\bf oddcount()} of one argument, {\bf x}. The left brace demarcates the start of the
body of the function. We wrote one R statement per line. Since we were still
in the body of the function, R reminded us of that by using + as its
prompt\footnote{Actually, this is a line continuation character.}
instead of the usual $>$. After we finally entered a right brace to end the
function body, R resumed the $>$ prompt.
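
In keeping with the theme of avoiding explicit loops, the same count
can be obtained without a {\bf for} loop.  This is only a sketch, and
the name {\bf oddcount2()} is made up here.  The expression {\bf x \%\%
2 == 1} yields a vector of TRUE and FALSE values, and {\bf sum()} treats
each TRUE as 1:

\begin{Verbatim}[fontsize=\relsize{-2}]
oddcount2 <- function(x)  {
   sum(x %% 2 == 1)   # TRUE counts as 1, FALSE as 0
}
oddcount2(c(1,3,5))       # 3
oddcount2(c(1,2,3,7,9))   # 4
\end{Verbatim}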

\section{Scalars, Vectors, Arrays and Matrices}

Remember, individual numbers are actually considered one-element
vectors.  So, there is really no such thing as a scalar.

Vector elements must all have the same {\it mode}, which can be {\bf
integer}, {\bf numeric} (floating-point number), {\bf character}
(string), {\bf logical} (boolean), {\bf complex}, {\bf object}, etc.

Vector indices begin at 1.  Note that vectors are stored like arrays in
C, i.e. contiguously, and thus one cannot insert or delete elements in
place, as one can with a Python list.  If you wish to do this, use a
list instead.

A variable might not have a value, a situation designated as {\bf NA}.
This is like {\bf None} in Python and {\bf undef} in Perl, though
its origin is different.  In statistical datasets, one often encounters
missing data, i.e. observations for which the values are missing.  In
many of R's statistical functions, we can instruct the function to skip
over any missing values.
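
For instance, {\bf mean()} has an argument {\bf na.rm} for this
purpose.  The data values below are made up for illustration:

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- c(88, NA, 12, 168, 13)
mean(x)               # NA, since one value is missing
mean(x, na.rm=TRUE)   # 70.25, the mean of the four non-missing values
is.na(x)              # FALSE  TRUE FALSE FALSE FALSE
\end{Verbatim}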

Arrays and matrices are actually vectors too, as you'll see; they merely
have extra attributes, e.g. in the matrix case the numbers of rows and
columns.  Keep in mind that since arrays and matrices are vectors, that
means that everything we say about vectors applies to them too.

One can obtain the length of a vector by using the function of the same
name, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> length(x)
[1] 3
\end{Verbatim}

\subsection{``Declarations''}

You must warn R ahead of time that you intend a variable to be one of
the vector/array types. For instance, say we wish {\bf y} to be a
two-component vector with values 5 and 12. If you try

\begin{Verbatim}[fontsize=\relsize{-2}]
> y[1] <- 5
> y[2] <- 12
\end{Verbatim}

the first command (and the second) will be rejected, but

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- vector(length=2)
> y[1] <- 5
> y[2] <- 12
\end{Verbatim}

works, as does

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- c(5,12)
\end{Verbatim}

The latter is OK because the right-hand side is a vector type, so we are
binding {\bf y} to an already-existent vector.
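
One detail worth knowing, sketched below:  if no mode is specified, {\bf
vector()} creates a vector of mode logical, and R converts it when we
assign numbers into it.  The function {\bf numeric()} is an alternative
that creates a vector of zeros from the start.  The name {\bf z} is
arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
y <- vector(length=2)   # no mode given, so R creates a logical vector
mode(y)                 # "logical"
y[1] <- 5               # assigning a number converts the vector
y[2] <- 12
mode(y)                 # "numeric"
z <- numeric(2)         # alternative: a numeric vector of two zeros
\end{Verbatim}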

\subsection{Generating Useful Vectors with ``:'', seq() and rep()}

Note the : operator:

\begin{Verbatim}[fontsize=\relsize{-2}]
> 5:8
[1] 5 6 7 8
> 5:1
[1] 5 4 3 2 1
\end{Verbatim}

Beware of the operator precedence:

\begin{Verbatim}[fontsize=\relsize{-2}]
> i <- 2
> 1:i-1
[1] 0 1
> 1:(i-1)
[1] 1
\end{Verbatim}

The {\bf seq()} (``sequence'') function generates an arithmetic sequence, e.g.:

\begin{Verbatim}[fontsize=\relsize{-2}]
> seq(5,8)
[1] 5 6 7 8
> seq(12,30,3)
[1] 12 15 18 21 24 27 30
> seq(1.1,2,length=10)
 [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
\end{Verbatim}

Though it may seem innocuous, the {\bf seq()} function provides the
foundation for many R operations. See examples in Sections
\ref{simulation} and \ref{explicit}.

The {\bf rep()} (``repeat'') function allows us to conveniently put the same
constant into long vectors. The call form is {\bf rep(z,k)}, which creates a
vector of {\bf k*length(z)} elements, consisting of {\bf k} copies of
{\bf z}. For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- rep(8,4)
> x
[1] 8 8 8 8
\end{Verbatim}

\begin{Verbatim}[fontsize=\relsize{-2}]
> rep(1:3,2)
[1] 1 2 3 1 2 3
\end{Verbatim}
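
The {\bf rep()} function also has an argument {\bf each}, which repeats
each element in turn rather than repeating the vector as a whole.  A
brief sketch:

\begin{Verbatim}[fontsize=\relsize{-2}]
rep(c(5,12,13), each=2)   # 5 5 12 12 13 13
rep(c(5,12,13), 2)        # 5 12 13 5 12 13, for comparison
\end{Verbatim}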

\subsection{Vector Arithmetic and Logical Operations}
\label{vectorops}

You can add vectors, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> x + c(5,0,-1)
[1] 6 2 3
\end{Verbatim}

You may be surprised at what happens when we multiply them:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x * c(5,0,-1)
[1]  5  0 -4
\end{Verbatim}

As you can see, the multiplication was elementwise.  This is due to the
functional programming nature of R. 

The {\bf any()} and {\bf all()} functions are handy:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- 1:10
> if (any(x > 8)) print("yes")
[1] "yes"
> if (any(x > 88)) print("yes")
> if (all(x > 88)) print("yes")
> if (all(x > 0)) print("yes")
[1] "yes"
\end{Verbatim}

\subsection{Recycling}
\label{recycling}

When applying an operation to two vectors which requires them to be the same
length, the shorter one will be {\it recycled}, i.e. repeated, until it is long
enough to match the longer one, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> c(1,2,4) + c(6,0,9,20,22)
[1]  7  2 13 21 24
Warning message:
longer object length
     is not a multiple of shorter object length in: c(1, 2, 4) + c(6,
     0, 9, 20, 22)
\end{Verbatim}

Here's a more subtle example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x  
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> x+c(1,2)
     [,1] [,2]
[1,]    2    6
[2,]    4    6
[3,]    4    8
\end{Verbatim}

What happened here is that {\bf x}, as a 3x2 matrix, is also a
six-element vector, which in R is stored column-by-column.  We added a
two-element vector to it, so our addend had to be repeated twice to make
six elements.  So, we were adding c(1,2,1,2,1,2) to {\bf x}.  
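
For contrast, here is a sketch in which the addend has the same length
as the number of rows.  Since storage is column-by-column, the addend
then lines up with each column in turn, so the same amount is added
across each row.  The name {\bf m} is made up for this sketch.

\begin{Verbatim}[fontsize=\relsize{-2}]
m <- matrix(1:6, nrow=3)
m + c(10,20,30)   # column 1 becomes 11,22,33 and column 2 becomes 14,25,36
\end{Verbatim}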

\subsection{Vector Indexing} 
\label{slicing}

You can also do {\it indexing} of arrays, picking out elements with specific
indices, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- c(1.2,3.9,0.4,0.12)
> y[c(1,3)]
[1] 1.2 0.4
> y[2:3]
[1] 3.9 0.4
\end{Verbatim}

Note carefully that duplicates are definitely allowed, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(4,2,17,5)
> y <- x[c(1,1,3)]
> y
[1]  4  4 17
\end{Verbatim}

Negative subscripts mean that we want to exclude the given elements in our
output:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,12,13)
> z[-1]  # exclude element 1
[1] 12 13
> z[-1:-2]
[1] 13
\end{Verbatim}

In such contexts, it is often useful to use the {\bf length()} function: 

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,12,13)
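> z[1:(length(z)-1)]
[1]  5 12
\end{Verbatim}

Note that this is more general than using {\bf z[1:2]}.  In a program
with general-length vectors, we could use this pattern to exclude the
last element of a vector.  The parentheses matter here:  as seen
earlier, the : operator takes precedence over subtraction, so {\bf
1:length(z)-1} would actually mean (1:3)-1, i.e.\ the vector (0,1,2).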

Here  is  a more involved example of this principle. Suppose we have a
sequence of numbers for which we want to find successive differences, i.e.
the difference between each number and its predecessor. Here's how we could
do it:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(12,15,8,11,24)
> y <- x[-1] - x[-length(x)]
> y
[1]  3 -7  3 13
\end{Verbatim}

Here we want to find the numbers 15-12 = 3, 8-15 = -7, etc. The
expression {\bf x[-1]}  gave  us  the  vector  (15,8,11,24)  and  {\bf
x[-length(x)]} gave us (12,15,8,11). Subtracting these two vectors then
gave us the differences we wanted.

{\bf Make careful note of the above example.  This is the ``R way of
doing things.''  By taking advantage of R's vector operations, we came
up with a solution which avoids loops.  This is clean, compact and
likely much faster when our vectors are long.  We often use R's
functional programming features to these ends as well.}
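
Incidentally, R has a built-in function, {\bf diff()}, that computes
successive differences directly.  The short check below suggests that
it agrees with our hand-built version on this data:

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- c(12,15,8,11,24)
y <- x[-1] - x[-length(x)]   # our version from above
diff(x)                      # 3 -7 3 13
identical(y, diff(x))        # TRUE
\end{Verbatim}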

\subsection{Vector Element Names}
\label{ven}

The elements of a vector can optionally be given names.  For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> names(x)
NULL
> names(x) <- c("a","b","ab")
> names(x)
[1] "a"  "b"  "ab"
> x
 a  b ab
 1  2  4
\end{Verbatim}

We can remove the names from a vector by assigning NULL:

\begin{Verbatim}[fontsize=\relsize{-2}]
> names(x) <- NULL
> x
[1] 1 2 4
\end{Verbatim}

We can even reference elements of the vector by name, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> names(x) <- c("a","b","ab")
> x["b"]
b
2
\end{Verbatim}

\subsection{Matrices}

A matrix is a vector with two additional attributes, the number of rows
and number of columns.

\subsubsection{General Operations}

Multidimensional vectors in R are called {\it arrays}. A two-dimensional
array is also called a {\it matrix}, and is eligible for the usual matrix
mathematical operations. 

Matrix row and column subscripts begin with 1, so for instance the
upper-left corner of the matrix {\bf a} is denoted {\bf a[1,1]}. The
internal linear storage of a matrix is in {\it column-major order},
meaning that first all of column 1 is stored, then all of column 2, etc.

One of the ways to create a matrix is via the {\bf matrix()} function, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- matrix(c(1,2,3,4),nrow=2,ncol=2)
> y
     [,1] [,2]
[1,]    1    3
[2,]    2    4
\end{Verbatim}

Here we concatenated what we intended as the first column, the numbers 1 and
2, with what we intended as the second column, 3 and 4. That was our data in
linear form, and then we specified the number of rows and columns. The fact
that R uses column-major order then determined where these four numbers were
put.

Though internal storage of a matrix is in column-major order, we can set
the {\bf byrow} argument in {\bf matrix()} to TRUE in order to specify
that the data we are using to fill a matrix be interpreted as being in
row-major order.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> m <- matrix(c(1,2,3,4,5,6),nrow=3)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m <- matrix(c(1,2,3,4,5,6),nrow=2,byrow=T)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
\end{Verbatim}

(`T' is an abbreviation for ``TRUE''.)

Since we specified the matrix entries in the above example, we would not
have needed to specify {\bf ncol}; just {\bf nrow} would be enough. For
instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- matrix(c(1,2,3,4),nrow=2)
> y
     [,1] [,2]
[1,]    1    3
[2,]    2    4
\end{Verbatim}

Note that when we then printed out {\bf y}, R showed us its notation for
rows and columns. For instance, [,2] means column 2, as can be seen in
this check:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y[,2]
[1] 3 4
\end{Verbatim}

Another  way we could have built {\bf y} would have been to specify elements
individually:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- matrix(nrow=2,ncol=2)
> y[1,1] = 1
> y[2,1] = 2
> y[1,2] = 3
> y[2,2] = 4
> y
     [,1] [,2]
[1,]    1    3
[2,]    2    4
\end{Verbatim}

We can perform various operations on matrices, e.g. matrix multiplication,
matrix scalar multiplication and matrix addition:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y %*% y  # ordinary matrix multiplication
     [,1] [,2]
[1,]    7   15
[2,]   10   22
> 3*y
     [,1] [,2]
[1,]    3    9
[2,]    6   12
> y+y
     [,1] [,2]
[1,]    2    6
[2,]    4    8
\end{Verbatim}

For linear algebra operations on matrices, see Section \ref{linalg}.

Again, keep in mind---and when possible, exploit---the notion of
recycling (Section \ref{recycling}).  For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- 1:2
> y <- c(1,3,4,10)
> x*y
[1]  1  6  4 20
\end{Verbatim}

Since {\bf x} was shorter than {\bf y}, it was recycled to the
four-element vector {\bf c(1,2,1,2)}, then multiplied elementwise with
{\bf y}.

\subsubsection{Matrix Row and Column Names}

For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- matrix(c(1,2,3,4),nrow=2)
> z
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> colnames(z)  <- c("a","b")
> z
     a b
[1,] 1 3
[2,] 2 4
> colnames(z)
[1] "a" "b"
\end{Verbatim}

The function {\bf rownames()} works similarly.
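
As a sketch, here we give the same matrix row names as well, and then
use the names to pick out an element.  The labels ``r1'' and ``r2'' are
made up for illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
z <- matrix(c(1,2,3,4),nrow=2)
colnames(z) <- c("a","b")
rownames(z) <- c("r1","r2")   # hypothetical row labels
z                             # now prints with both sets of names
z["r2","a"]                   # 2, the element in row r2, column a
\end{Verbatim}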

\subsubsection{Matrix Indexing}

The  same  operations we discussed in Section \ref{slicing} apply to
matrices. For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    1    0
[3,]    3    0    1
[4,]    4    0    0
> z[,c(2,3)]
     [,1] [,2]
[1,]    1    1
[2,]    1    0
[3,]    0    1
[4,]    0    0
\end{Verbatim}

Here's another example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- matrix(c(11,21,31,12,22,32),nrow=3,ncol=2)
> y
     [,1] [,2]
[1,]   11   12
[2,]   21   22
[3,]   31   32
> y[2:3,]
     [,1] [,2]
[1,]   21   22
[2,]   31   32
> y[2:3,2]
[1] 22 32
\end{Verbatim}

You can copy a smaller matrix to a slice of a larger one:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> y[2:3,] <- matrix(c(1,1,8,12),nrow=2)
> y
     [,1] [,2]
[1,]    1    4
[2,]    1    8
[3,]    1   12
\end{Verbatim}

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- matrix(nrow=3,ncol=3)
> x[2:3,2:3] <- cbind(4:5,2:3)
> x  
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA    4    2
[3,]   NA    5    3
\end{Verbatim}

\subsection{Sensing the Number of Rows and Columns of a Matrix}

The numbers of rows and columns in a matrix can be obtained through the
{\bf nrow()} and {\bf ncol()} functions, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> nrow(x)
[1] 3
\end{Verbatim}

These functions are useful when you are writing a general-purpose
library function whose argument is a matrix.  By being able to sense the
number of rows and columns in your code, you relieve the caller of the
burden of supplying that information as two additional arguments.

\subsection{Treating a Vector As a One-Column Matrix}

In doing matrix work, you may be disconcerted to find that if you have
code that deals with matrices of various sizes, a submatrix extracted by
indexing degenerates to a vector if it has only one column. You can fix
that by then applying {\bf as.matrix()}.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> u
[1] 1 2 3
> v <- as.matrix(u)
> attributes(u)
NULL
> attributes(v)
$dim
[1] 3 1
\end{Verbatim}

So, {\bf as.matrix()} converted the 3-element vector {\bf u} to a 3x1
matrix {\bf v}.
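
Another way to handle this, sketched below, is to prevent the
degeneration in the first place by specifying {\bf drop=FALSE} when
indexing.  The names {\bf m}, {\bf v1} and {\bf w1} are made up for this
illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
m <- matrix(1:6, nrow=3)
v1 <- m[,1]               # extracting one column yields a plain vector
dim(v1)                   # NULL
w1 <- m[,1,drop=FALSE]    # drop=FALSE keeps the matrix structure
dim(w1)                   # 3 1
\end{Verbatim}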

\subsection{Adding/Deleting Elements of Vectors and Matrices}
\label{rcbind}

Technically, vectors and matrices are of fixed length and dimensions.
However, they can be reassigned, etc.  Consider:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(12,5,13,16,8)
> x <- c(x,20)  # append 20
> x
[1] 12  5 13 16  8 20
> x <- c(x[1:3],20,x[4:6])  # insert 20
> x
[1] 12  5 13 20 16  8 20
> x <- x[-2:-4]  # delete elements 2 through 4
> x
[1] 12 16  8 20
\end{Verbatim}

The {\bf rbind()} and {\bf cbind()} functions enable one to add rows or
columns to a matrix.

For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> one
[1] 1 1 1 1
> z
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    1    0
[3,]    3    0    1
[4,]    4    0    0
> cbind(one,z)
     one
[1,]   1 1 1 1
[2,]   1 2 1 0
[3,]   1 3 0 1
[4,]   1 4 0 0
\end{Verbatim}

You can also use these functions as a quick way to create small
matrices:

\begin{Verbatim}[fontsize=\relsize{-2}]
> q <- cbind(c(1,2),c(3,4))
> q
     [,1] [,2]
[1,]    1    3
[2,]    2    4
\end{Verbatim}

We can delete rows or columns in the same manner as shown for vectors
above, e.g.:

\begin{Verbatim}[fontsize=\relsize{-2}]
> m <- matrix(1:6,nrow=3)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m <- m[c(1,3),]
> m
     [,1] [,2]
[1,]    1    4
[2,]    3    6
\end{Verbatim}

\subsection{Matrix Row and Column Mean Functions}

The function {\bf mean()} applies only to vectors, not matrices. If one
does call this function with a matrix argument, the mean of all of its
elements is computed, not multiple means row-by-row or column-by-column,
since a matrix is a vector.

The functions {\bf rowMeans()} and {\bf colMeans()} return vectors
containing the means of the rows and columns.  There are also
corresponding functions {\bf rowSums()} and {\bf colSums()}.
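
A small sketch of the difference, using a made-up matrix {\bf m}:

\begin{Verbatim}[fontsize=\relsize{-2}]
m <- matrix(1:6, nrow=3)
mean(m)       # 3.5, the mean of all six elements
rowMeans(m)   # 2.5 3.5 4.5
colMeans(m)   # 2 5
rowSums(m)    # 5 7 9
\end{Verbatim}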

\subsection{Linear Algebra Operations on Vectors and Matrices}
\label{linalg}

Multiplying a vector by a scalar works directly, as seen earlier.  For
example, 

\begin{Verbatim}[fontsize=\relsize{-2}]
> y
[1]  1  3  4 10
> 2*y
[1]  2  6  8 20
\end{Verbatim}

If you wish to compute the inner product (``dot product'') of two
vectors, use {\bf crossprod()}.  Note that the name is a misnomer, as the
function does not compute vector cross product.
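
As a brief sketch, with made-up vectors {\bf u} and {\bf v}, note that
{\bf crossprod()} returns its result as a 1x1 matrix, whereas summing
the elementwise products gives an ordinary number:

\begin{Verbatim}[fontsize=\relsize{-2}]
u <- c(1,2,3)
v <- c(4,5,6)
crossprod(u,v)   # a 1x1 matrix containing 32
sum(u*v)         # 32, as an ordinary number
\end{Verbatim}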

For matrix multiplication in the mathematical sense, the operator to
use is \%*\%, not *. Note also that if one of the operands of \%*\% is a
vector, R treats it as a one-row or one-column matrix, whichever makes
the product conformable, so a vector can serve as either the left or the
right factor in a matrix product.

The function {\bf solve()} will solve systems of linear equations, and
even find matrix inverses.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> a <- matrix(c(1,1,-1,1),nrow=2,ncol=2)
> b <- c(2,4)
> solve(a,b)
[1] 3 1
> solve(a)
     [,1] [,2]
[1,]  0.5  0.5
[2,] -0.5  0.5
\end{Verbatim}

Use {\bf t()} for matrix transpose, {\bf qr()} for QR decomposition,
{\bf chol()} for Cholesky, and {\bf det()} for determinants.  

Use {\bf eigen()} to compute eigenvalues and eigenvectors, though if the
matrix in question is a covariance matrix, the R function {\bf prcomp()}
may be preferable.  The function {\bf diag()} extracts the diagonal of a
square matrix, useful for obtaining variances from a covariance matrix.
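
Here is a short sketch trying a few of these functions on the matrix
{\bf a} from the {\bf solve()} example above:

\begin{Verbatim}[fontsize=\relsize{-2}]
a <- matrix(c(1,1,-1,1),nrow=2,ncol=2)
t(a)             # the transpose of a
det(a)           # 2
a %*% solve(a)   # the 2x2 identity matrix, up to rounding error
diag(a)          # 1 1, the diagonal elements
\end{Verbatim}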

\section{Lists}
\label{list}

R's  {\bf list}  structure is similar to a C {\bf struct}.  It plays an
important role in R, with data frames, object-oriented
programming and so on, as we will see later.

\subsection{Creation}

As an example, consider an employee database.  Suppose for each employee
we store name, salary and a boolean indicating union membership.  We
could initialize our database to be empty if we wish:

\begin{Verbatim}[fontsize=\relsize{-2}]
j <- list()
\end{Verbatim}

Or we could create a list and enter our first employee, Joe, this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
j <- list(name="Joe", salary=55000, union=T)
\end{Verbatim}

We could print {\bf j} out:

\begin{Verbatim}[fontsize=\relsize{-2}]
> j
$name
[1] "Joe"

$salary
[1] 55000

$union
[1] TRUE

\end{Verbatim}

Actually, the element names, e.g. ``salary,'' are optional.  One could
alternatively do this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> jalt <- list("Joe", 55000, T)
> jalt
[[1]]
[1] "Joe"

[[2]]
[1] 55000

[[3]]
[1] TRUE

\end{Verbatim}

Here we refer to {\bf jalt}'s elements as 1, 2 and 3 (which we can also
do for {\bf j} above).

\subsection{List Tags and Values, and the unlist() Function}

If the elements in a list do have names, e.g. with {\bf name}, {\bf
salary} and {\bf union} for {\bf j} above, these names are called {\bf
tags}.  The value associated with a tag is indeed called its {\bf
value}.

You can obtain the tags via {\bf names()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> names(j)
[1] "name"   "salary" "union"
\end{Verbatim}

To obtain the values, use {\bf unlist()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ulj <- unlist(j)
> ulj
   name  salary   union
  "Joe" "55000"  "TRUE"
> class(ulj)
[1] "character"
\end{Verbatim}

The return value of {\bf unlist()} is a vector, in this case a vector of
mode character, i.e. a vector of character strings.  

\subsection{Issues of Mode Precedence}
\label{preced}

Let's look at this a bit more closely:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- list(abc=2,de=5)
> x
$abc
[1] 2

$de
[1] 5

\end{Verbatim}

Here the list {\bf x} has two elements, with {\bf x\$abc = 2} and {\bf
x\$de = 5}.  Just for practice, let's call {\bf names()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> names(x)
[1] "abc" "de"
\end{Verbatim}

Now let's try {\bf unlist()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ulx <- unlist(x)
> ulx
abc  de
  2   5
> class(ulx)
[1] "numeric"
\end{Verbatim}

So again {\bf unlist()} returned a vector, but R noticed that all the
values were numeric, so it gave {\bf ulx} that mode.  By contrast, with
{\bf ulj} above, though one of the values was numeric, R was forced to
take the ``least common denominator,''  and make the vector of mode
character.

This sounds like some kind of precedence structure, and it is.  As R's
help for {\bf unlist()} states,

\begin{quote}

Where possible the list elements are coerced to a common mode during the
unlisting, and so the result often ends up as a character vector.
Vectors will be coerced to the highest type of the components in the
hierarchy NULL $<$ raw $<$ logical $<$ integer $<$ real $<$ complex $<$
character $<$ list $<$ expression: pairlists are treated as lists.

\end{quote}

But there is something else to deal with here.  Though {\bf ulx} is a
vector and not a list, R did give each of the elements a name.  We can
remove them by setting their names to NULL, as seen in Section \ref{ven}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> names(ulx) <- NULL
> ulx
[1] 2 5
\end{Verbatim}
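
If you do not want the names in the first place, one option is the {\bf
use.names} argument of {\bf unlist()}.  Here is a small sketch, which
recreates a two-element list like {\bf x} above; the name {\bf ulx2} is
arbitrary:

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- list(abc=2,de=5)
ulx2 <- unlist(x,use.names=FALSE)  # drop the tags while unlisting
ulx2          # prints [1] 2 5
names(ulx2)   # NULL
\end{Verbatim}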

\subsection{Accessing List Elements}

The \$ symbol is used to designate named elements of a list, but also [[
]] works for referencing a single element and [ ] works for a group of
them:  

\begin{Verbatim}[fontsize=\relsize{-2}]
> j
$name
[1] "Joe"

$salary
[1] 55000

$union
[1] TRUE

> j[[1]]
[1] "Joe"

> j[2:3]
$salary
[1] 55000

$union
[1] TRUE

\end{Verbatim}


Note that [[ ]] returns a value, while [ ] returns a sublist.
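
The difference is easy to check with {\bf class()}.  The following
sketch rebuilds the list {\bf j} and compares the two forms; the names
{\bf s1} and {\bf s2} are arbitrary:

\begin{Verbatim}[fontsize=\relsize{-2}]
j <- list(name="Joe",salary=55000,union=TRUE)
s1 <- j[[2]]     # the salary itself, a numeric vector of length 1
s2 <- j[2]       # a list of length 1 that contains the salary
class(s1)        # "numeric"
class(s2)        # "list"
j[["salary"]]    # [[ ]] also accepts a tag, given as a string
\end{Verbatim}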

\subsection{Adding/Deleting List Elements}

One can dynamically add and delete elements:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- list(a="abc",b=12)
> z
$a
[1] "abc"

$b
[1] 12

> z$c = 1
> z
$a
[1] "abc"

$b
[1] 12

$c
[1] 1

> z[[4]] <- 1:2  # add an unnamed fourth element
> z[1] <- NULL  # delete element 1
> z
$b
[1] 12

$c
[1] 1

[[3]]
[1] 1 2
> if (is.null(z$d)) print("it's not there")  # testing existence
[1] "it's not there"
> y <- list()
> y[[2]] <- 8  # can "skip" elements
> y
[[1]]
NULL

[[2]]
[1] 8

\end{Verbatim}

\subsection{Indexing of Lists}

To do indexing of a list, use [ ] instead of [[ ]]:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z[2:3]
$c
[1] 1

[[2]]
[1] 1 2

\end{Verbatim}

Names of list elements can be abbreviated to whatever extent is possible
without causing ambiguity, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> j$sal
[1] 55000
\end{Verbatim}

One common use of lists is to package return values for functions that return
more than one piece of information.  Say for instance the function {\bf
f()} returns a matrix {\bf m} and a vector {\bf v}. Then one could write

\begin{Verbatim}[fontsize=\relsize{-2}]
return(list(mat=m, vec=v))
\end{Verbatim}

at the end of the function, and then have the caller access these items
like this:

\begin{Verbatim}[fontsize=\relsize{-2}]
l <- f()
m <- l$mat
v <- l$vec
\end{Verbatim}

This is typical form for functions in the R library.
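
Here is a minimal, self-contained sketch of this pattern.  The function
{\bf centercols()} is a made-up example, not part of R; it returns both a
matrix and a vector, packaged in a list:

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical example: centers the columns of a matrix x, returning
# both the centered matrix and the vector of column means
centercols <- function(x) {
   v <- colMeans(x)
   m <- sweep(x,2,v)   # subtract v[j] from column j
   return(list(mat=m,vec=v))
}
l <- centercols(matrix(c(1,3,10,20),nrow=2))
l$vec   # the column means, 2 and 15
l$mat   # the centered matrix, with rows (-1,-5) and (1,5)
\end{Verbatim}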

\subsection{Size of a List}

You can obtain the number of elements in a list via {\bf length()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> length(j)
[1] 3
\end{Verbatim}

\subsection{Recursive Lists}

Lists can be recursive, i.e. you can have lists within lists.  For
instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> b <- list(u = 5, v = 12)
> c <- list(w = 13)
> a <- list(b,c)
> a
[[1]]
[[1]]$u
[1] 5

[[1]]$v
[1] 12


[[2]]
[[2]]$w
[1] 13


> length(a)
[1] 2

\end{Verbatim}

So, {\bf a} is now a two-element list, with each element itself being a
list.
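
To reach an element of an inner list, the indexing operations can be
chained.  A short sketch, using the same lists as above (the name {\bf
ua} is arbitrary):

\begin{Verbatim}[fontsize=\relsize{-2}]
b <- list(u = 5, v = 12)
c <- list(w = 13)
a <- list(b,c)
a[[1]]$v         # 12, element v of the first inner list
a[[2]][["w"]]    # 13
ua <- unlist(a)  # flattens all levels into a single named vector
ua
\end{Verbatim}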

\section{Data Frames}

On an intuitive level, a {\it data frame} is like a matrix, with a
rows-and-columns structure.  However, it differs from a matrix in that
each column may have a different mode.  For instance, one column may be
numbers and another column might be character strings.

On a technical level, a data frame is a list of vectors.  See Section
\ref{listrep} for more on this.

\subsection{A Second R Example Session}
\label{second}

For example, consider an employee dataset.  Each row of our data frame
would correspond to one employee.  The first column might be the
employee's name, thus of {\bf character} (i.e. string) mode, with the
second column being salary, thus of {\bf numeric} mode, and with the
third column being a boolean (i.e. {\bf logical} mode) indicating
whether the employee is in a union.

As my sample data set, I have created a file named {\bf exams},
consisting of grades for the three exams in a certain course (two
midterm exams and a final exam). The first few lines in the file are

\begin{Verbatim}[fontsize=\relsize{-2}]
Exam1 Exam2 Exam3
62 70 60
74 34 64
50 35 40
...
\end{Verbatim}

Note that I have separated fields here by spaces.  

As you can see, other than the first record, which contains the names of
the columns (i.e. the variables), each line contains the three exam
scores for one student. This is the classical ``two-dimensional file''
notion, i.e. each line in our file contains the data for one observation
in a statistical dataset. The idea of a data frame is to encapsulate
such data, along with variable names, into one object.

As mentioned, I've specified the variable names in the first record. Our
variable names didn't have any embedded spaces in this case, but if they
had, we'd need to quote any such name, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
"Exam 1" "Exam 2" "Exam 3"
\end{Verbatim}

Suppose the second exam score for the third student had been missing. Then
we would have typed

\begin{Verbatim}[fontsize=\relsize{-2}]
50 NA 40
\end{Verbatim}

in that line of the exams file. In any subsequent statistical analyses,
R would do its best to cope with the missing data in the obvious
manner.  (We may have to set the option {\bf na.rm=T}.) If for instance
we had wanted to find the mean score on Exam 2, R would find the mean
among all students except the third.

We first read in the data from the file exams into a data frame which we'll
name {\bf testscores}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> testscores <- read.table("exams",header=TRUE)
\end{Verbatim}

The parameter {\bf header=TRUE} tells R that we do have a header line
(for the variable names), so R should not count that first line in the
file as data.  
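
To experiment without creating a file, one can hand the same kind of
input to {\bf read.table()} as a character string, via its {\bf text}
argument.  The sketch below uses only the three data lines shown
earlier, with the missing value inserted; the names {\bf txt} and {\bf
tsc} are arbitrary:

\begin{Verbatim}[fontsize=\relsize{-2}]
txt <- "Exam1 Exam2 Exam3
62 70 60
74 34 64
50 NA 40"
tsc <- read.table(text=txt,header=TRUE)
mean(tsc$Exam2)              # NA, due to the missing value
mean(tsc$Exam2,na.rm=TRUE)   # 52, the mean of 70 and 34
\end{Verbatim}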

In R, the components of an object are accessed via the \$ operator. For
example, the vector of all the Exam1 scores is {\bf testscores\$Exam1},
as we confirm here:

\begin{Verbatim}[fontsize=\relsize{-2}]
> testscores$Exam1
 [1]  62  74  50  62  39  60  48  80  49  49 100  30  61 100  82  37  54  65  36
[20]  97  60  80  70  50  60  24  60  75  77  71  25  93  80  92  75  26  27  55
[39]  30  44  86  35  95  98  50  50  34 100  57  99  67  77  70  53  38
\end{Verbatim}

The [1] means that items 1-19 start here, the [20] means that items 20-38
start here, etc.

We will illustrate operations on this data in the following sections.

\subsection{List Representation}
\label{listrep}

You will have a better understanding of data frames if you keep in mind
that technically, a data frame is implemented as a list of equal-length
vectors.  (Recall that R's {\it list} construct was covered in Section
\ref{list}.) Each column is one element of the list.  

For instance, in Section \ref{second}, the first column of the data
frame {\bf testscores} is referenced as {\bf testscores\$Exam1}, which
is list notation.

Recall that elements of vectors can have names.  The rows of a data
frame can have names too.  So, a data frame does have a bit more
structure than a general list of vectors.
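
The list nature of a data frame can be checked directly.  Here is a
brief sketch on a small made-up data frame {\bf d}:

\begin{Verbatim}[fontsize=\relsize{-2}]
d <- data.frame(x=c(1,2,3),y=c("a","b","c"))   # made-up data
is.list(d)    # TRUE
length(d)     # 2, the number of columns
d[[1]]        # the first column, accessed via list notation
names(d)      # the column names are the list tags
rownames(d) <- c("r1","r2","r3")   # rows can be given names too
d["r2",]      # the row named r2
\end{Verbatim}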

\subsection{Matrix-Like Operations}

Many matrix operations can also be used on data frames.  

\subsubsection{rowMeans() and colMeans()}

\begin{Verbatim}[fontsize=\relsize{-2}]
> colMeans(testscores)
   Exam1    Exam2    Exam3
62.14545 51.27273 50.05455
\end{Verbatim}
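
The function {\bf rowMeans()} works the same way, giving one mean per
row, here each student's average over the three exams.  The sketch below
uses a small made-up data frame {\bf ts3}, consisting of the first three
lines of the exams file shown earlier:

\begin{Verbatim}[fontsize=\relsize{-2}]
ts3 <- data.frame(Exam1=c(62,74,50),Exam2=c(70,34,35),Exam3=c(60,64,40))
rm3 <- rowMeans(ts3)   # one mean per student: 64, 172/3 and 125/3
cm3 <- colMeans(ts3)   # one mean per exam: 62, 139/3 and 164/3
rm3
cm3
\end{Verbatim}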

\subsubsection{rbind() and cbind()}

The {\bf rbind()} and {\bf cbind()} matrix functions introduced in
Section \ref{rcbind} work here too.

We can also create new columns from old ones, e.g. we can add a variable
which is the difference between Exams 1 and 2:

\begin{Verbatim}[fontsize=\relsize{-2}]
> testscores$Diff21 <- testscores$Exam2 - testscores$Exam1
\end{Verbatim}

\subsubsection{Indexing}
\label{frameindex}

One can also refer to the rows and columns of a data frame using
two-dimensional array notation, including indexing. For instance, in our
example data frame {\bf testscores} here: 

\begin{itemize}

\item testscores[2,3] would refer to the third score for the second
student

\item testscores[2,] would refer to the set of all scores for the second
student

\item testscores[c(1,2,5),] would refer to the set of all scores for the
first, second and fifth students

\item testscores[10:13,] would refer to the set of all scores for the
tenth through thirteenth students

\item testscores[-2,] would refer to the set of all scores for all
students except the second

\end{itemize}
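
These forms can be tried out on a small made-up data frame; in the
sketch below, {\bf ts3} holds just the first three lines of the exams
file shown earlier:

\begin{Verbatim}[fontsize=\relsize{-2}]
ts3 <- data.frame(Exam1=c(62,74,50),Exam2=c(70,34,35),Exam3=c(60,64,40))
ts3[2,3]        # 64, the third score for the second student
ts3[2,]         # all scores for the second student, a one-row data frame
ts3[c(1,3),]    # all scores for the first and third students
ts3[-2,]        # all scores for all students except the second
ts3[,"Exam2"]   # a column can also be selected by name
\end{Verbatim}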

\subsection{Creating a New Data Frame from Scratch}

We saw above how to create a data frame by reading from a data file. We
can also create a data frame directly, using the function {\bf
data.frame()}.

For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- data.frame(cbind(c(1,2),c(3,4)))
> z
  X1 X2
1  1  3
2  2  4
\end{Verbatim}

Note again the use of the {\bf cbind()} function. 

We can also coerce a matrix to a data frame, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- matrix(c(1,2,3,4),nrow=2,ncol=2)
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> y <- data.frame(x)
> y
  X1 X2
1  1  3
2  2  4
\end{Verbatim}

As you can see, the column names will be X1, X2, ... However, you can change
them, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> z
  X1 X2
1  1  3
2  2  4
> names(z) <- c("col 1","col 2")
> z
  col 1 col 2
1     1     3
2     2     4
\end{Verbatim}
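
Column names can also be supplied at creation time, and the columns may
then have different modes, which a matrix built by {\bf cbind()} cannot
offer.  A sketch with made-up employee data (the name {\bf emp} and the
values are assumptions for illustration):

\begin{Verbatim}[fontsize=\relsize{-2}]
emp <- data.frame(name=c("Joe","Ann"),salary=c(55000,61000),
   union=c(TRUE,FALSE))   # made-up employee data
emp
names(emp)    # "name" "salary" "union"
emp$salary    # 55000 61000
\end{Verbatim}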

\subsection{Converting a List to a Data Frame}
\label{convert}

For printing, statistical calculations and so on, you may wish to
convert a list to a data frame.  Here's straightforward code to do it:

\begin{Verbatim}[fontsize=\relsize{-2}]
# converts a list lst to a data frame, which is the return value
wrtlst <- function(lst) {
   frm <- data.frame()
   rw <- 1
   for (key in names(lst)) {
      frm[rw,1] <- key
      frm[rw,2] <- lst[key]
      rw <- rw+1
   }
   return(frm)
}
\end{Verbatim}

But if our list has named tags and has only numeric values, the
following code may run faster:

\begin{Verbatim}[fontsize=\relsize{-2}]
# converts a list lst that has only numeric values to a data frame,
# which is the return value of the function
lsttodf <- function(lst) {
   n <- length(lst)
   # create the data frame, using the default column names
   frm <- data.frame(V1=character(n),V2=numeric(n))
   frm[,1] <- names(lst)
   frm[,2] <- as.numeric(unlist(lst))
   return(frm)
}
\end{Verbatim}

\subsection{The {\it Factor} Factor}

If your table does have a variable, i.e. a column, in character mode,
you probably should set {\bf as.is=T} in your call to {\bf
read.table()}, so that this variable stays a character vector rather
than being converted to a factor.  (Conversion to factors was the
default in versions of R prior to 4.0.0.)  Otherwise, later assignments
to that column could cause problems, as seen in our example below.

The same is true for creating a data frame via a call to {\bf
data.frame()}.  Consider:

\begin{Verbatim}[fontsize=\relsize{-2}]
> d <- data.frame(cbind(c(0,5,12,13),c("xyz","ab","yabc",NA)))
> d
  X1   X2
1  0  xyz
2  5   ab
3 12 yabc
4 13 <NA>
> d[1,1] <- 3
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = 3) :
  invalid factor level, NAs generated
> d
    X1   X2
1 <NA>  xyz
2    5   ab
3   12 yabc
4   13 <NA>
\end{Verbatim}

Why did {\bf d[1,1]} become NA instead of 3?  Well, a matrix can have
only one mode, so {\bf cbind()} coerced the numbers in the first column
to character strings.  Then {\bf data.frame()} treated both character
columns as factors.  So when we tried to assign the value 3 to {\bf
d[1,1]}, R told us that 3 was not one of the official values ({\bf
levels}) for this factor.

This can be avoided by setting {\bf stringsAsFactors} to false:

\begin{Verbatim}[fontsize=\relsize{-2}]
> d <- data.frame(cbind(c(0,5,12,13),c("xyz","ab","yabc",NA)),stringsAsFactors=F)
> d
  X1   X2
1  0  xyz
2  5   ab
3 12 yabc
4 13 <NA>
> d[1,1] <- 3
> d
  X1   X2
1  3  xyz
2  5   ab
3 12 yabc
4 13 <NA>
\end{Verbatim}

See Section \ref{readmatdf}.
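
Another way to sidestep the problem is to avoid {\bf cbind()} altogether
and hand the columns to {\bf data.frame()} separately, so that the
numeric column is never coerced to character mode.  A sketch, with the
column names X1 and X2 chosen only to match the output above:

\begin{Verbatim}[fontsize=\relsize{-2}]
d <- data.frame(X1=c(0,5,12,13),X2=c("xyz","ab","yabc",NA),
   stringsAsFactors=FALSE)
class(d$X1)   # "numeric"
class(d$X2)   # "character"
d[1,1] <- 3   # an ordinary numeric assignment, no warning
d$X1          # 3 5 12 13
\end{Verbatim}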

\subsection{Extended Example: Data Preparation}

In a study of engineers and programmers sponsored by U.S. employers for
permanent residency, I considered the question, ``How many of these
workers are `the best and the brightest,' i.e. people of extraordinary
ability?''\footnote{N. Matloff, {\it New Insights from the Dept. of
Labor PERM Labor Certification Database}, Sloan Foundation West Coast
Program on Science and Engineering Workers conference, January 18,
2008.}

The government data is limited.  The (admittedly imperfect) way to
determine whether a worker is of extraordinary ability was to look at
the ratio of actual salary to the government prevailing wage for that
job and location. If that ratio is substantially higher than 1.0 (by law
it cannot be less than 1.0), one can reasonably assume that this worker
has a high level of talent.

I used R to prepare and analyze the data, and will present excerpts of
my preparation code here. First, I read in the data file:

\begin{Verbatim}[fontsize=\relsize{-2}]
all2006 <- read.csv("2006.csv",header=T,as.is=T)
\end{Verbatim}

The function {\bf read.csv()} is essentially identical to {\bf
read.table()}, except that the input data are in the Comma Separated
Value format exported by spreadsheets, which is the way the dataset was
prepared by the Department of Labor (DOL). We now have a data frame,
{\bf all2006}, consisting of all the data for the year 2006.

Some wages in the dataset were reported in hourly, rather than yearly,
terms. This indicates possible part-time job status, which I wanted to
exclude. A look at the documentation for the dataset showed that column
17 was the pay period, i.e. yearly, hourly, etc., while columns 15 and
19 contained the actual salary and prevailing wage. So, I did some
filtering:

\begin{Verbatim}[fontsize=\relsize{-2}]
all2006 <- all2006[all2006[,17]=="Year",]
all2006 <- all2006[all2006[,15]> 20000,]
all2006 <- all2006[all2006[,19]> 200,]
\end{Verbatim}

I also needed to create a new column for the ratio between actual wage
and prevailing wage:

\begin{Verbatim}[fontsize=\relsize{-2}]
all2006 <- cbind(all2006,all2006[,15]/all2006[,19])
\end{Verbatim}

This new variable is in column 25. Since I knew I would be calculating
the median in this column for many subsets of the data, I defined a
function to do this work:

\begin{Verbatim}[fontsize=\relsize{-2}]
medrat <- function(dataframe) {
    return(median(dataframe[,25],na.rm=T))
}
\end{Verbatim}

Note the need to exclude NA values, which are common in government datasets.

In addition, I wanted to analyze the talent patterns at particular
companies, using the company name stored in column 7:

\begin{Verbatim}[fontsize=\relsize{-2}]
makecorp <- function(corpname) {
    t <- all2006[all2006[,7]==corpname,]
    return(t)
}
goog2006 <- makecorp("GOOGLE INC.")
medrat(goog2006)
ms2006 <- makecorp("MICROSOFT CORPORATION")
medrat(ms2006)
...
\end{Verbatim}

I also wanted to analyze by occupation. DOL has a code number for each
job title, stored in column 14, which I used to create the corresponding
subsets of the original dataset:

\begin{Verbatim}[fontsize=\relsize{-2}]
makeocc <- function(df,lowerbd,upperbd) {
    return(df[df[,14]>=lowerbd & df[,14]<=upperbd,])
}
# for 2007 data
prg2007 <- makeocc(all2007,"15-1021.00","15-1052.00") # programmers
se2007 <- makeocc(all2007,"15-1030.00","15-1039.00") # s.w. engineers
engr2007 <- makeocc(all2007,"17-2000.00","17-2999.00") # other engineers
...
\end{Verbatim}

One more example: I wanted to analyze by nationality, which is in column
24. Note my use of looping over a vector of mode character:

\begin{Verbatim}[fontsize=\relsize{-2}]
mainnatnames <- c("CHINA", "INDIA", "CANADA", "GERMANY", "UNITED KINGDOM")
natwithindf <- function(natlist,df) {
   for (nat in natlist) {
      tmpdf <- df[df[,24]==nat,]
      mr <- medrat(tmpdf)
      cat(nat,": ",mr,"\n")
   }
}
natwithindf(mainnatnames,prg2007)
natwithindf(mainnatnames,se2007)
...
\end{Verbatim}

\section{Factors and Tables}
\label{factors}

Consider the data frame, say in a file {\bf ct.dat},

\begin{Verbatim}[fontsize=\relsize{-2}]
"VoteX" "VoteLastTime"
"Yes" "Yes"
"Yes" "No"
"No" "No"
"Not Sure" "Yes"
"No" "No"
\end{Verbatim}

where in the usual statistical fashion each row represents one subject under
study. In this case, say we have asked five people (a) ``Do you plan to vote
for Candidate X?'' and (b) ``Did you vote in the last election?''  (The
first line in the file is a header.)

Let's read in the file:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ct <- read.table("ct.dat",header=T)
> ct
     VoteX VoteLastTime
1      Yes          Yes
2      Yes           No
3       No           No
4 Not Sure          Yes
5       No           No
\end{Verbatim}

We can use the {\bf table()} function to convert this data to
contingency table format, i.e. a display of the counts of the various
combinations of the two variables:

\begin{Verbatim}[fontsize=\relsize{-2}]
> cttab <- table(ct)
> cttab
          VoteLastTime
VoteX      No Yes
  No        2   0
  Not Sure  0   1
  Yes       1   1
\end{Verbatim}

The 2 in the upper-left corner of the table shows that we had, for
example, two people who said No to (a) and No to (b).  The 1 in the
middle-right indicates that one person answered Not Sure to (a) and Yes
to (b).

We can in turn change this to a data frame---not the original one, but a
data-frame version of the contingency table:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ctdf <- as.data.frame(cttab)
> ctdf
     VoteX VoteLastTime Freq
1       No           No    2
2 Not Sure           No    0
3      Yes           No    1
4       No          Yes    0
5 Not Sure          Yes    1
6      Yes          Yes    1
\end{Verbatim}

This is useful, for instance, in the log-linear model (Section
\ref{loglin}).

Note that in our original data frame, the two columns are called {\it
factors} in R.  A factor is basically a vector of mode {\bf character},
intended to represent values of a categorical variable, such as the {\bf
ct\$VoteX} variable above.  The {\bf factor} class includes a component
{\bf levels}, which in the case of {\bf ct\$VoteX} are Yes, No and Not
Sure.
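
Internally, a factor stores integer codes that index into its levels.
The following sketch builds a factor by hand from the VoteX responses
above, rather than reading the file, and inspects both parts:

\begin{Verbatim}[fontsize=\relsize{-2}]
f <- factor(c("Yes","Yes","No","Not Sure","No"))   # the VoteX responses
levels(f)         # "No" "Not Sure" "Yes", in sorted order
as.integer(f)     # 3 3 1 2 1, the internal codes
as.character(f)   # back to the original strings
\end{Verbatim}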

Of course, all of the above would still work if our original data frame {\bf
ct} had three factors, or more.

We can get counts on a single factor in isolation as well, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- factor(c("a","b","a","a","b"))
> z <- table(y)
> z
y
a b
3 2
> as.vector(z)
[1] 3 2
\end{Verbatim}

Note the use here of {\bf as.vector()} to extract only the counts.

Among other things, this gives us an easy way to determine what
proportion of items satisfy a certain condition.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c (5,12,13,3,4,5)
> table(x == 5)

FALSE  TRUE 
    4     2 
\end{Verbatim}

So our answer is 2/6.
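
If only the proportion is wanted, a common shortcut is to apply {\bf
mean()} to the boolean vector, since TRUE and FALSE are treated as 1 and
0 in arithmetic.  A sketch on the same data (the name {\bf p} is
arbitrary):

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- c(5,12,13,3,4,5)
p <- mean(x == 5)   # TRUE counts as 1, FALSE as 0
p                   # 0.3333333, i.e. 2/6
sum(x == 5)         # 2, the count itself
\end{Verbatim}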

The function {\bf table()} is often used with {\bf cut()}.  To explain
what the latter does, first consider the call

\begin{Verbatim}[fontsize=\relsize{-2}]
y <- cut(x,b,labels=F)
\end{Verbatim}

where {\bf x} is a vector of observations, and {\bf b} defines bins,
which are the semi-open intervals {\bf (b[1],b[2]], (b[2],b[3]],...}.
Then {\bf y[j]} will be the index {\bf i} such that {\bf x[j]} falls
into bin {\bf i}.

For instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
> cut(1:8,c(0,4,7,8),labels=F)
[1] 1 1 1 1 2 2 2 3
\end{Verbatim}

The function {\bf cut()} has many, many other options but in our context
here, the point is that we can pipe the output of {\bf cut()} into {\bf
table()}, thus getting counts of the numbers of observations in each
bin.

\begin{Verbatim}[fontsize=\relsize{-2}]
bincounts <- function(x,b) {
   y <- cut(x,b)
   return(as.vector(table(y)))
}
\end{Verbatim}
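
As a usage sketch, here is {\bf bincounts()} applied to the same data
and bins as in the {\bf cut()} example above; the function is repeated
so that the code is self-contained, and the name {\bf bc} is arbitrary:

\begin{Verbatim}[fontsize=\relsize{-2}]
bincounts <- function(x,b) {
   y <- cut(x,b)
   return(as.vector(table(y)))
}
bc <- bincounts(1:8,c(0,4,7,8))
bc   # 4 3 1: four values in (0,4], three in (4,7], one in (7,8]
\end{Verbatim}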

\section{Missing Values}

Data sets in the real world tend to have problems.  Observed values can
be incorrect, and are often missing altogether.  R designates the latter
case by NA.  

One can test for this condition by using {\bf is.na()}.  When applied to
a vector, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,NA,12)
> is.na(z)
[1] FALSE  TRUE FALSE
\end{Verbatim}

this function can be used to identify all the missing values, which then
can be avoided in one's analysis.  Many functions also have an option to
skip missing values; for instance, the {\bf mean()} function has the
argument {\bf na.rm}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mean(z)
[1] NA
> mean(z,na.rm=T)
[1] 8.5
\end{Verbatim}
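
For functions that lack such an option, one can remove the missing
values oneself, by filtering on {\bf is.na()}.  A sketch (the name {\bf
zok} is arbitrary):

\begin{Verbatim}[fontsize=\relsize{-2}]
z <- c(5,NA,12)
sum(is.na(z))         # 1, the number of missing values
zok <- z[!is.na(z)]   # keep only the non-missing values
zok                   # 5 12
mean(zok)             # 8.5, the same as mean(z,na.rm=T)
\end{Verbatim}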

\section{Functional Programming Features}
\label{funnature}

Functional programming has several benefits:

\begin{itemize}

\item Clearer, more compact code.

\item Potentially much faster execution speed.

\item Less debugging (since you write less code).

\item Easier transition to parallel programming.

\end{itemize}

\subsection{Elementwise Operations on Vectors}
\label{elementwise}

Suppose we have a function {\bf f()} that we wish to apply to all
elements of a vector {\bf x}.  In many cases, we can accomplish this by
simply calling {\bf f()} on {\bf x} itself.

\subsubsection{Vectorized Functions}

As we saw in Section \ref{vectorops}, many operations are {\bf
vectorized}, such as + and $>$:

\begin{Verbatim}[fontsize=\relsize{-2}]
> u <- c(5,2,8)
> v <- c(1,3,9)
> u+v
[1]  6  5 17
> u > v
[1]  TRUE FALSE FALSE
\end{Verbatim}

The key point is that if an R function uses vectorized
operations,\footnote{If not, use {\bf as.matrix()} to convert your
vector to a matrix, and use {\bf apply()}.  An example is given in
Section \ref{applyonvector}.} it too is vectorized, i.e. it can be
applied to vectors in an elementwise fashion.  For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> w <- function(x) return(x+1)
> w(u)
[1] 6 3 9
\end{Verbatim}

Here {\bf w()} uses +, which is vectorized, so {\bf w()} is vectorized
as well.

This applies to many of R's built-in functions.  For instance, let's
apply the function for rounding to the nearest integer to an example
vector {\bf y}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- c(1.2,3.9,0.4)
> z <- round(y)
> z
[1] 1 4 0
\end{Verbatim}

The point is that the {\bf round()} function was applied individually to each
element in the vector {\bf y}. In fact, in

\begin{Verbatim}[fontsize=\relsize{-2}]
> round(1.2)
[1] 1
\end{Verbatim}

the operation still works, because the number 1.2 is actually considered to
be a vector that happens to consist of a single element 1.2.

Here we used the built-in function {\bf round()}, but you can do the same thing
with  functions  that  you  write  yourself.  

Note that the functions can also have extra arguments, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> f <- function(elt,s) return(elt+s)
> y <- c(1,2,4)
> f(y,1)
[1] 2 3 5
\end{Verbatim}

As seen above, even operators such as + are really functions. For
example, the reason why elementwise addition of 4 works here,

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- c(12,5,13)
> y+4
[1] 16  9 17
\end{Verbatim}

is that the + is actually considered a function! Look at it here:

\begin{Verbatim}[fontsize=\relsize{-2}]
> '+'(y,4)
[1] 16  9 17
\end{Verbatim}

\subsubsection{The Case of Vector-Valued Functions}

The above operations work with vector-valued functions too:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z12 <- function(z) return(c(z,z^2))
> x <- 1:8
> z12(x)
 [1]  1  2  3  4  5  6  7  8  1  4  9 16 25 36 49 64
> matrix(z12(x),ncol=2)
     [,1] [,2]
[1,]    1    1
[2,]    2    4
[3,]    3    9
[4,]    4   16
[5,]    5   25
[6,]    6   36
[7,]    7   49
[8,]    8   64
\end{Verbatim}



\subsection{Filtering}
\label{filter}

Another idea borrowed from functional programming is filtering, which is
one of the most common operations in R.

\subsubsection{On Vectors}

For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,2,-3,8)
> w <- z[z*z > 8]
> w
[1]  5 -3  8
\end{Verbatim}

Here is what happened above: We asked R to find the indices of all 
the elements of {\bf z} whose squares were greater than 8, then use 
those indices in an indexing operation on {\bf z}, then finally
assign the result to {\bf w}.

Look at it done piece-by-piece:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,2,-3,8)
> z
[1]  5  2 -3  8
> z*z > 8
[1]  TRUE FALSE  TRUE  TRUE
\end{Verbatim}

Evaluation of the expression {\bf z*z $>$ 8} gave us a vector of
booleans! Let's go further:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z[c(TRUE,FALSE,TRUE,TRUE)]
[1]  5 -3  8
\end{Verbatim}

This example will place things into even sharper focus:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- c(5,2,-3,8)
> j <- z*z > 8
> j
[1]  TRUE FALSE  TRUE  TRUE
> y <- c(1,2,30,5)
> y[j]
[1]  1 30  5
\end{Verbatim}

We may just want to find the positions within {\bf z} at which the
condition occurs. We can do this using {\bf which()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> which(z*z > 8)
[1] 1 3 4
\end{Verbatim}

Here's an extension of an example in Section \ref{slicing}:

\begin{Verbatim}[fontsize=\relsize{-2}]
# x is an array of numbers, mostly in nondecreasing order, but with some
# violations of that order; nviol() returns the number of indices i for
# which x[i+1] < x[i]

nviol <- function(x) {
   diff  <- x[-1]-x[1:(length(x)-1)]
   return(length(which(diff < 0)))
}
\end{Verbatim}

I noted in Section \ref{why} that using the {\bf nrow()} function in
conjunction with filtering provides a way to obtain a count of records
satisfying various conditions. If you just want the count and don't want
to create a new table, you should use this approach.

You can also use this to selectively change elements of a vector, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,3,8,2)
> x[x > 3] <- 0
> x
[1] 1 3 0 2
\end{Verbatim}

\subsubsection{On Matrices and Data Frames}

Filtering can be done with matrices and data frames too.  Note that one
must be careful with the syntax.  For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x
     x
[1,] 1 2
[2,] 2 3
[3,] 3 4
> x[x[,2] >= 3,]
     x
[1,] 2 3
[2,] 3 4
\end{Verbatim}

Again, let's dissect this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> j <- x[,2] >= 3
> j
[1] FALSE  TRUE  TRUE
> x[j,]
     x
[1,] 2 3
[2,] 3 4
\end{Verbatim}

Here is another example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> m <- matrix(c(1,2,3,4,5,6),nrow=3)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m[m[,1] > 1,]
     [,1] [,2]
[1,]    2    5
[2,]    3    6
> m[m[,1] > 1 & m[,2] > 5,]
[1] 3 6
\end{Verbatim}
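
Note that the last result above is a plain vector rather than a one-row
matrix, because R by default drops dimensions of extent 1.  If later code
expects a matrix, the {\bf drop=FALSE} argument of the indexing
operation keeps the matrix structure.  A sketch (the names {\bf r1} and
{\bf r2} are arbitrary):

\begin{Verbatim}[fontsize=\relsize{-2}]
m <- matrix(c(1,2,3,4,5,6),nrow=3)
r1 <- m[m[,1] > 1 & m[,2] > 5,]              # a plain vector, 3 6
r2 <- m[m[,1] > 1 & m[,2] > 5,,drop=FALSE]   # a 1x2 matrix
dim(r1)   # NULL
dim(r2)   # 1 2
\end{Verbatim}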

\subsection{Combining Elementwise Operations and Filtering, with the
ifelse() Function}

The form is

\begin{Verbatim}[fontsize=\relsize{-2}]
ifelse(b,u,v)
\end{Verbatim}

where b is a boolean vector, and u and v are vectors.  

The return value is a vector, element i of which is {\bf u[i]} if {\bf
b[i]} is true, or {\bf v[i]} if {\bf b[i]} is false.  This is pretty
abstract, so let's go right to an example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- 1:10
> y <- ifelse(x %% 2 == 0,5,12)
> y
 [1] 12  5 12  5 12  5 12  5 12  5
\end{Verbatim}

Here we wish to produce a vector in which there is a 5 wherever {\bf x}
is even, with a 12 wherever {\bf x} is odd.  So, the first argument is
c(F,T,F,T,F,T,F,T,F,T).  The second argument, 5, is treated as
c(5,5,...,5), a vector of ten 5s, by recycling, and similarly for the
third argument, 12.

Here is another example, in which we have explicit vectors.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(5,2,9,12)
> ifelse(x > 6,2*x,3*x)
[1] 15  6 18 24
\end{Verbatim}

The advantage of {\bf ifelse()} over the standard if-then-else is that
it is vectorized.  Thus it's potentially much faster.

Due to the vector nature of the arguments, one can nest {\bf ifelse()}
operations.  In the following example, involving an abalone data set,
gender is coded as `M', `F' or `I', the last meaning infant.  We wish to
recode those characters as 1, 2 or 3:

\begin{Verbatim}[fontsize=\relsize{-2}]
> g <- c("M","F","F","I","M")
> ifelse(g == "M",1,ifelse(g == "F",2,3))
[1] 1 2 2 3 1
\end{Verbatim}

The inner call to {\bf ifelse()}, which of course is evaluated first,
produces a vector of 2s and 3s, with the 2s corresponding to female
cases, and 3s being for males and infants.  The outer call results in 1s
for the males, in which cases the 3s are ignored.

Remember, the vectors involved could be columns in matrices, and this is
a very common scenario.  Say our abalone data is stored in the matrix
{\bf ab}, with gender in the first column.  Then if we wish to recode 
as above, we could do it this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ab[,1] <- ifelse(ab[,1] == "M",1,ifelse(ab[,1] == "F",2,3))
\end{Verbatim}


\subsection{Applying the Same Function to All Elements of a Matrix, Data
Frame and Even More}
\label{apply}

This is not just for compactness of code, but for speed. If speed is an
issue,  such  as  when  working  with  large data sets or long-running
simulations, one must avoid explicit loops as much as possible, because R
can do them a lot faster than you can.

To this end, there is the {\bf apply()} function and its variants.

\subsubsection{Applying the Same Function to All Rows or Columns of a Matrix}

The arguments of {\bf apply()} are the matrix/data frame to be applied
to, the dimension---1 if the function applies to rows, 2 for
columns---and the function to be applied.

For example, here we apply the built-in R function {\bf mean()} to each
column of a matrix {\bf z}.

\begin{Verbatim}[fontsize=\relsize{-2}]
> z
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> apply(z,2,mean)
[1] 2 5
\end{Verbatim}

Here is an example of working on rows, using our own function:

\begin{Verbatim}[fontsize=\relsize{-2}]
> f <- function(x) x/c(2,8)
> y <- apply(z,1,f)
> y
     [,1]  [,2] [,3]
[1,]  0.5 1.000 1.50
[2,]  0.5 0.625 0.75
\end{Verbatim}

You might be surprised that the size of the result here is 2 x 3 rather
than 3 x 2.  If the function to be applied returns a vector of k
components, the result of {\bf apply()} will have k rows. You can use
the matrix transpose function {\bf t()} to change it.
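
Here is a sketch of that fix.  It assumes {\bf z} is the 3 x 2 matrix
shown above, rebuilt here so that the code is self-contained:

\begin{Verbatim}[fontsize=\relsize{-2}]
z <- matrix(1:6,nrow=3)
f <- function(x) x/c(2,8)
y <- t(apply(z,1,f))   # transpose, so row i of y comes from row i of z
dim(y)   # 3 2
y[2,]    # 1.000 0.625, computed from the second row of z
\end{Verbatim}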

As you can see, the function to be applied needs at least one argument,
which will play the role of one row or column in the array. In some cases,
you  will need additional arguments, which you can place following the
function name in your call to {\bf apply()}.

For instance, suppose we have a matrix of 1s and 0s, and want to create
a vector as follows:  For each row of the matrix, the corresponding
element of the vector will be either 1 or 0, depending on whether the
majority of the first {\bf c} elements in that row are 1 or 0.  Here
{\bf c} will be a parameter which we may wish to vary.  We could do
this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> copymaj <- function(rw,c) {
+    maj <- sum(rw[1:c]) / c
+    return(ifelse(maj > 0.5,1,0))
+ }
> x <- matrix(c(1,1,1,0, 0,1,0,1, 1,1,0,1, 1,1,1,1, 0,0,1,0),nrow=4)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    0
[2,]    1    1    1    1    0
[3,]    1    0    0    1    1
[4,]    0    1    1    1    0
> apply(x,1,copymaj,3)
[1] 1 1 0 1
> apply(x,1,copymaj,2)
[1] 0 1 0 0
\end{Verbatim}

Here the values 3 and 2 form the actual arguments for the formal
argument {\bf c} in {\bf copymaj()}.  

So, the general form of apply is

\begin{Verbatim}[fontsize=\relsize{-2}]
apply(m,dimcode,f,fargs)
\end{Verbatim}

where {\bf m} is the matrix, {\bf dimcode} is 1 or 2, according to
whether we will operate on rows or columns, {\bf f} is the function to
be applied, and {\bf fargs} is an optional list of arguments to be
supplied to {\bf f}.

{\it Note carefully that in writing {\bf f()} itself, its first argument 
must be a vector that will be supplied by the caller as a row or column
of {\bf m}}.

As R moves closer and closer to parallel processing, functions like {\bf
apply()} will become more and more important.  For example, the {\bf
clusterApply()} function in the snow package gives R some parallel
processing capability by distributing the submatrix data to various
network nodes, with each one basically running {\bf apply()} on its
submatrix, and then collecting the results.  See Section \ref{snow}.

\subsubsection{Applying the Same Function to All Elements of a List}
\label{lapply}

The analog of {\bf apply()} for lists is {\bf lapply()}.  It applies the
given function to all elements of the specified list.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> lapply(list(1:3,25:27),median)
[[1]]
[1] 2

[[2]]
[1] 26
\end{Verbatim}

In this example the list was created only as a temporary measure, so we
should convert back to numeric:

\begin{Verbatim}[fontsize=\relsize{-2}]
> as.numeric(lapply(list(1:3,25:27),median))
[1]  2 26
\end{Verbatim}
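
One possible shortcut is base R's {\bf sapply()}, which applies the function just as {\bf lapply()} does but then tries to simplify the result to a vector.  A minimal sketch, with {\bf medians} as an illustrative name:

\begin{Verbatim}[fontsize=\relsize{-2}]
medians <- sapply(list(1:3,25:27),median)   # a vector rather than a list
medians
\end{Verbatim}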

\subsubsection{Applying the Same Function to a Vector}
\label{applyonvector}

If your function is vectorized, then you can simply call it on the
vector, as in Section \ref{elementwise}.  

Otherwise, you can still avoid writing a loop, e.g. by rewriting

\begin{Verbatim}[fontsize=\relsize{-2}]
lv <- length(v)
outvec <- vector(length=lv)
for (i in 1:lv) {
   outvec[i] <- f(v[i])
}
\end{Verbatim}

as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
outvec <- apply(as.matrix(v),1,f)
\end{Verbatim}

The call to {\bf as.matrix()} will return a matrix whose sole column is
{\bf v}.
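
Here is a small sketch of this idea.  The function {\bf f()} below is a made-up example of a function that is not vectorized; the sketch also shows {\bf sapply()}, which should give the same result without the conversion to a matrix.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical non-vectorized function: if() tests a single value
f <- function(x) if (x %% 2 == 1) x^2 else 0
v <- c(1,2,3)
out1 <- apply(as.matrix(v),1,f)   # the approach described above
out2 <- sapply(v,f)               # alternative, no matrix needed
out1
out2
\end{Verbatim}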

This may not save you much time if you are running R on just one
machine, but if you are using, for instance, the snow package (see
Section \ref{snow}), with {\bf parApply()} instead of {\bf apply()}, it
could be well worth doing.

\subsection{Functions Are First-Class Objects}

Functions can be used as arguments, assigned, etc.  For instance, 

\begin{Verbatim}[fontsize=\relsize{-2}]
> f1 <- function(a,b) return(a+b)
> f2 <- function(a,b) return(a-b)
> f <- f1
> f(3,2)
[1] 5
> f <- f2
> f(3,2)
[1] 1
> g <- function(h,a,b) h(a,b)
> g(f1,3,2)
[1] 5
> g(f2,3,2)
[1] 1
\end{Verbatim}

Since you can view any object when in R's interactive mode by typing the
name of the object, and since functions are objects, you can view the
code for a function (either one you wrote, or one in R) in this manner.
For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
> f1
function(a,b) return(a+b)
\end{Verbatim}

This  is handy if you're using a function that you've written but have
forgotten what its arguments are, for instance. It's also useful if you are
not quite sure what an R library function does; by looking at the code you
may understand it better.

Also, a nice feature is that you can edit functions from within R. In the
case above, I could change {\bf f1()} by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> f1 <- edit(f1)
\end{Verbatim}

This would open the default editor on the code for {\bf f1}, which I
could then edit and save back to {\bf f1}. 

The editor invoked will depend on R's internal options variable
{\bf editor}.  In Unix-class systems, R will set this from your
EDITOR or VISUAL environment variable, or you can set it yourself, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> options(editor="/usr/bin/vim")
\end{Verbatim}

See the online documentation if you have any problems.

Note that in this example I am saving the revision back to the same
function.  Note too that when I do so, I am making an assignment, and
thus the R interpreter will compile the code; if I have any errors, the
assignment will not be done.  I can recover by the command

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- edit()
\end{Verbatim}

Warning:  Apparently comments are not preserved.

\section{R Programming Structures}

R is a full programming language, similar to scripting languages such as
Perl and Python. One can define functions, use constructs such as loops and
conditionals, etc.

\subsection{Use of Braces for Block Definition}

The  body of a {\bf for}, {\bf if} or similar statement does not need
braces if it consists of a single statement.

\subsection{Loops}

\subsubsection{Basic Structure}

In our function {\bf oddcount()} in Section \ref{writefun}, the line

\begin{Verbatim}[fontsize=\relsize{-2}]
+ for (n in x)  {
\end{Verbatim}

will be instantly recognized by Python programmers. It of course
means that there will be one iteration of the loop for each component of
the vector {\bf x}, with {\bf n} taking on the values of those
components. In other words, in the first iteration, {\bf n = x[1]}, in
the second iteration {\bf n = x[2]}, etc.

Looping with {\bf while} and {\bf repeat} is also available, complete
with {\bf break}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> i <- 1
> while(1) {
+    i <- i+4
+    if (i > 10) break
+ }
> i
[1] 13
\end{Verbatim} 

(Of course, {\bf break} can be used with {\bf for} too.)
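
For comparison, here is a sketch of the same computation written with {\bf repeat}, which has no loop condition at all and thus relies entirely on {\bf break}:

\begin{Verbatim}[fontsize=\relsize{-2}]
i <- 1
repeat {                 # no condition; the exit must come from break
   i <- i+4
   if (i > 10) break
}
i                        # 13, as in the while version
\end{Verbatim}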

\subsubsection{Looping Over Nonvector Sets}

The {\bf for} construct works on any vector, regardless of mode.  One
can loop over a vector of file names, for instance.  Say we have files
{\bf x} and {\bf y} with contents

\begin{Verbatim}[fontsize=\relsize{-2}]
1 
2 
3
4 
5
6
\end{Verbatim}

and

\begin{Verbatim}[fontsize=\relsize{-2}]
5
12
13
\end{Verbatim}

Then this loop prints each of them:

\begin{Verbatim}[fontsize=\relsize{-2}]
> for (fn in c("x","y")) print(scan(fn))
Read 6 items
[1] 1 2 3 4 5 6
Read 3 items
[1]  5 12 13
\end{Verbatim}

R does not directly support iteration over nonvector sets, but there are
indirect yet easy ways to accomplish it.  One way would be to use {\bf
lapply()}, as shown in Section \ref{lapply}.  Another would be to use
{\bf get()}, e.g.:
\label{getex}

\begin{Verbatim}[fontsize=\relsize{-2}]
> u
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    4
> v
     [,1] [,2]
[1,]    8   15
[2,]   12   10
[3,]   20    2
> for (m in c("u","v")) {
+    z <- get(m)
+    print(lm(z[,2] ~ z[,1]))
+ }

Call:
lm(formula = z[, 2] ~ z[, 1])

Coefficients:
(Intercept)       z[, 1]
    -0.6667       1.5000

Call:
lm(formula = z[, 2] ~ z[, 1])

Coefficients:
(Intercept)       z[, 1]
     23.286       -1.071
\end{Verbatim}

The reader is welcome to make his/her own refinements here.

\subsection{Return Values}

By the way, you often don't need the {\bf return()} call. The last value
computed will be returned by default. In the {\bf oddcount()} example in
Section \ref{writefun}, instead of writing

\begin{Verbatim}[fontsize=\relsize{-2}]
return(k)
\end{Verbatim}

we could simply write

\begin{Verbatim}[fontsize=\relsize{-2}]
k
\end{Verbatim}

This is true for nonscalars too (recall that a scalar is really a
one-element vector anyway), e.g.:

\begin{Verbatim}[fontsize=\relsize{-2}]
> r <- function(x,y) {
+    c(x+y,x-y)
+ }
> r(3,2)
[1] 5 1
\end{Verbatim}

\subsection{If-Else}

The syntax for if-else is like this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> if (r == 4) {
+    x <- 1
+    y <- 2
+ } else {
+    x <- 3
+    y <- 4
+ }
\end{Verbatim}

{\bf Note that when the {\bf else} is on a separate line, as above, the
braces are necessary even for single-statement bodies for the {\bf if}
and {\bf else}, and the newlines are important too.}  For instance, the
right brace before the {\bf else} is what the parser uses to tell that
this is an {\bf if-else} rather than just an {\bf if}; this would be easy
for the parser to handle in batch mode but not in interactive mode.
However, if you place the {\bf else} on the same line as the {\bf if},
this problem will not occur.
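
As a small sketch of the one-line form, with arbitrary values chosen for illustration, note also that an if-else yields a value, which can itself be assigned:

\begin{Verbatim}[fontsize=\relsize{-2}]
r <- 4
if (r == 4) x <- 1 else x <- 3   # else on the same line as the if
y <- if (r == 4) 2 else 4        # the if-else itself has a value
x
y
\end{Verbatim}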

See  also  the {\bf ifelse()} function discussed in Section
\ref{funnature}.

\subsection{Local and Global Variables}

Say a variable {\bf z} appearing within a function has the same name as
a global.  Then it will be treated as local, except that its initial
value will be that of the global. Subsequent assignment to it within the
function will ordinarily (see exception in Section \ref{iwantglobals})
not change the value of the global. For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> u
[1] 1
> v
[1] 2
> f
function(x) {
   y <- u
   y <- y + 3
   u <- x
return(x+y+u+v)
}
> f(5)
[1] 16
> u
[1] 1
> v
[1] 2
> y
Error: object "y" not found
\end{Verbatim}

\subsection{Function Arguments Don't Change}

Yet another influence of the functional programming philosophy is that
functions do not change their arguments, i.e. there are no {\it side
effects} (unless the result is re-assigned to the argument). Consider,
for instance, this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(4,1,3) 
> y <- sort(x)
> y
[1] 1 3 4
> x
[1] 4 1 3
\end{Verbatim}

The point is that {\bf x} didn't change.

Again, if you want the value of an argument to change, the best way to
accomplish this is to reassign the return value of the function to the
argument.
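
The following sketch illustrates both points with a made-up function {\bf addone()}: a change made to an argument inside a function affects only the function's own copy, and the caller sees the change only after reassigning the return value.

\begin{Verbatim}[fontsize=\relsize{-2}]
addone <- function(v) {
   v[1] <- v[1] + 1     # changes only the function's own copy of v
   v                    # return the changed copy
}
x <- c(4,1,3)
y <- addone(x)
x                       # still 4 1 3
x <- addone(x)          # reassign the return value to the argument
x                       # now 5 1 3
\end{Verbatim}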

\subsection{Writing to Globals Using the Superassignment Operator}
\label{iwantglobals}
If you do want to write to global variables (or more precisely, to
variables one level higher than the current scope), you can use the {\bf
superassignment} operator, $<<-$.  For example,

\begin{verbatim}
> two <- function(u) {
+    u <<- 2*u
+    y <<- 2*y
+    z <- 2*z
+ }
> x <- 1
> y <- 2
> z <- 3
> two(x)
> x
[1] 1
> y
[1] 4
> z
[1] 3
\end{verbatim}
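
The phrase ``one level higher'' matters when functions are nested.  In the sketch below, all of whose names are made up for illustration, the superassignment inside the inner function changes a variable belonging to the enclosing function, not a global.  (If no enclosing function had such a variable, the assignment would be made at the global level.)

\begin{Verbatim}[fontsize=\relsize{-2}]
makecounter <- function() {
   count <- 0
   bump <- function() {
      count <<- count + 1   # assigns to count in makecounter()'s scope
   }
   bump()
   bump()
   count                    # 2: bump() changed the enclosing variable
}
n <- makecounter()
n
\end{Verbatim}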

\subsection{Arithmetic and Boolean Operators and Values}

\begin{Verbatim}[fontsize=\relsize{-2}]
x + y            addition
x - y            subtraction
x * y            multiplication
x / y            division
x ^ y            exponentiation
x %% y           modular arithmetic
x %/% y          integer division
x == y           test for equality
x <= y           test for less-than-or-equal
x >= y           test for greater-than-or-equal
x && y           boolean and for scalars
x || y           boolean or for scalars
x & y            boolean and for vectors (vector x,y,result)
x | y            boolean or for vectors (vector x,y,result)
!x               boolean negation
\end{Verbatim}

The boolean values are TRUE and FALSE. They can be abbreviated to T and F,
but  must be capitalized. These values change to 1 and 0 in arithmetic
expressions, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> 1 < 2
[1] TRUE
> (1 < 2) * (3 < 4)
[1] 1
> (1 < 2) * (3 < 4) * (5 < 1)
[1] 0
> (1 < 2) == TRUE
[1] TRUE
> (1 < 2) == 1
[1] TRUE
\end{Verbatim}
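
Here is a short sketch contrasting the vector and scalar forms of the boolean operators from the table above, along with the modular arithmetic and integer division operators.  The data are arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- c(TRUE,FALSE,TRUE)
y <- c(TRUE,TRUE,FALSE)
x & y            # elementwise: TRUE FALSE FALSE
x | y            # elementwise: TRUE TRUE TRUE
x[1] && y[3]     # single values: FALSE
sum(x)           # TRUEs count as 1s, so this is 2
7 %% 3           # remainder: 1
7 %/% 3          # integer quotient: 2
\end{Verbatim}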

There are set operations, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,5)
> y <- c(5,1,8,9)
> union(x,y)
[1] 1 2 5 8 9
> intersect(x,y)
[1] 1 5
> setdiff(x,y)
[1] 2
> setdiff(y,x)
[1] 8 9
> setequal(x,y)
[1] FALSE
> setequal(x,c(1,2,5))
[1] TRUE
> 2 %in% x  # note that plain "in" doesn't work
[1] TRUE
> 2 %in% y
[1] FALSE
\end{Verbatim}

You can invent your own operators! Just write a function whose name begins
and ends with \%. Here is an operator for the symmetric difference between
two sets (i.e. all the elements in exactly one of the two operand sets):

\begin{Verbatim}[fontsize=\relsize{-2}]
> "%sdf%" <- function(a,b) {
+    sdfab <- setdiff(a,b)   # use the arguments a and b, not globals
+    sdfba <- setdiff(b,a)
+    return(union(sdfab,sdfba))
+ }
> x %sdf% y
[1] 2 8 9
\end{Verbatim}

\subsection{Named Arguments}

In Section \ref{second}, we read in a dataset from a file {\bf exams}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> testscores <- read.table("exams",header=TRUE)
\end{Verbatim}

The parameter {\bf header=TRUE} told R that we did have a header line,
so R should not count that first line in the file as data.

This is an example of the use of {\it named arguments}. The function
{\bf read.table()} has a number of arguments, some of which are
optional, which means that we must  specify which arguments we are
using, by using their names, e.g. {\bf header=TRUE} above. (Again,
Python programmers will find this familiar.) The ones  you  don't
specify  all have default values. 
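
You can give default values to the arguments of your own functions in the same way.  Here is a minimal sketch; the function {\bf rescale()} and its arguments {\bf shift} and {\bf scale} are invented for this illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical function; shift and scale both have default values
rescale <- function(x,shift=0,scale=1) (x - shift)/scale
r1 <- rescale(c(2,4,6))                    # defaults used: 2 4 6
r2 <- rescale(c(2,4,6),scale=2)            # only scale named: 1 2 3
r3 <- rescale(c(2,4,6),scale=2,shift=2)    # names in any order: 0 1 2
\end{Verbatim}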

\section{Writing Fast R Code}
\label{efficient}

A central theme in R programming is avoidance of explicit loops,
instead relying on R's rich functionality to do the work for you. Not
only does this save you programming and debugging time, it produces
faster code, since R's functions have been written for efficiency.  {\bf
This can be of the utmost importance in applications with large data
sets or large amounts of computation.}

For instance, let's rewrite our function {\bf oddcount()} in Section
\ref{writefun}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> oddcount <- function(x) return(length(which(x%%2==1)))
\end{Verbatim}

Let's test it:

\begin{Verbatim}[fontsize=\relsize{-2}]
> oddcount(c(1,5,6,2,19))
[1] 3
\end{Verbatim}

There is no explicit loop in this version of our {\bf oddcount()}. We
used R's vector filtering to avoid a loop, and even though R internally
will loop through the array, it will do so much faster than we would
with an explicit loop in our R code.  In essence, the looping is done in
C in native machine code, rather than interpretively in R code.  
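
As a possible variation, since TRUE values are treated as 1s in arithmetic, the count can also be obtained by summing the boolean vector directly.  The name {\bf oddcount2()} is used here only to keep it distinct from the version above.

\begin{Verbatim}[fontsize=\relsize{-2}]
oddcount2 <- function(x) sum(x %% 2 == 1)   # each TRUE counts as 1
oddcount2(c(1,5,6,2,19))                    # 3, as before
\end{Verbatim}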

Of course, R's functional programming features, described in Section
\ref{funnature}, provide many ways to help us avoid explicit loops.

For more detailed information on improving R performance, see
\url{http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf}.

\section{Simulation Programming in R}
\label{simulation}

\subsection{The Basics}

Here is an example, finding P(Z $<$ 1) for a N(0,1) random variable Z:

\begin{Verbatim}[fontsize=\relsize{-2}]
> count <- 0
> for (i in seq(1,100000))
+    if (rnorm(1) < 1.0) count <- count + 1
> count/100000
[1] 0.832
> count/100000.0
[1] 0.832
\end{Verbatim}

\subsection{Achieving Better Speed}

But as noted in Section \ref{efficient}, you should try to use R's
built-in features for greater speed. The above code would be better
written

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- rnorm(100000)
> length(x[x < 1.0])/100000.0
\end{Verbatim}
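
A sketch of a slightly more compact variant follows.  It takes the mean of the boolean vector, which is the proportion of TRUE values, and compares the estimate to the exact value from {\bf pnorm()}, the N(0,1) cumulative distribution function.  The seed value is arbitrary and is there only to make the run repeatable.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)              # arbitrary seed, for repeatability only
x <- rnorm(100000)
est <- mean(x < 1.0)     # proportion of TRUEs, estimating P(Z < 1)
exact <- pnorm(1.0)      # exact value, about 0.8413
abs(est - exact)         # simulation error; should be small
\end{Verbatim}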

We achieve an increase in speed at the expense of using more memory, by
keeping our random numbers in an array instead of generating and discarding
them one at a time. Suppose for example we wish to simulate sampling from an
adult human population in which height is normally distributed with mean 69
and standard deviation 2.5 for men, with corresponding values 64 and 2 for
women. We'll create a matrix for the data, with column 1 showing gender (1
for male, 0 for female) and column 2 showing height. The straightforward way
to do this would be something like

\begin{Verbatim}[fontsize=\relsize{-2}]
sim1 <- function(n)  {
   xm <- matrix(nrow=n,ncol=2)
   for (i in 1:n)  {
      d <- rnorm(1)
      if (runif(1) < 0.5) {
         xm[i,1] <- 1
         xm[i,2] <- 2.5*d + 69
      } else {
         xm[i,1] <- 0
         xm[i,2] <- 2*d + 64
      }
   }
   return(xm)
}
\end{Verbatim}

We could avoid a loop this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
sim2 <- function(n)  {
   d <- matrix(nrow=n,ncol=2)
   d[,1] <- runif(n)
   d[,2] <- rnorm(n)
   smpl <- function(rw) {  # rw = one row of d
      if (rw[1] < 0.5) {
         y <- 1
         x <- 2.5*rw[2] + 69
      } else {
         y <- 0
         x <- 2*rw[2] + 64
      }
      return(c(y,x))
   }
   z <- apply(d,1,smpl)
   return(t(z))
}
\end{Verbatim}

Here is a quick illustration of the fact that we do gain in performance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> system.time(sim1(1000))
[1] 0.028 0.000 0.027 0.000 0.000
> system.time(sim2(1000))
[1] 0.016 0.000 0.018 0.000 0.000
\end{Verbatim}
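
It may be possible to go further and drop {\bf apply()} as well, by working on whole vectors with {\bf ifelse()}.  The following is only a sketch, under the same assumptions about the population as above; the name {\bf sim2v()} is invented here, and no timing claim is made for it.

\begin{Verbatim}[fontsize=\relsize{-2}]
sim2v <- function(n)  {
   male <- runif(n) < 0.5                   # TRUE for men, FALSE for women
   d <- rnorm(n)
   ht <- ifelse(male,2.5*d + 69,2*d + 64)   # heights, chosen elementwise
   # column 1 is gender (1 male, 0 female), column 2 is height
   matrix(c(as.numeric(male),ht),ncol=2)
}
\end{Verbatim}

You could then time it yourself with {\bf system.time(sim2v(1000))}.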

Here is a slightly more complicated example, using a classical problem from
elementary probability courses. Urn 1 contains 10 blue marbles and eight
yellow ones. In Urn 2 the mixture is six blue and six yellow. We draw a marble
at random from Urn 1 and transfer it to Urn 2, and then draw a marble at
random from Urn 2. What is the probability that that second marble is blue?
This quantity is easy to find analytically, but we'll use simulation. Here
is the straightforward way:

\begin{Verbatim}[fontsize=\relsize{-2}]
sim3 <- function(nreps)  {
nb1 = 10  # 10 blue marbles in Urn 1
n1 <- 18  # number of marbles in Urn 1 at 1st pick
n2 <- 13  # number of marbles in Urn 2 at 2nd pick
count <- 0
for (i in 1:nreps)  {
   nb2 = 6  # 6 blue marbles orig. in Urn 2
   # pick from Urn 1 and put in Urn 2
   if (runif(1) < nb1/n1) nb2 <- nb2 + 1
   # pick from Urn 2
   if (runif(1) < nb2/n2) count <- count + 1
}
return(count/nreps)  # est. P(pick blue from Urn 2)
}
\end{Verbatim}

But here is how we can do it without loops:

\begin{Verbatim}[fontsize=\relsize{-2}]
sim4 <- function(nreps)  {
nb1 = 10  # 10 blue marbles in Urn 1
nb2 = 6  # 6 blue marbles orig. in Urn 2
n1 <- 18  # number of marbles in Urn 1 at 1st pick
n2 <- 13  # number of marbles in Urn 2 at 2nd pick
u <- matrix(c(runif(2*nreps)),nrow=nreps,ncol=2)
simfun <- function(rw,nb1,n1,nb2,n2) {
   if (rw[1] < nb1/n1) nb2 <- nb2 + 1
   if (rw[2] < nb2/n2) b <- 1 else b <- 0
   return(b)
}
z <- apply(u,1,simfun,nb1,n1,nb2,n2)
return(mean(z))  # est. P(pick blue from Urn 2)
}
\end{Verbatim}

Here we have set up a matrix {\bf u} with two columns of U(0,1) random variates.
The first column is used for our simulation of drawing from Urn 1, and the
second  for the drawing from Urn 2. Our function {\bf simfun()} works on one
repetition of the experiment. We have set up the call to {\bf apply()} to go
through all of the {\bf nreps} repetitions.

On my machine, this second approach was actually slower. So,
one must not assume that using {\bf apply()} will necessarily speed
things up. Note, though, that in a parallel version of R (Section
\ref{parr}), we'd likely get quite a speedup.
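
For this particular problem, one can also avoid both the loop and {\bf apply()} by operating on whole vectors.  Here is a sketch under the same urn setup; the name {\bf sim5()} is invented for it, and the last line computes the analytical answer for comparison.

\begin{Verbatim}[fontsize=\relsize{-2}]
sim5 <- function(nreps)  {
   u1 <- runif(nreps)           # one draw from Urn 1 per repetition
   u2 <- runif(nreps)           # one draw from Urn 2 per repetition
   nb2 <- 6 + (u1 < 10/18)      # 7 blue in Urn 2 if a blue one was moved
   mean(u2 < nb2/13)            # est. P(pick blue from Urn 2)
}
# analytical value: P(blue moved) * 7/13 + P(yellow moved) * 6/13
exact <- (10/18)*(7/13) + (8/18)*(6/13)
\end{Verbatim}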

\subsection{Built-In Random Variate Generators}

R has functions to generate variates from a number of different
distributions.  For example, {\bf rbinom()} generates binomial or
Bernoulli random variates.\footnote{A sequence of independent 0-1 valued
random variables with the same probability of 1 for each is called {\bf
Bernoulli}.}  If we wanted to, say, find the probability of getting at
least 4 heads out of 5 tosses of a coin, we could do this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- rbinom(100000,5,0.5)
> length(x[x >= 4])/100000
[1] 0.18791
\end{Verbatim}

There are also {\bf rnorm()} for the normal distribution, {\bf rexp()} for
the exponential, {\bf runif()} for the uniform, {\bf rgamma()} for the
gamma, {\bf rpois()} for the Poisson and so on.
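
A simulated value like the one above can be checked against the exact probability.  The sketch below uses {\bf dbinom()}, the binomial probability mass function that accompanies {\bf rbinom()}.

\begin{Verbatim}[fontsize=\relsize{-2}]
exact <- sum(dbinom(4:5,5,0.5))   # P(4 heads) + P(5 heads) = 6/32
exact                             # 0.1875
\end{Verbatim}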

\subsubsection{Obtaining the Same Random Stream in Repeated Runs}

By default, R will generate a different random number stream from run to
run of a program.  If you want the same stream each time, call {\bf
set.seed()}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> set.seed(8888)  # or your favorite number as an argument
\end{Verbatim}
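
Here is a quick sketch showing the effect.  Resetting the seed to the same value reproduces the same variates.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(8888)
a <- runif(3)
set.seed(8888)        # reset to the same seed
b <- runif(3)
identical(a,b)        # TRUE: the two streams are the same
\end{Verbatim}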

\section{Input/Output}

\subsection{Reading from the Keyboard}

You can use {\bf scan()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- scan()
1: 12 5
3: 2
4:
Read 3 items
> z
[1] 12  5  2
\end{Verbatim}

Use {\bf readline()} to input a line from the keyboard as a string:

\begin{Verbatim}[fontsize=\relsize{-2}]
> w <- readline()
abc de f
> w
[1] "abc de f"
\end{Verbatim}


\subsection{Printing to the Screen}

In interactive mode, one can print the value of a variable or expression by
simply typing the variable name or expression. In batch mode, one can use
the {\bf print()} function, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
print(x)
\end{Verbatim}

The argument may be any R object, e.g. a vector, list or data frame.

It's a little better to use {\bf cat()} instead of {\bf print()}, as the
latter can print only one expression and its output is numbered, which
may be a nuisance to us. E.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> print("abc")
[1] "abc"
> cat("abc\n")
abc
\end{Verbatim}

The arguments to {\bf cat()} will be printed out with intervening
spaces, for instance

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- 12
> cat(x,"abc","de\n")
12 abc de
\end{Verbatim}

If you don't want the spaces, use separate calls to {\bf cat()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- function(a,b) {
+    cat(a)
+    cat(b,"\n")
+ }
> z("abc","de")
abcde 
\end{Verbatim}
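
Another option, sketched here, is the {\bf sep} argument of {\bf cat()}, which sets the string placed between the printed items:

\begin{Verbatim}[fontsize=\relsize{-2}]
cat("abc","de","\n",sep="")    # prints abcde, with no spaces
cat(1,2,3,sep=",")             # any separator string can be used
cat("\n")
\end{Verbatim}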

\subsection{Reading a Matrix or Data Frame From a File}
\label{readmatdf}

The function {\bf read.table()} was discussed in Section \ref{second}.
Here is a bit more on it.

\begin{itemize}

\item The default value of {\bf header} is FALSE, so if we don't have a
header, we need not say so.

\item By default, at least in versions of R before 4.0.0, character
strings are treated as R {\bf factors} (Section \ref{factors}).  To turn
this ``feature'' off, include the argument {\bf as.is=T} in your call to
{\bf read.table()}.

\item If you have a spreadsheet export file, i.e. of type {\bf .csv} in
which the fields are separated by commas instead of spaces, use {\bf
read.csv()} instead of {\bf read.table()}.  There is also {\bf read.xls}
to read core spreadsheet files.

\item Note that if you read in a matrix via {\bf read.table()}, the
resulting object will be a data frame, even if all the entries are
numeric.  You may need it as a matrix, in which case do a followup call
to {\bf as.matrix()}.

\end{itemize}

There appears to be no good way of reading in a matrix directly from a
file.  One can use {\bf read.table()} and then convert.  A simpler way is
to use {\bf scan()} to read in the matrix row by row, making sure to use
the {\bf byrow} option in the function {\bf matrix()}.  For instance, say
the file {\bf x} contains

\begin{Verbatim}[fontsize=\relsize{-2}]
1 0 1
1 1 1
1 1 0
1 1 0
0 0 1
\end{Verbatim}

We can read it into a matrix this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- matrix(scan("x"),nrow=5,byrow=T)
\end{Verbatim}
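
To try this without creating the file by hand, here is a self-contained sketch.  It writes the same five lines to a temporary file, which stands in for the file {\bf x}, and then reads them back.

\begin{Verbatim}[fontsize=\relsize{-2}]
fname <- tempfile()      # temporary stand-in for the file x
writeLines(c("1 0 1","1 1 1","1 1 0","1 1 0","0 0 1"),fname)
x <- matrix(scan(fname,quiet=TRUE),nrow=5,byrow=TRUE)
dim(x)                   # 5 3
x[5,]                    # last line of the file: 0 0 1
\end{Verbatim}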

\subsection{Reading a File One Line at a Time}
\label{readlines} 

You can use {\bf readLines()} for this.  We need to create a {\it
connection} first, by calling {\bf file()}.

For example, suppose we have a file {\bf z}, with contents

\begin{Verbatim}[fontsize=\relsize{-2}]
1 3
1 4
2 6
\end{Verbatim}

Then we can do this:

\begin{Verbatim}[fontsize=\relsize{-2}]
> c <- file("z","r")
> readLines(c,n=1)
[1] "1 3"
> readLines(c,n=1)
[1] "1 4"
> readLines(c,n=1)
[1] "2 6"
\end{Verbatim}

If {\bf readLines()} encounters the end of the file, it returns an empty
result, i.e.\ a character vector of length 0, rather than a string.
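
This suggests a simple way to read a file of unknown length, sketched below: keep reading until a result of length 0 comes back.  A temporary file stands in for the file {\bf z}, and the variable names are illustrative.

\begin{Verbatim}[fontsize=\relsize{-2}]
fname <- tempfile()                # temporary stand-in for the file z
writeLines(c("1 3","1 4","2 6"),fname)
con <- file(fname,"r")
nlines <- 0
repeat {
   ln <- readLines(con,n=1)
   if (length(ln) == 0) break      # nothing returned: end of file
   nlines <- nlines + 1
}
close(con)
nlines                             # 3
\end{Verbatim}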

\subsection{Writing to a File}

\subsubsection{Writing a Table to a File}

The function {\bf write.table()} works very much like {\bf read.table()},
in this case writing a data frame instead of reading one.

In the case of writing a matrix to a file, just state that you want no
row or column names, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> write.table(xc,"xcnew",row.names=F,col.names=F)
\end{Verbatim}

\subsubsection{Writing to a Text File Using cat()}

(The point of the word {\it text} in the title of this section is that,
for instance, the number 12 will be written as the ASCII characters `1'
and `2', as with {\bf printf()} with \%d format in C---as opposed to the
bits 000...00001100.)

The function {\bf cat()} can be used to write to a file, one part at a
time.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> cat("abc\n",file="u")
> cat("de\n",file="u",append=T)
\end{Verbatim}

The file is saved after each operation, so at this point the file on
disk really does look like

\begin{Verbatim}[fontsize=\relsize{-2}]
abc
de
\end{Verbatim}

One can write multiple fields.  For instance

\begin{Verbatim}[fontsize=\relsize{-2}]
> cat(file="v",1,2,"xyz\n")
\end{Verbatim}

would produce a file {\bf v} consisting of a single line,

\begin{Verbatim}[fontsize=\relsize{-2}]
1 2 xyz
\end{Verbatim}

\subsubsection{Writing a List to a File}

You have various options here.  One would be to convert the list to a
data frame, as in Section \ref{convert}, and then call {\bf write.table()}.  

\subsubsection{Writing to a File One Line at a Time}

Use {\bf writeLines()}.  See Section \ref{readlines}.
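
A minimal sketch, writing two lines through a connection to a temporary file and then reading the file back:

\begin{Verbatim}[fontsize=\relsize{-2}]
fname <- tempfile()
con <- file(fname,"w")
writeLines("abc",con)      # each call here writes one line
writeLines("de",con)
close(con)                 # closing makes sure everything is written out
readLines(fname)           # "abc" "de"
\end{Verbatim}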

\subsection{Directories, Access Permissions, Etc.} 

R has a variety of functions for dealing with directories, file access
permissions and the like.

Here is a short example.  Say our current working directory contains
files {\bf x} and {\bf y}, as well as a subdirectory {\bf z}.  Suppose
the contents of {\bf x} is

\begin{Verbatim}[fontsize=\relsize{-2}]
12
5
13
\end{Verbatim}

and {\bf y} contains

\begin{Verbatim}[fontsize=\relsize{-2}]
3
4
5
\end{Verbatim}

The following code sums up all the numbers in the non-directory files
here:

\begin{Verbatim}[fontsize=\relsize{-2}]
tot <- 0
ndatafiles <- 0
for (d in dir()) {
   if (!file.info(d)$isdir) {
      ndatafiles <- ndatafiles + 1
      x <- scan(d,quiet=T)
      tot <- tot + sum(x)
   }
}
cat("there were",ndatafiles,"nondirectory files, with a sum of",tot,"\n")
\end{Verbatim}

Type

\begin{Verbatim}[fontsize=\relsize{-2}]
> ?files
\end{Verbatim}

to get more information on permissions etc.

The functions {\bf getwd()} and {\bf setwd()} can be used to determine
or change the current working directory.

The ``..'' notation for the parent directory, as used in Linux systems,
works in these functions.

\subsection{Accessing Files on Remote Machines Via URLs}

Functions such as {\bf read.table()}, {\bf scan()} and so on accept URLs
as well as local file names as arguments.  For example, I placed a file
{\bf z} on my Web page, with the contents

\begin{Verbatim}[fontsize=\relsize{-2}]
1 2
3 4
\end{Verbatim}

and accessed it from within R running on my home machine:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- read.table("http://heather.cs.ucdavis.edu/~matloff/z")
> z
  V1 V2
1  1  2
2  3  4
\end{Verbatim}

You can also read the file one line at a time, as in Section
\ref{readlines}.

\section{Object Oriented Programming} 
\label{oop}

\subsection{Managing Your Objects}

\subsubsection{Listing Your Objects with the ls() Function}

The {\bf ls()} command will list all of your current objects.  

A useful named argument is {\bf pattern}, which restricts the listing to
names matching a given pattern (a regular expression).  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> ls()
 [1] "acc"       "acc05"     "binomci"   "cmeans"    "divorg"    "dv"
 [7] "fit"       "g"         "genxc"     "genxnt"    "j"         "lo"
[13] "out1"      "out1.100"  "out1.25"   "out1.50"   "out1.75"   "out2"
[19] "out2.100"  "out2.25"   "out2.50"   "out2.75"   "par.set"   "prpdf"
[25] "ratbootci" "simonn"    "vecprod"   "x"         "zout"      "zout.100"
[31] "zout.125"  "zout3"     "zout5"     "zout.50"   "zout.75"
> ls(pattern="ut")
 [1] "out1"     "out1.100" "out1.25"  "out1.50"  "out1.75"  "out2"
 [7] "out2.100" "out2.25"  "out2.50"  "out2.75"  "zout"     "zout.100"
[13] "zout.125" "zout3"    "zout5"    "zout.50"  "zout.75"
\end{Verbatim}

\subsubsection{Removing Specified Objects with the rm() Function} 

To remove objects you no longer need, use {\bf rm()}. For instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
> rm(a,b,x,y,z,uuu)
\end{Verbatim}

would remove the objects {\bf a}, {\bf b} and so on.

One of the named arguments of {\bf rm()} is {\bf list}, which makes it easier
to remove multiple objects. For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
> rm(list = ls())
\end{Verbatim}

would supply the names of all of your objects as {\bf list}, thus removing everything.  If
you make use of {\bf ls()}'s {\bf pattern} argument this tool becomes
even more powerful.
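
For instance, here is a sketch that removes only those objects whose names begin with ``tmp''; the object names are made up for the illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
tmp1 <- 1
tmp2 <- 2
keepme <- 3
rm(list=ls(pattern="^tmp"))   # remove objects whose names start with tmp
exists("tmp1")                # FALSE
exists("keepme")              # TRUE
\end{Verbatim}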

\subsubsection{Saving a Collection of Objects with the save() Function}

Calling {\bf save()} on a collection of objects will write them to disk
for later retrieval by {\bf load()}.

\subsubsection{Listing the Characteristics of an Object with the names(), 
attributes() and class() Functions}

An object consists of a gathering of various kinds of information, with
each kind being called an {\it attribute}. The {\bf names()} function will
tell us the names of the attributes of the given object. For a data
frame, for example, these will be the names of the columns. For a
regression object, these will be {\bf coefficients}, {\bf residuals} and
so on.  Calling the {\bf attributes()} function will give you all this,
plus the class of the object itself.  To just get the class, call {\bf
class()}.

\subsubsection{The exists() Function}

The function {\bf exists()} returns TRUE or FALSE, depending on whether
the argument exists.  Be sure to quote the argument, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> exists("acc")
[1] TRUE
\end{Verbatim}

shows that the object {\bf acc} exists.

\subsubsection{Accessing an Object Via Strings}

The call {\bf get("u")} will return the object {\bf u}.  An example
appears on page \pageref{getex}. 

\subsection{Generic Functions}
\label{genericftns}

As mentioned in my introduction, R is rather polymorphic, in the sense
that the same function can operate differently on different classes.
One can apply {\bf plot()}, for example, to many types of objects,
getting an appropriate plot for each. The same is true for {\bf print()}
and {\bf summary()}.

In this manner, we get a uniform interface to different classes. So, when
someone develops a new R class for others to use, we can try to apply, say,
{\bf summary()} and reasonably expect it to work. This of course means that the
person who wrote the class, knowing the R idiom, would have had the
foresight to write such a function for the class, knowing that people would
expect one.

The functions above are known as {\it generic functions}.  The actual
function executed will be determined by the class of the object on which
you are calling the function.  

For example, let's look at a simple regression analysis (details in
Section \ref{regress}):

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,3)
> y <- c(1,3,8)
> lmout <- lm(y ~ x)
> lmout

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       -3.0          3.5
\end{Verbatim}

Note that we printed out the object {\bf lmout}.  (Remember, by simply typing
the name of an object in interactive mode, the object is printed.)  What
happened then was that the R interpreter saw that {\bf lmout} was an
object of class {\bf "lm"} (the quotation marks are part of the class
name), and thus the call to {\bf print()} was dispatched to {\bf
print.lm()}, a special print method in the {\bf "lm"} class.

In fact, we can take a look at that method:

\begin{Verbatim}[fontsize=\relsize{-2}]
> print.lm
function (x, digits = max(3, getOption("digits") - 3), ...)
{
    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")
    if (length(coef(x))) {
        cat("Coefficients:\n")
        print.default(format(coef(x), digits = digits), print.gap = 2,
            quote = FALSE)
    }
    else cat("No coefficients\n")
    cat("\n")
    invisible(x)
}
<environment: namespace:stats>
\end{Verbatim}

Don't worry about the details here; our main point is that the printing
was dependent on context, with a different print function being called
for each different class.
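
To make the dispatching idea concrete, here is a sketch of a small generic function of our own.  The generic {\bf describe()} and its methods are invented for this illustration; {\bf UseMethod()} is the base R function that selects a method according to the class of the first argument.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical generic: dispatch on the class of obj
describe <- function(obj,...) UseMethod("describe")
describe.default <- function(obj,...) "some object"
describe.lm <- function(obj,...) "a linear model fit"
x <- c(1,2,3)
y <- c(1,3,8)
d1 <- describe(3)           # no method for this class, so default is used
d2 <- describe(lm(y ~ x))   # class "lm", so describe.lm() is called
\end{Verbatim}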

You can see all the implementations of a given generic method by
calling {\bf methods()}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> methods(print)
  [1] print.acf*                         print.anova
  [3] print.aov*                         print.aovlist*
  [5] print.ar*                          print.Arima*
  [7] print.arima0*                      print.AsIs
  [9] print.Bibtex*                      print.by
...
\end{Verbatim}

You can see all the generic methods this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
> methods(class="default")
...
\end{Verbatim}

\subsection{Writing Classes}

A class is named via a quoted string:

\begin{Verbatim}[fontsize=\relsize{-2}]
> class(3)
[1] "numeric"
> class(list(3,TRUE))
[1] "list"
> lmout <- lm(y ~ x)
> class(lmout)
[1] "lm"
\end{Verbatim}

If a class is derived from a parent class, the name of the derived
class will be a vector consisting of two strings, first one for the
derived class and then one for the parent.  

Methods are implemented as generic functions.  The name of a method is
formed by concatenating the function name with a period and the class
name, e.g. {\bf print.lm()}.

The class of an object is stored in its {\bf ``class''} attribute (the
quotation marks are necessary).
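
To make the two-string form for derived classes concrete, here is a
minimal sketch.  The class name {\bf "hrlyemployee"} and the data are
made up for illustration; {\bf inherits()} is the standard R function for
asking whether an object belongs to a class, either directly or through
a parent.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical example: an hourly employee, derived from a parent class "employee"
k <- list(name="Kate", salary=68000, union=FALSE, hrsthismonth=2)
class(k) <- c("hrlyemployee","employee")  # derived class first, then parent
class(k)                # shows both strings
attr(k,"class")         # the same information, fetched from the "class" attribute
inherits(k,"employee")  # TRUE, since "employee" is in the class vector
\end{Verbatim}

When a generic function is called on such an object, R looks first for
a method for the derived class, and falls back to the parent's method
if there is none.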

\subsubsection{Old-Style Classes}

Older versions of R used a cobbled-together structure for classes,
referred to as S3.  Under this approach, a class instance is created by
forming a list, with the elements of the list being the member variables
of the class.  (Readers who know Perl may recognize this {\it ad hoc}
nature in Perl's own OOP system.)  The {\bf "class"} attribute is set by
hand by using the {\bf attr()} or {\bf class()} function, and then
various generic functions are defined.

For instance, continuing our employee example from Section \ref{list},
we could write

\begin{Verbatim}[fontsize=\relsize{-2}]
> j <- list(name="Joe", salary=55000, union=T)
> class(j) <- "employee"
> attributes(j)  # let's check 
$names
[1] "name"   "salary" "union"

$class
[1] "employee"
\end{Verbatim}

Now write a {\bf print()} method for this class, which will be called
whenever an object of class {\bf "employee"} is printed:

\begin{Verbatim}[fontsize=\relsize{-2}]
print.employee <- function(wrkr) {
   cat(wrkr$name,"\n")
   cat("salary",wrkr$salary,"\n")
   cat("union member",wrkr$union,"\n")
}
\end{Verbatim}

Now test it:

\begin{Verbatim}[fontsize=\relsize{-2}]
> j
Joe
salary 55000 
union member TRUE 
\end{Verbatim}

Compare this to the call to the default {\bf print()} back in Section
\ref{list}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> j
$name
[1] "Joe"

$salary
[1] 55000

$union
[1] TRUE

> j[[1]]
[1] "Joe"
\end{Verbatim}

Here is a more involved example, in which we will write an R class {\bf
"ut"} for upper-triangular matrices.  Recall that these are square
matrices whose elements below the diagonal are 0s.  For example:

$$
\begin{pmatrix}
1 & 5 & 12 \\
0 & 6 & 9 \\
0 & 0 & 2
\end{pmatrix}
$$

The component {\bf mat} of this class will store the matrix.  There is
no point in storing the 0s, so only the diagonal and above-diagonal
elements will be stored, in column-major order.  The storage for the
above matrix, for instance, would be the vector c(1,5,6,12,9,2).
The component {\bf ix} of this class shows where in {\bf mat} the
various columns begin.  For the above case, {\bf ix} would be
c(1,2,4), meaning that column 1 begins at {\bf mat[1]}, column 2 begins
at {\bf mat[2]} and column 3 begins at {\bf mat[4]}.

The function below creates an instance of this class.  Its argument {\bf
inmat} is in full matrix format, i.e. including the 0s.  

\begin{Verbatim}[fontsize=\relsize{-2}]
ut <- function(inmat) {
   nr <- nrow(inmat)
   rtrn <- list()
   class(rtrn) <- "ut"
   rtrn$mat <- vector(length=sum1toi(nr))
   rtrn$ix <- sum1toi(0:(nr-1)) + 1
   for (i in 1:nr) {
      ixi <- rtrn$ix[i]
      # copy the i-th column of inmat to mat
      rtrn$mat[ixi:(ixi+i-1)] <- inmat[1:i,i]
   }
   return(rtrn)
}

# returns 1+...+i
sum1toi <- function(i) return(i*(i+1)/2)
\end{Verbatim}
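
A natural companion to the constructor would be a function going in the
opposite direction, from compact storage back to a full matrix.  The
following is only a sketch under our own assumptions:  the names {\bf
expandut()} and {\bf print.ut()} are invented here for illustration.  To
keep the sketch self-contained, the object is built by hand, in the same
form that the constructor would return for the matrix shown above.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical helper: expand a "ut" object back into a full matrix
expandut <- function(utmat) {
   n <- length(utmat$ix)  # number of rows and columns
   fullmat <- matrix(0,nrow=n,ncol=n)
   for (j in 1:n) {
      start <- utmat$ix[j]
      # column j has j stored elements, belonging in rows 1 through j
      fullmat[1:j,j] <- utmat$mat[start:(start+j-1)]
   }
   return(fullmat)
}

# hypothetical print method, showing the object in full matrix form
print.ut <- function(x, ...) {
   print(expandut(x))
   invisible(x)
}

# build the example object by hand
utm <- list(mat=c(1,5,6,12,9,2), ix=c(1,2,4))
class(utm) <- "ut"
print(utm)  # dispatches to print.ut()
\end{Verbatim}

Since the method is named {\bf print.ut()}, simply typing {\bf utm} in
interactive mode would invoke it too.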
            

Much of the R library still uses the S3 approach, but in the next
section we will describe the newer system, S4.

\subsubsection{New-Style Classes}

Here one creates the class by calling {\bf setClass()}.  Continuing our
employee example, we could write

\begin{Verbatim}[fontsize=\relsize{-2}]
> setClass("employee",
+    representation(
+       name="character",
+       salary="numeric",
+       union="logical")
+ )
[1] "employee"
\end{Verbatim}

Now, let's create an instance of this class, for Joe, using {\bf new()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> joe <- new("employee",name="Joe",salary=55000,union=T)
> joe
An object of class “employee”
Slot "name":
[1] "Joe"

Slot "salary":
[1] 55000

Slot "union":
[1] TRUE
\end{Verbatim}

Note that the member variables are called {\it slots}.  We reference
them via the @ symbol, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> joe@salary
[1] 55000
\end{Verbatim}

The {\bf slot()} function can also be used.
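
Here is a short sketch of {\bf slot()} in use.  To make it
self-contained, it repeats the class definition from above; the new
salary figure is made up.

\begin{Verbatim}[fontsize=\relsize{-2}]
setClass("employee",
   representation(
      name="character",
      salary="numeric",
      union="logical")
)
joe <- new("employee",name="Joe",salary=55000,union=TRUE)
slot(joe,"salary")           # same as joe@salary
slot(joe,"salary") <- 60000  # slots can be assigned to as well
joe@salary                   # now 60000
\end{Verbatim}

The {\bf slot()} form is handy when the slot name is not known until
run time, since the name is passed as an ordinary string.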

To define a generic function on a class, use {\bf setMethod()}.  Let's
do that for our class {\bf "employee"} here.  We'll implement the {\bf
show()} function.  To see what this function does, consider our command
above,

\begin{Verbatim}[fontsize=\relsize{-2}]
> joe
\end{Verbatim}

As we know, in R, when we type the name of a variable while in
interactive mode, the value of the variable is printed out:

\begin{Verbatim}[fontsize=\relsize{-2}]
> joe
An object of class “employee”
Slot "name":
[1] "Joe"

Slot "salary":
[1] 55000

Slot "union":
[1] TRUE
\end{Verbatim}

The action here is that {\bf show()} is called.\footnote{The function
{\bf show()} has precedence over {\bf print()}.}  In fact, we would get
the same output here by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> show(joe)
\end{Verbatim}

Let's override that, with the following code (which is in a separate
file, and brought in using {\bf source()}):

\begin{Verbatim}[fontsize=\relsize{-2}]
setMethod("show", "employee",
   function(object) {
      inorout <- ifelse(object@union,"is","is not")
      cat(object@name,"has a salary of",object@salary,
         "and",inorout, "in the union", "\n")
   }
)
\end{Verbatim}

The first argument gives the name of the generic function which we will
override, with the second argument giving the class name.  We then
define the new function.

Let's try it out:

\begin{Verbatim}[fontsize=\relsize{-2}]
> joe
Joe has a salary of 55000 and is in the union
\end{Verbatim}

\section{Type Conversions}

The {\bf str()} function displays the structure of an object in compact string form, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> class(x)
[1] "numeric"
> str(x)
 num [1:3] 1 2 4
\end{Verbatim}

There is a generic function {\bf as()} which does conversions, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,4)
> y <- as.character(x)
> y
[1] "1" "2" "4"
> as.numeric(y)
[1] 1 2 4
> q <- as.list(x)
> q
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 4

> r <- as.numeric(q)
> r
[1] 1 2 4
\end{Verbatim}

You can see all of this family by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> methods(as)
\end{Verbatim}

The {\bf unclass()} function converts a class object to an ordinary
list.
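
As a small illustration, here is {\bf unclass()} applied to an S3
object like the employee example seen earlier (rebuilt here so that the
sketch stands on its own):

\begin{Verbatim}[fontsize=\relsize{-2}]
j <- list(name="Joe", salary=55000, union=TRUE)
class(j) <- "employee"
u <- unclass(j)  # strip off the class attribute
class(u)         # "list"
names(u)         # the components themselves are unchanged
\end{Verbatim}

This is useful when you wish to bypass a class's own print method and
see the raw contents of an object.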

\section{Stopping Execution}

You can write your code to check for various anomalies specific to your
application, and if one is found, call {\bf stop()} to stop execution.
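
For example, a function might check its input before proceeding.  The
function {\bf meanpos()} below and its error messages are hypothetical,
written just to illustrate the pattern.

\begin{Verbatim}[fontsize=\relsize{-2}]
# hypothetical function: finds the mean of the positive elements of x
meanpos <- function(x) {
   if (!is.numeric(x)) stop("x must be numeric")
   pos <- x[x > 0]
   if (length(pos) == 0) stop("x has no positive elements")
   return(mean(pos))
}

meanpos(c(-1,2,4))  # the positive elements are 2 and 4, so this is 3
\end{Verbatim}

A call such as {\bf meanpos(c(-1,-2))} would halt with the second error
message rather than returning a meaningless value.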

\section{Functions for Statistical Distributions}

R has functions available for various aspects of most of the famous
statistical distributions. Prefix the name by d for the density, p for the
cdf, q for quantiles and r for simulation. The suffix of the name
indicates the distribution, such as {\it norm}, {\it unif}, {\it chisq},
{\it binom}, {\it exp}, etc.

For example for the chi-square distribution:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mean(rchisq(1000,df=2))  # find mean of 1000 chi-square(2) variates
[1] 1.938179
> qchisq(0.95,1)  # find 95th percentile of chi-square(1)
[1] 3.841459
\end{Verbatim}

An example of the use of {\bf rnorm()}, to generate random
normally-distributed variates, is in Section \ref{simulation}, as well
as one for {\bf rbinom()} for binomial/Bernoulli random variates.  The
function {\bf dnorm()} gives the normal density, {\bf pnorm()} gives
the normal CDF, and {\bf qnorm()} gives the normal quantiles. 

The d-series, for density, gives the probability mass function in the case
of discrete distributions.  The first argument is a vector indicating at
which points we wish to find the values of the pmf.  For instance, here
is how we would find the probabilities of 0, 1 or 2 heads in 3 tosses of
a coin:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dbinom(0:2,3,0.5)
[1] 0.125 0.375 0.375
\end{Verbatim}
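
The d-, p- and q-series fit together in the expected way:  the cdf is
the running sum (or integral) of the pmf (or density), and the quantile
function is the inverse of the cdf.  The following sketch checks this on
the coin-tossing example and on the standard normal distribution.

\begin{Verbatim}[fontsize=\relsize{-2}]
p <- pbinom(2,3,0.5)    # P(at most 2 heads in 3 tosses)
p                       # 0.875
sum(dbinom(0:2,3,0.5))  # same value, found by summing the pmf
q <- qnorm(0.975)       # 97.5th percentile of N(0,1), about 1.96
pnorm(q)                # the cdf undoes the quantile function: 0.975
\end{Verbatim}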

See the online help pages for details, e.g. by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> help(pnorm)
\end{Verbatim}

\section{Math Functions}

The usual {\bf exp()}, {\bf log()}, {\bf log10()}, {\bf sqrt()}, {\bf
abs()} etc. are available, as well as {\bf min()}, {\bf which.min()}
(returns the index for the smallest element), {\bf max()}, {\bf
which.max()}, {\bf pmin()}, {\bf pmax()}, {\bf sum()}, {\bf prod()} (for
products of multiple factors), {\bf round()}, {\bf floor()}, {\bf
ceiling()}, {\bf sort()} etc.  The function {\bf factorial()} computes
its namesake, so that for instance {\bf factorial(3)} is 6.

Note that the function {\bf min()} returns a scalar even when applied to
a vector.  By contrast, if {\bf pmin()} is applied to vectors, it
returns a vector of the elementwise minima.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z
     [,1] [,2]
[1,]    1    2
[2,]    5    3
[3,]    6    2
> min(z[,1],z[,2])
[1] 1
> pmin(z[,1],z[,2])
[1] 1 3 2
\end{Verbatim}

There are also some special math functions, described when you invoke {\bf
help()} with the argument Arithmetic.

The function {\bf combn()} generates combinations:

\begin{Verbatim}[fontsize=\relsize{-2}] 
> c32 <- combn(1:3,2)
> c32
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3
> class(c32)
[1] "matrix"
\end{Verbatim}

The function also allows the user to specify a function to be called by
{\bf combn()} on each combination.
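
For instance, passing {\bf sum()} as the third argument gives the sum
of each pair, in the same order as the columns of the matrix above:

\begin{Verbatim}[fontsize=\relsize{-2}]
s32 <- combn(1:3,2,sum)  # apply sum() to (1,2), (1,3) and (2,3)
s32                      # 3 4 5
\end{Verbatim}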

Function minimization/maximization can be done via {\bf nlm()} and {\bf
optim()}.
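
Here is a minimal sketch of both functions.  The function {\bf f()}
being minimized is made up for illustration; it has its minimum at
(1,2).  Both routines are iterative, so the answers are approximate.

\begin{Verbatim}[fontsize=\relsize{-2}]
# made-up objective function, minimized at p = (1,2)
f <- function(p) (p[1]-1)^2 + (p[2]-2)^2

nlmout <- nlm(f,c(0,0))    # second argument is the initial guess
nlmout$estimate            # should be close to 1 2

optout <- optim(c(0,0),f)  # here the initial guess comes first
optout$par                 # also close to 1 2
\end{Verbatim}

Note the differing argument orders of the two functions.  To maximize
instead, one can simply minimize the negative of the function.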

R also has some calculus capabilities, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> D(expression(exp(x^2)),"x")  # derivative
exp(x^2) * (2 * x)
> integrate(function(x) x^2,0,1)
0.3333333 with absolute error < 3.7e-15
\end{Verbatim}

There are R packages such as {\bf odesolve} for differential equations,
{\bf ryacas} to interface R with the Yacas symbolic math system, and so
on.

\section{String Manipulation}
\label{string}

R has a number of string manipulation utilities, such as 

\begin{itemize}

\item {\bf grep()}:  Searches for a substring, like the Linux command of
the same name.

\item {\bf nchar()}:  Finds the length of a string.

\item {\bf paste()}:  Assembles a string from parts.

\item {\bf sprintf()}:  Assembles a string from parts, under the control of a C-style format string.

\item {\bf substr()}:  Extracts a substring.

\item {\bf strsplit()}:  Splits a string into substrings.

\end{itemize}
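
Before turning to longer examples, here is a quick tour of these
functions, using made-up strings:

\begin{Verbatim}[fontsize=\relsize{-2}]
s <- "statistics"
nchar(s)                             # 10
substr(s,1,4)                        # "stat"
paste("R","is","fun")                # "R is fun"
sprintf("%d items cost %.2f",3,4.5)  # "3 items cost 4.50"
grep("at",c("cat","dog","rat"))      # 1 3, the indices of the matches
strsplit("a,b,c",",")[[1]]           # "a" "b" "c"
\end{Verbatim}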

\subsection{Example of nchar(), substr():  Testing for a Specified
Suffix in a File Name}

\begin{Verbatim}[fontsize=\relsize{-2}]
# tests whether the file name fn has the suffix suff, 
# e.g. "abc" in "x.abc"
testsuffix <- function(fn,suff) {
   ncf <- nchar(fn)  # nchar() gives the string length
   dotpos <- ncf - nchar(suff) + 1  # suff would start here, just after the dot
   # now check that suff is at the end of the name
   return(substr(fn,dotpos,ncf)==suff)
}
\end{Verbatim}

\subsection{Example of paste(), sprintf():  Forming File Names}

Suppose I wish to create five files, {\bf q1.pdf} through {\bf q5.pdf}
consisting of histograms of 100 random N(0,$i^2$) variates.  I could
execute the code\footnote{The main point here is the string
manipulation, creating the file names {\bf fname}.  Don't worry about
the graphics operations, or check Section \ref{savegraph} for details.}

\begin{Verbatim}[fontsize=\relsize{-2}]
for (i in 1:5)  {
   fname <- paste("q",i,".pdf")
   pdf(fname)
   hist(rnorm(100,sd=i))
   dev.off()
}
\end{Verbatim}

The {\bf paste()} function concatenates the string "q" with the string
form of {\bf i} and the string ".pdf".

But that wouldn't quite work.  By default, {\bf paste()} separates the
pieces with a space, so when i = 2, for example, {\bf fname} would be
"q 2 .pdf".  On Linux systems, filenames with embedded spaces create
headaches.

One solution would be to use the {\bf sep} argument:

\begin{Verbatim}[fontsize=\relsize{-2}]
for (i in 1:5)  {
   fname <- paste("q",i,".pdf",sep="")
   pdf(fname)
   hist(rnorm(100,sd=i))
   dev.off()
}
\end{Verbatim}

Here we used an empty string for the separator.

Or, we could use a function borrowed from C:

\begin{Verbatim}[fontsize=\relsize{-2}]
for (i in 1:5)  {
   fname <- sprintf("q%d.pdf",i)
   pdf(fname)
   hist(rnorm(100,sd=i))
   dev.off()
}
\end{Verbatim}

Since even many C programmers are unaware of the {\bf sprintf()}
function, some explanation is needed. This function works just like {\bf
printf()}, except that it ``prints'' to a string, not to the screen. Here
we are ``printing'' to the string {\bf fname}. What are we printing? The
function says to first print ``q'', then print the character version of
{\bf i}, then print ``.pdf''. When {\bf i} = 2, for instance, we print
"q2.pdf" to {\bf fname}.

For floating-point quantities, note also the difference between \%f and \%g
formats:

\begin{Verbatim}[fontsize=\relsize{-2}]
> sprintf("abc%fdef",1.5)
[1] "abc1.500000def"
> sprintf("abc%gdef",1.5)
[1] "abc1.5def"
\end{Verbatim}

\subsection{Example of grep()}

Note that {\bf grep()} searches for a given string in a vector of
strings, returning the indices of the strings in which the pattern is
found.  That makes it perfect for row selection, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c("xyz","yabc","abc",NA)
> d <- data.frame(cbind(c(0,5,12,13),x))
> d
  V1    x
1  0  xyz
2  5 yabc
3 12  abc
4 13 <NA>
> dabc <- d[grep("ab",d[,2]),]
> dabc
  V1    x
2  5 yabc
3 12  abc
\end{Verbatim}

\subsection{Example of strsplit()}

\begin{Verbatim}[fontsize=\relsize{-2}]
> s <- strsplit("a b"," ")
> s
[[1]]
[1] "a" "b"

> s[1]
[[1]]
[1] "a" "b"

> s[[1]]
[1] "a" "b"
> s[[1]][1]
[1] "a"
\end{Verbatim} 

\section{Sorting}

Ordinary numerical sorting of a vector can be done via {\bf sort()}.  

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(13,5,12,5)
> sort(x)
[1]  5  5 12 13
\end{Verbatim}

If one instead wants the indices that the sorted values occupied in the
original vector, use {\bf order()}.  For example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> order(x)
[1] 2 4 3 1
\end{Verbatim}

Here is what {\bf order()}'s output means:  The 2 means that {\bf x[2]}
is the smallest in {\bf x}; the 4 means that {\bf x[4]} is the
second-smallest, etc.

You can use {\bf order()}, together with indexing, to sort data frames.
For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> y <- read.table("y")
> y
    V1 V2
1  def  2
2   ab  5
3 zzzz  1
> r <- order(y$V2)
> r
[1] 3 1 2
> z <- y[r,]
> z
    V1 V2
3 zzzz  1
1  def  2
2   ab  5
\end{Verbatim}

What happened here?  We called {\bf order()} on the second column of
{\bf y}, yielding a vector telling us which numbers from that column
should go before which if we were to sort them.  The 3 in this vector
tells us that {\bf y[3,2]} is the smallest number; the 1 tells us that
{\bf y[1,2]} is the second-smallest; and the 2 tells us that {\bf
y[2,2]} is the third-smallest.

We then used indexing (Section \ref{frameindex}) to produce the frame
sorted by column 2, storing it in {\bf z}.
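
The relation between {\bf sort()} and {\bf order()}, and some
variations, can be seen in the sketch below.  The data frame is built
directly with {\bf data.frame()} rather than read from a file, so that
the sketch is self-contained; the {\bf decreasing} argument is a
standard one for both functions.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- c(13,5,12,5)
x[order(x)]              # indexing by order(x) gives the same result as sort(x)
sort(x,decreasing=TRUE)  # 13 12 5 5

y <- data.frame(V1=c("def","ab","zzzz"), V2=c(2,5,1))
z <- y[order(y$V2,decreasing=TRUE),]  # rows 2, 1, 3, i.e. V2 = 5, 2, 1
z
y[order(y$V1),]          # sort by the string column instead
\end{Verbatim}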

\section{Graphics}
\label{graphics}

{\bf R has a very rich set of graphics facilities}. The top-level R home
page, \url{http://www.r-project.org/}, has some colorful examples, and
there is a very nice display of examples in the R Graph Gallery,
\url{http://addictedtor.free.fr/graphiques}.  

I cannot cover even a small part of that material here, but will give
you enough foundation to work with the basics and learn more.

\subsection{The Workhorse of R Graphics, the plot() Function}

This is the workhorse function for graphing, serving as the vehicle for
producing many different kinds of graphs. 

As mentioned earlier, it senses from the type of the object sent to it
what type of graph to make, i.e. {\bf plot()} is a {\it generic
function} (Section \ref{genericftns}).  It is really a placeholder for a
family of functions.  The function that actually gets called will depend
on the class of the object on which it is called.

Let's see what happens when we call {\bf plot()} with an X vector and a
Y vector, which are interpreted as a set of pairs in the (X,Y) plane.
For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
> plot(c(1,2,3), c(1,2,4))
\end{Verbatim}

will cause a window to pop up, plotting the points (1,1), (2,2) and
(3,4).  (Here and below, we will not show the actual plots.)

The points in the graph will be symbolized by empty circles.  If you
want a different character type, specify a value for the named argument
{\bf pch} (``point character'').  You can change the size of the
character via the named argument {\bf cex}; see Section \ref{cex}.

As discussed in Section \ref{building}, one typically builds a graph by
adding more and more to it in a succession of several commands.  So, as
a base, we might first draw an empty graph, with only axes.  For
instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
> plot(c(-3,3), c(-1,5), type = "n", xlab="x", ylab="y")
\end{Verbatim}

draws axes labeled ``x'' and ``y'', the horizontal one
ranging from x = -3 to x = 3, and the vertical one ranging from y = -1
to y = 5.  The argument {\bf type="n"} means that there is nothing in
the graph itself.

\subsection{Plotting Multiple Curves on the Same Graph}
\label{building}

The {\bf plot()} function works in stages, i.e. you can build up a graph
in stages by issuing more and more commands, each of which adds to the
graph.  For instance, consider the following: 

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,3)
> y <- c(1,3,8)
> plot(x,y)
> lmout <- lm(y ~ x)
> abline(lmout)
\end{Verbatim}

The call to {\bf plot()} will graph the three points as in our example
above.  At this point the graph will simply show the three points, along
with the X and Y axes with hash marks.

The call to {\bf abline()} then adds a line to the current graph.  Now,
which line is this?  As we know from Section \ref{regress}, the result
of the call to the linear regression function {\bf lm()} is a class
instance containing the slope and intercept of the fitted line, as well
as various other quantities that won't concern us here.  We've assigned
that class instance to {\bf lmout}.  The slope and intercept will now be
in {\bf lmout\$coefficients}.

Now, what happens when we call {\bf abline()}?  This is simply a
function that draws a straight line, with the function's arguments being
treated as the intercept and slope of the line.  For instance, the call
{\bf abline(c(2,1))} would draw the line

$$
y = 1 \cdot x + 2
$$

on whatever graph we've built up so far.

But actually, {\bf abline()} is also written to accept a regression
object directly.  Since we are invoking it on the output of {\bf lm()},
the function knows that the slope and intercept it needs will be in {\bf
lmout\$coefficients}, and it plots that line.  Note again that it
superimposes this line onto the current graph---the one which currently
graphs the three points.  In other words, the new graph will show both
the points and the line.

\subsection{Starting a New Graph While Keeping the Old Ones}

Each time you call {\bf plot()} (directly or indirectly), the current
graph window will be replaced by the new one.  If you don't want that to
happen, you can on Linux systems call {\bf X11()}.  There are
similar calls for other platforms.

\subsection{The lines() Function}

Though there are many options, the two basic arguments to {\bf lines()} are a
vector of X values and a vector of Y values. These are interpreted as (X,Y)
pairs representing points to be added to the current graph, with lines
connecting the points.

For instance, if x and y are the vectors (1.5,2.5) and (3,3), then the call

\begin{Verbatim}[fontsize=\relsize{-2}]
> lines(c(1.5,2.5),c(3,3))
\end{Verbatim}

would add a line from (1.5,3) to (2.5,3) to the present graph.

If you want the lines ``connecting the dots'' but don't want the dots
themselves, include {\bf type="l"} in your call to {\bf lines()}, or to
{\bf plot()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> plot(x,y,type="l")
\end{Verbatim}

\subsection{Another Example}

Let's plot nonparametric density estimates (these are basically smoothed
histograms) for Exams 1 and 2 from our {\bf exams} file in Section
\ref{second} in the same graph. We use the function {\bf density()} to
generate the estimates. Here are the commands we issue:

\begin{Verbatim}[fontsize=\relsize{-2}]
> d1 = density(testscores$Exam1,from=0,to=100)
> d2 = density(testscores$Exam2,from=0,to=100)
> plot(d2,main="",xlab="")
> lines(d1)
\end{Verbatim}

Here's what we did: First, we computed nonparametric density estimates
from the two variables, saving them in objects {\bf d1} and {\bf d2} for
later use. We then called {\bf plot()} to draw the curve for Exam2. The
internal structure of {\bf d2} contains vectors of X and Y coordinates
needed by {\bf plot()} to draw the figure.  We then called {\bf lines()}
to add Exam1's curve to the graph.

Note that we asked R to have blank labels for the figure as a whole and
for the X axis; otherwise, R would have gotten such labels from {\bf
d2}, which would have been specific to Exam 2.

The call to {\bf plot()} both initiates the plot and draws the first curve.
(Since {\bf d2} is of class {\bf "density"}, {\bf plot()} draws it as a
curve without our having to specify type="l".) The call to {\bf lines()}
then adds the second curve.

You can use the {\bf lty} parameter in {\bf plot()} to specify the type
of line, e.g. solid, dashed, etc. Type

\begin{Verbatim}[fontsize=\relsize{-2}]
> help(par)
\end{Verbatim}

to see the various types and their codes.

\subsection{Adding Points: the points() Function}

The {\bf points()} function adds a set of (x,y)-points, with labels for
each, to the currently displayed graph.  For instance, in our first
example, Section \ref{first}, the command

\begin{Verbatim}[fontsize=\relsize{-2}]
points(testscores$Exam1,testscores$Exam3,pch="+")
\end{Verbatim}

would superimpose onto the current graph the points of the exam scores from
that example, using "+" signs to mark them.

As with most of the other graphics functions, there are lots of options,
e.g. point color, background color, etc.

\subsection{The legend() Function}

A nice function is {\bf legend()}, which is used to add a legend to a
multicurve graph. For instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
> legend(2000,31162,legend="CS",lty=1)
\end{Verbatim}

would place a legend at the point (2000,31162) in the graph, with a little
line of type 1 and label of "CS". Try it!
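
The coordinates in that call belong to one particular graph.  Here is a
small self-contained sketch instead, with two made-up curves
distinguished by line type.  Vectors are passed for {\bf legend} and
{\bf lty} so that each curve gets its own entry.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- seq(0,2*pi,length=100)
plot(x,sin(x),type="l",lty=1,ylab="y")  # solid curve
lines(x,cos(x),lty=2)                   # dashed curve
# upper-left corner of the legend box goes at the point (0,-0.5)
legend(0,-0.5,legend=c("sin","cos"),lty=c(1,2))
\end{Verbatim}

The position (0,-0.5) was chosen by eye for this graph; Section
\ref{locator} shows an easier way to pick such locations.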

\subsection{Adding Text: the text() and mtext() Functions}

Use the {\bf text()} function to place some text anywhere in the current graph.
For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
text(2.5,4,"abc")
\end{Verbatim}

would write the text "abc" at the point (2.5,4) in the graph. The center of
the string, in this case "b", would go at that point.

In order to get a certain string placed exactly where you want it, you may
need to engage in some trial and error. R has no "undo" command (though the
ESS interface to R described below does). For that reason, you might want to
put all the commands you're using to build up a graph in a file, and then
use {\bf source()} to execute them. 

But you may find the {\bf locator()} function to be a much quicker way to go.
See Section \ref{locator}.

To add text in the margins, use {\bf mtext()}.

\subsection{Pinpointing Locations: the locator() Function}
\label{locator}

Typing

\begin{Verbatim}[fontsize=\relsize{-2}]
locator(1)
\end{Verbatim}

will tell R that you will click in 1 place in the graph. Once you do so, R
will  tell you the exact coordinates of the point you clicked on. Call
locator(2) to get the locations of 2 places, etc.  (Warning:  Make sure
to include the argument.)

You can combine this, for example, with {\bf text()}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> text(locator(1),"nv=75")
\end{Verbatim}

\subsection{Changing Character Sizes: the cex Option}
\label{cex}

The {\bf cex} (``character expand'') option allows you to expand or shrink
characters within a graph, which can be very useful. You can use it as a
named parameter in various graphing functions.  For instance,

\begin{Verbatim}[fontsize=\relsize{-2}]
text(2.5,4,"abc",cex = 1.5)
\end{Verbatim}

would print the same text as in our earlier example, but with characters 1.5
times normal size.

\subsection{Operations on Axes}

You may wish to have the ranges on the X- and Y-axes of your plot be
broader or narrower than the default. You can do this by specifying the
{\bf xlim} and/or {\bf ylim} parameters in your call to {\bf plot()} or
{\bf points()}. For example, {\bf ylim=c(0,90000)} would specify a range
on the Y-axis of 0 to 90000.

This is especially useful if you will be displaying several curves in the
same graph. Note that if you do not specify xlim and/or ylim, then draw the
largest curve first, so there is room for all of them.
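
One way to avoid having to work out by hand which curve is largest is
to compute the axis limits from the data.  Here is a sketch with two
made-up curves; {\bf range()} returns the minimum and maximum of its
argument, which is exactly the form that {\bf ylim} expects.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- seq(0,3,length=100)
y1 <- x^2  # rises to 9
y2 <- 2*x  # rises only to 6
yl <- range(c(y1,y2))  # limits wide enough for both curves
plot(x,y2,type="l",ylim=yl,ylab="y")
lines(x,y1,lty=2)      # would be cut off at 6 without the ylim setting
\end{Verbatim}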

\subsection{The polygon() Function}

You can use {\bf polygon()} to draw arbitrary polygonal objects, with
shading etc.  For example, the following code draws the graph of the
function $f(x) = 1 - e^{-x}$, then adds a rectangle that approximates
the area under the curve from x = 1.2 to x = 1.4:

\begin{Verbatim}[fontsize=\relsize{-2}]
> f <- function(x) return(1-exp(-x))
> curve(f,0,2)
> polygon(c(1.2,1.4,1.4,1.2),c(0,0,f(1.3),f(1.3)),col="gray")
\end{Verbatim}

In the call to {\bf polygon()} here, the first argument is the set of X
coordinates for the rectangle, while the second argument specifies the Y
coordinates.  The third argument specifies that the rectangle should be
shaded in gray; instead we could have, for instance, used the {\bf
density} argument for striping.

\subsection{Smoothing Points: the lowess() Function}

Just plotting a cloud of points, whether connected or not, may turn out
to be just an uninformative mess. In many cases, it is better to smooth
out the data by fitting a nonparametric regression
estimator---nonparametric meaning that it is not necessarily in the form
of a straight line---such as {\bf lowess()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
plot(lowess(x,y))
\end{Verbatim}

The call {\bf lowess(x,y)} returns the pairs of points on the regression curve,
and then {\bf plot()} plots them. Of course, we could get both the cloud and the
smoothed curve:

\begin{Verbatim}[fontsize=\relsize{-2}]
plot(x,y)
lines(lowess(x,y))
\end{Verbatim}
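
To try this out without a data set at hand, one can simulate some noisy
data.  In the sketch below the data are made up (a sine curve plus
normal noise), and the value of the smoothing span {\bf f} is an
arbitrary choice for illustration; smaller values follow the data more
closely.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)  # so that the simulated data are reproducible
x <- runif(200,0,10)
y <- sin(x) + rnorm(200,sd=0.5)
plot(x,y)                # the cloud of points
lw <- lowess(x,y,f=0.3)  # list with components x (sorted) and y (smoothed)
lines(lw)                # superimpose the smoothed curve
\end{Verbatim}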

\subsection{Graphing Explicit Functions}
\label{explicit}

Say you wanted to plot the function $g(t) = (t^2+1)^{0.5}$ for t between
0 and 5. You could use the following R code:

\begin{Verbatim}[fontsize=\relsize{-2}]
g <- function(t) { return((t^2+1)^0.5) }  # define g()
x <- seq(0,5,length=10000)  # x = [0, 0.0005, 0.0010, ..., 5], approximately
y <- g(x)  # y = [g(0), g(0.0005), g(0.0010), ..., g(5)]
plot(x,y,type="l")
\end{Verbatim}

But even better, you could use the {\bf curve()} function:

\begin{Verbatim}[fontsize=\relsize{-2}]
> curve((x^2+1)^0.5,0,5)
\end{Verbatim}

\subsection{Graphical Devices and Saving Graphs to Files}
\label{savegraph}

R has the notion of a graphics device. The default device is the screen. If
we want to have a graph saved to a file, we must set up another device. For
example, if we wish to save as a PDF file, we do something like the
following.  (Warning:  This is actually too tedious an approach, and a
shortcut will be presented later on.  But the reader should go through
this ``long way'' once, to understand the principles.)

\begin{Verbatim}[fontsize=\relsize{-2}]
> pdf("d12.pdf")
\end{Verbatim}

This opens a file, which we have chosen here to call {\bf d12.pdf}. We now have
two devices open, as we can confirm:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dev.list()
X11 pdf
  2   3
\end{Verbatim}

The screen is named X11 when R runs on Linux; it is device number 2 here.
Our PDF file is device number 3.  Our active device is now the PDF file:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dev.cur()
pdf 
  3 
\end{Verbatim}

All graphics output will now go to this file instead of to the
screen. But what if we wish to save what's already on the screen? We could
re-establish the screen as the current device, then copy it to the PDF
device, 3:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dev.set(2)
X11
  2
> dev.copy(which=3)
pdf
  3
\end{Verbatim}

Note carefully that the PDF file is not usable until we close it, which we
do as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dev.set(3)
pdf
  3
> dev.off()
X11
  2
\end{Verbatim}

(We could also close the device by exiting R, though it's probably better to
proactively close.)

The above set of operations to print a graph can become tedious, but
there is a shortcut:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dev.print(device=pdf,"d12.pdf")
X11 
  2 
\end{Verbatim}

This opens the PDF file {\bf d12.pdf}, copies the X11 graph to it,
closes the file, and resets X11 as the active device.

% The above set of operations to print a graph can become tedious if used a
% lot, so it makes sense to put them into a file, say {\bf prigra.r}, so
% as to avoid having to type them by hand all the time in the future:
% 
% \begin{Verbatim}[fontsize=\relsize{-2}]
% # prints the currently displayed graph, on device number dvnum, to the
% # file filename; dvnum will typically be 2; filename must be the name of
% # a PDF file, quoted; it closes the PDF file and restores dvnum as the
% # current device
% 
% prpdf <- function(dvnum, filename)  {
%    pdf(filename)  # output from now on will go to the specified PDF file
%    dvc <- dev.cur()  # determine our active display device
%    dev.set(dvnum)  # set active device to the specified one
%    dev.copy(which=dvc)  # copy
%    dev.set(dvc)
%    dev.off()
%    dev.set(dvnum)
% }
% \end{Verbatim}

\subsection{3-Dimensional Plots}
\label{3d}

There are a number of functions to plot data in three dimensions, such
as {\bf persp()} and {\bf wireframe()}, which draw surfaces, and {\bf
cloud()}, which draws three-dimensional scatter plots.  There are many
more.

For {\bf wireframe()} and {\bf cloud()}, one loads the {\bf lattice}
library.  Here is an example:

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(lattice)
> a <- 1:10
> b <- 1:15
> eg <- expand.grid(x=a,y=b)
> eg$z <- eg$x^2 + eg$x * eg$y
> wireframe(z ~ x+y, eg)
\end{Verbatim}

The call to {\bf expand.grid()} creates a data frame, consisting of two
columns named {\bf x} and {\bf y}, combining all the values of the two
inputs.  Here {\bf a} and {\bf b} had 10 and 15 values, respectively, so
the resulting data frame will have 150 rows.  

We then added a third column, named {\bf z}, as a function of the first
two columns.  Our call to {\bf wireframe()} then creates the graph.
Note that {\bf z}, {\bf x} and {\bf y} of course refer to names of
columns in {\bf eg}.

All the points would be connected as a surface (like connecting points
by lines in two dimensions).  With {\bf cloud()}, though, the points
would just be isolated.

For {\bf wireframe()}, the (X,Y) pairs must form a rectangular grid,
though not necessarily evenly spaced.

Note that the data frame that is input to {\bf wireframe()} need not
have been created by {\bf expand.grid()}.  

By the way, these functions have many different options.  A nice one for
{\bf wireframe()}, for instance, is {\bf shade=T}, which makes it all
easier to see.
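
For comparison, here is a sketch of the same surface drawn with {\bf
persp()}, which is part of base graphics and so needs no extra library.
Unlike {\bf wireframe()}, it takes the heights as a matrix rather than
as a column of a data frame; here we build that matrix with {\bf
outer()}.  The viewing angles {\bf theta} and {\bf phi} are arbitrary
choices.

\begin{Verbatim}[fontsize=\relsize{-2}]
a <- 1:10
b <- 1:15
# z[i,j] is the height of the surface at the point (a[i],b[j])
z <- outer(a,b,function(x,y) x^2 + x*y)
persp(a,b,z,theta=30,phi=20,xlab="x",ylab="y",zlab="z")
\end{Verbatim}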

\subsection{The Rest of the Story}

As mentioned, we are not even scratching the surface here.  See one of
the references cited in Section \ref{graphicshelp} to learn more about
R's excellent set of graphics facilities.

\section{The Innocuous c() Function}

As seen before, the concatenation function {\bf c()} does what its name
implies, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> x <- c(1,2,3)
> y <- c(4,5)
> c(x,y)
[1] 1 2 3 4 5
\end{Verbatim}

But when mixing modes, {\bf c()} will apply the precedence discussed in
Section \ref{preced}.  Note carefully too that by default {\bf c()} will
recurse down through data structures, resulting in a ``flatten''
operation.
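
Two small illustrations of these points:

\begin{Verbatim}[fontsize=\relsize{-2}]
m <- c(5,2,"abc")      # mixed modes: the numbers are converted to strings
m                      # "5" "2" "abc"
fl <- c(1,c(2,c(3,4))) # nested vectors are flattened
fl                     # 1 2 3 4
\end{Verbatim}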

\section{The Versatile attach() Function}

Abstractly described, the call {\bf attach(x)} loads the namespace in
the {\it database} {\bf x}, making those objects available via those
names.  There are two main contexts for this:

\begin{itemize}

\item The database {\bf x} could be a list.  The result is that, say,
{\bf x\$u} will now be referenceable as simply {\bf u}.  

The typical usage of this is for data frames, which you may recall are
in fact lists.  In the example in Section \ref{second}, for instance, 
we could type

\begin{Verbatim}[fontsize=\relsize{-2}]
> attach(testscores)
\end{Verbatim}

This command tells R that from now on, when we refer, for example, to
{\bf Exam3}, we mean {\bf testscores\$Exam3}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mean(Exam3)
[1] 50.05455
\end{Verbatim}

If we want R to stop doing that, we use {\bf detach()}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> detach()
> mean(Exam3)
Error in mean(Exam3) : Object "Exam3" not found
\end{Verbatim}

\item If we had previously saved a collection of objects to disk using
{\bf save()}, we can now restore them to our R session, by using {\bf
attach()}.

\end{itemize}
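
Both contexts can be tried in a few lines.  This is only a sketch: the data frame {\bf scores} and the temporary file are made up for illustration, standing in for {\bf testscores} and for a file you saved earlier.

\begin{Verbatim}[fontsize=\relsize{-2}]
# first context: a data frame
scores <- data.frame(Exam1=c(60,80,70),Exam3=c(50,70,45))
attach(scores)
m <- mean(Exam3)     # same as mean(scores$Exam3)
detach(scores)

# second context: objects previously written to disk by save()
fname <- tempfile()  # hypothetical file name
u <- c(5,12,13)
save(u,file=fname)
rm(u)
attach(fname)        # u is found again, via the search path
s <- sum(u)
detach()             # removes the most recently attached database
\end{Verbatim}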

\section{Debugging}

The R base package includes a number of debugging facilities. They are
nowhere near what a good debugging tool offers, but with skillful usage
they can be effective.

A much more functional debugging package is available for R, of course
called {\bf debug}.  I will discuss this in Section \ref{debugpkg}.

\subsection{The debug() Function}

One of the tools R offers for debugging your R code is the built-in
function {\bf debug()}. It works in a manner similar to C debuggers such
as GDB. 

\subsubsection{Setting Breakpoints}

Say for example we suspect that our bug is in the function {\bf f()}. We
enable debugging by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> debug(f)
\end{Verbatim}

This will set a breakpoint at the beginning of {\bf f()}.

To turn off this kind of debugging for a function {\bf f()}, type

\begin{Verbatim}[fontsize=\relsize{-2}]
> undebug(f)
\end{Verbatim}

Note that if you simultaneously have a separate window open in which you
are editing your source code, and you had executed {\bf debug(f)}, then
if you reload using {\bf source()}, the effect is that of calling {\bf
undebug(f)}.
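
The reason is that the debugging flag is attached to the function object itself, and reloading creates a new object.  A minimal sketch, using the base function {\bf isdebugged()} (not otherwise used in this document) to query the flag, and a toy {\bf f()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
f <- function(x) x + 1
debug(f)
d1 <- isdebugged(f)   # TRUE: f is flagged for debugging
# re-running source() amounts to redefining f
f <- function(x) x + 1
d2 <- isdebugged(f)   # FALSE: the new f carries no flag
\end{Verbatim}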

If we wish to set breakpoints at the line level, we insert a line

\begin{Verbatim}[fontsize=\relsize{-2}]
browser()
\end{Verbatim}

before the line at which we wish to break.

You can make a breakpoint set in this fashion conditional by placing it
within an {\bf if} statement, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
if (k == 6) browser()
\end{Verbatim}

You may wish to add an argument named, say, {\bf dbg}, to most of your
functions, with {\bf dbg = 1} meaning that you wish to debug that part
of the code.  The above then may look like
\begin{Verbatim}[fontsize=\relsize{-2}]
if (dbg && k == 6) browser()
\end{Verbatim}
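
Here is how that might look in a complete function.  This is a sketch only; {\bf cumsumsq()} is a made-up function, and the default value of {\bf dbg} is my choice.

\begin{Verbatim}[fontsize=\relsize{-2}]
# sum of squares of the elements of x, with an optional breakpoint
cumsumsq <- function(x,dbg=FALSE) {
   tot <- 0
   for (k in seq_along(x)) {
      # break only if debugging was requested and k has reached 6
      if (dbg && k == 6) browser()
      tot <- tot + x[k]^2
   }
   return(tot)
}
s <- cumsumsq(1:10)   # normal run, never enters the browser
# cumsumsq(1:10,dbg=TRUE) would stop at the sixth iteration
\end{Verbatim}

Since {\bf dbg} defaults to FALSE, callers that do not mention it are unaffected.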

\subsubsection{Stepping through Our Code}

When you execute your code and hit a breakpoint, you enter the debugger,
termed the {\it browser} in R.  The command prompt will now be something
like 

\begin{Verbatim}[fontsize=\relsize{-2}]
Browse[1]>
\end{Verbatim}

instead of just $>$.  Then you can invoke various debugging operations,
such as:

\begin{itemize}

\item {\bf n or Enter:}
You can single-step through the code by hitting the Enter key.
(If it is a line-level breakpoint, you must hit n the first time, then
Enter after that.)

\item {\bf c:}
You can skip to the end of the ``current context'' (a loop or a
function) by typing c.

\item {\bf where:}
You can get a stack report by typing {\tt where}.

\item {\bf Q:}
You can return to the $>$ prompt, i.e. exit the debugger, by typing Q.

\item All normal R operations and functions are still available to you.
So for instance to query the value of a variable, just type its name, as
you would in ordinary interactive usage of R. If the variable's name is
one of the {\bf debug()} commands, though, say c, you'll need to do
something like {\bf print(c)} to print it out.

\end{itemize}

\subsection{Automating Actions with the trace() Function}

The {\bf trace()} function is quite flexible and powerful, though it
takes some initial effort to learn. I will discuss some of the simpler
usage forms here.

The call

\begin{Verbatim}[fontsize=\relsize{-2}]
> trace(f,t)
\end{Verbatim}

would instruct R to call the function {\bf t()} every time we enter the
function {\bf f()}. For instance, say we wish to set a breakpoint at the
beginning of the function {\bf gy()}. We could do this by the command

\begin{Verbatim}[fontsize=\relsize{-2}]
> trace(gy,browser)
\end{Verbatim}

This would have the same effect as placing the command {\bf browser()}
in our source code for {\bf gy()}, but would be quicker and more
convenient than inserting such a line, saving the file and rerunning
{\bf source()} to load in the new version of the file.

It would also be quicker and more convenient to undo, by simply running

\begin{Verbatim}[fontsize=\relsize{-2}]
> untrace(gy)
\end{Verbatim}

You can turn tracing on or off globally by calling {\bf tracingState()},
with the argument TRUE to turn it on, FALSE to turn it off. Recall too
that these boolean constants in R can be abbreviated T and F.
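
The second argument to {\bf trace()} need not be {\bf browser}; it can also be an unevaluated expression, which is then run on each entry to the function.  The following sketch uses a toy {\bf gy()} and a made-up message, and also shows the global switch:

\begin{Verbatim}[fontsize=\relsize{-2}]
gy <- function(x) x^2   # toy function, for illustration only
# print a message on each entry to gy(), instead of breaking
trace(gy,quote(cat("entering gy, x =",x,"\n")),print=FALSE)
y1 <- gy(3)             # the message appears, then gy() runs as usual
tracingState(FALSE)     # tracing globally off
y2 <- gy(4)             # no message this time
tracingState(TRUE)      # tracing back on
untrace(gy)             # restore gy() to its original form
\end{Verbatim}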

\subsection{Performing Checks After a Crash with the traceback() and
debugger() Functions}

Say your R code crashes when you are not running the debugger. There is
still a debugging tool available to you after the fact: You can do a ``post
mortem'' by simply calling {\bf traceback()}.

You can get a lot more if you set R up to dump frames on a crash:

\begin{Verbatim}[fontsize=\relsize{-2}]
> options(error=dump.frames)
\end{Verbatim}

If you've done this, then after a crash run

\begin{Verbatim}[fontsize=\relsize{-2}]
> debugger()
\end{Verbatim}

You will then be presented with a choice of levels of function calls to look
at. For each one that you choose, you can take a look at the values of the
variables there.  After browsing through one level, you can return to
the {\bf debugger()} main menu by hitting n.

\subsection{The debug Package}
\label{debugpkg}

The {\bf debug} package provides a more usable debugging interface than R's
built-in facilities do. It features a pop-up window in which you can watch
your progress as you step through your source code, gives you the ability to
easily set breakpoints, etc.

It requires another package, {\bf mvbutils}, and the Tcl/Tk scripting
and graphics system. The latter is commonly included in Linux
distributions, and is freely downloadable for all the major
platforms. It suffers from a less-than-perfect display, but is
definitely worthwhile, much better than R's built-in debugging tools.

\subsubsection{Installation}

Choose an installation directory, say {\bf /MyR}. Then 
install {\bf mvbutils} and {\bf debug}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> install.packages("mvbutils","/MyR")
> install.packages("debug","/MyR")
\end{Verbatim}

For R version 2.5.0, I found that a bug in R caused the {\bf debug}
package to fail. I then installed the patched version of 2.5.0, and {\bf
debug} worked fine. On one machine, I encountered a Tcl/Tk problem when
I tried to load {\bf debug}. I fixed that (I was on a Linux system) by
setting the environment variable {\bf TCL\_LIBRARY}, in my case by typing 

\begin{Verbatim}[fontsize=\relsize{-2}]
% setenv TCL_LIBRARY /usr/share/tcl8.4
\end{Verbatim}

\subsubsection{Path Issues}

Each time you wish to use {\bf debug}, load it by executing

\begin{Verbatim}[fontsize=\relsize{-2}]
> .libPaths("/MyR")
> library(debug)
\end{Verbatim}

Or, place these in an R startup file, say {\bf .Rprofile} in the
directory in which you want these commands to run automatically.  

Or, create a file {\bf .Renviron} in your home directory,
consisting of the line

\begin{Verbatim}[fontsize=\relsize{-2}]
R_LIBS=/MyR
\end{Verbatim}

\subsubsection{Usage}

Now you are ready to debug. Here are the main points:

\begin{itemize}

\item Breakpoints are first set at the function level. Say you have a
function {\bf f()} at which you wish to break. Then type

\begin{Verbatim}[fontsize=\relsize{-2}]
> mtrace(f)
\end{Verbatim}

Do this for each function at which you want a breakpoint.

\item Then go ahead and start your program. (I'm assuming that your
program itself consists of a function.) Execution will pause at {\bf
f()}, and a window will pop up, showing the source code for that
function. The current line will be highlighted in green. Back in the R
interactive window, you'll see a prompt D(1)$>$.

\item At this point, you can single-step through your code by repeatedly
hitting the Enter key. You can print the values of variables as you
usually do in R's interactive mode.

\item You can set finer breakpoints, at the line level, using {\bf
bp()}. Once you are in {\bf f()}, for instance, to set a breakpoint at
line 12 in that function type

\begin{Verbatim}[fontsize=\relsize{-2}]
D(1)> bp(12)
\end{Verbatim}

\item To set a conditional breakpoint, say at line 12 with the condition
{\bf k == 5}, issue {\bf bp(12,k==5)}.

\item To avoid single-stepping, issue {\bf go()}, which will execute
continuously until the next breakpoint.

\item To set a temporary breakpoint at line n, issue {\bf go(n)}.

\item To restart execution of the function, issue {\bf skip(1)}.

\item If there is an execution error, the offending line will be
highlighted.

\item To cancel all {\bf mtrace()} breaks, issue {\bf mtrace.off()}. To
cancel one for a particular function {\bf f()}, issue {\bf
mtrace(f,tracing=F)}.

\item To cancel a breakpoint, say at line 12, issue {\bf bp(12,F)}.

\item To quit, issue {\bf qqq()}.

\item For more details, see the extensive online help, e.g. by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
D(1)> ?bp
\end{Verbatim}

\end{itemize}

\subsection{Ensuring Consistency with the set.seed() Function}

If you're doing anything with random numbers, you'll need to be able to
reproduce the same stream of numbers each time you run your program during
the debugging session. To do this, type

\begin{Verbatim}[fontsize=\relsize{-2}]
> set.seed(8888)  # or your favorite number as an argument
\end{Verbatim}
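
A quick way to convince yourself that this works is to reset the seed and draw again; the two streams should match exactly.  The seed and the sample size here are arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(8888)
a <- runif(3)
set.seed(8888)          # same seed again
b <- runif(3)
same <- identical(a,b)  # TRUE: the same stream of numbers
\end{Verbatim}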

\subsection{Syntax and Runtime Errors}

The most common syntax errors will be lack of matching parentheses,
brackets or braces.   When you encounter a syntax error, this is the
first thing you should check and double-check.  I highly recommend that
you use a text editor, say Vim, that does parenthesis matching and
syntax coloring for R.

Beware that often when you get a message saying there is a syntax error
on a certain line, the error may well be elsewhere.  This can occur with
any language, but R seems especially prone to it.

If it just isn't obvious to you where your syntax error is, I recommend
selectively commenting-out some of your code, thus enabling you to
better pinpoint the location of the syntax problem.

If during a run you get a message 

\begin{Verbatim}[fontsize=\relsize{-2}]
could not find function "evaluator"
\end{Verbatim}

and a particular function call is cited, it means that the interpreter
cannot find that function.  You may have forgotten to load a library or
source a code file.

You may sometimes get messages like,

\begin{Verbatim}[fontsize=\relsize{-2}]
There were 50 or more warnings (use warnings() to see the first 50)
\end{Verbatim}

These should be heeded; run {\bf warnings()}, as suggested.  The problem
could range from nonconvergence of an algorithm to misspecification of a
matrix argument to a function.  In many cases, the program output may be
invalid, though it may well be fine too, say with a message ``fitted
probabilities numerically 0 or 1 occurred in: glm...''

\section{Startup Files}

If there are R commands you would like to have executed at the beginning
of every R session, you can place them in a file {\bf .Rprofile} either
in your home directory or in the directory from which you are running R.
The latter directory is searched for such a file first, which allows
you to customize for a particular project.

Other information on startup files is available by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> help(.Rprofile)
\end{Verbatim}
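
As an illustration, the contents of a {\bf .Rprofile} file might look like the sketch below.  The particular settings are arbitrary examples, and {\bf /MyR} is the hypothetical library directory used in Section \ref{debugpkg}.

\begin{Verbatim}[fontsize=\relsize{-2}]
# sample .Rprofile; these settings are only illustrations
options(digits=5)   # print numbers with 5 significant digits
.libPaths("/MyR")   # also look for packages in a private directory
\end{Verbatim}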

\section{Session Data}
\label{session}

As you proceed through an interactive R session, R will record the
commands you submit. And as long as you answer yes to the question
``Save workspace image?'' put to you when you quit the session, R will
save all the objects you created in that session, and restore them in
your next session. You thus do not have to recreate the objects
from scratch if you wish to continue work from before.

The saved workspace file is named {\bf .RData}, and is located either in
the directory from which you invoked this R session (Linux) or in the
R installation directory (Windows).  Note that that means that in
Windows, if you use R from various different directories, each save
operation will overwrite the last.  That makes Linux more convenient,
but note that the file can be quite voluminous, so be sure to delete it
if you are no longer working on that particular project.

You can also save the image yourself, to whatever file you wish, by
calling {\bf save.image()}.  You can restore the workspace from that
file later on by calling {\bf load()}.

\section{Packages (Libraries)}
\label{packages}

\subsection{Basic Notions}

R uses packages to store groups of related pieces of
software.\footnote{This is one of the very few differences between R and
S.  In S, packages are called {\it libraries}, and many of the functions
which deal with them are different from those in R.}  The libraries are
visible as subdirectories of your library directory in your R
installation tree, e.g. {\bf /usr/lib/R/library}. The ones automatically
loaded when you start R include the {\bf base} subdirectory, but in
order to save memory and time, R does not automatically load all the
packages. You can check which packages are currently loaded by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> .path.package()
\end{Verbatim}

\subsection{Loading a Package from Your Hard Drive}

If you need a package which is in your R installation but not loaded
into memory yet, you must request it. For instance, suppose you wish to
generate multivariate normal random vectors. The function {\bf mvrnorm()} in
the package MASS does this.  So, load the library:

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(MASS)
\end{Verbatim}

Then {\bf mvrnorm()} will be ready to use, as will its documentation.  (Before
you loaded MASS, {\bf help(mvrnorm)} would have given an error message.)

\subsection{Downloading a Package from the Web}

However, the package you want may not be in your R installation. One of the
big advantages of open-source software is that people love to share. Thus
people all over the world have written their own special-purpose R packages,
placing them in the CRAN repository and elsewhere. 

\subsubsection{Using install.packages()}

One way to install a package is, not surprisingly, to use the {\bf
install.packages()} function.

As an example, suppose you wish to use the {\bf mvtnorm} package, which
computes multivariate normal cdf's and other quantities. Choose a
directory in which you wish to install the package (and maybe others in
the future), say {\bf /a/b/c}. Then at the R prompt, type

\begin{Verbatim}[fontsize=\relsize{-2}]
> install.packages("mvtnorm","/a/b/c/")
\end{Verbatim}

This will cause R to automatically go to CRAN, download the package, compile
it, and install it into a new directory {\bf /a/b/c/mvtnorm}.

You do have to tell R where to find that package, though, which you can do
via the {\bf .libPaths()} function:

\begin{Verbatim}[fontsize=\relsize{-2}]
> .libPaths("/a/b/c/")
\end{Verbatim}

This will add that new directory to the ones R was already using.  If
you use that directory often enough, you may wish to add that call to
{\bf .libPaths()} in your {\bf .Rprofile} startup file in your home
directory.

A call to {\bf .libPaths()}, without an argument, will show you a list
of all the places R will currently look at for loading a package when
requested.

\subsubsection{Using ``R CMD INSTALL''}
\label{cmdinstall}

Sometimes one needs to install ``by hand,'' to do modifications needed
to make a particular R package work on your system.  The following
example will show how I did so in one particular instance, and will
serve as a case study on how to ``scramble'' if ordinary methods don't
work.

I wanted to install a package {\bf Rmpi} on our department's
instructional machines, in the directory {\bf /home/matloff/R}.  I tried
using {\bf install.packages()} first, but found that the automated
process could not find the MPI library on our machines.  The problem was
that R was looking for those files in {\bf /usr/local/lam}, whereas I
knew they were in {\bf /usr/local/LAM}.

So, I downloaded the {\bf Rmpi} files, in the packed form {\bf
Rmpi\_0.5-3.tar.gz}.  I unpacked that file in my directory {\bf \textasciitilde{}/tmp},
producing a directory {\bf \textasciitilde{}/tmp/Rmpi}.  

Now if there had been no problem, I could have just typed

\begin{Verbatim}[fontsize=\relsize{-2}]
% R CMD INSTALL -l /home/matloff/R  Rmpi
\end{Verbatim}

from within the {\bf \textasciitilde{}/tmp} directory.  That command would install the
package contained in {\bf \textasciitilde{}/tmp/Rmpi}, placing it in {\bf
/home/matloff/R}.  This would have been an alternative to calling {\bf
install.packages()}.

But as noted, I had to deal with a problem.  Within the {\bf \textasciitilde{}/tmp/Rmpi}
directory there was a {\bf configure} file, so I ran

\begin{Verbatim}[fontsize=\relsize{-2}]
% configure --help 
\end{Verbatim}

on my Linux command line.  It told me that I could specify the location
of my MPI files to {\bf configure} as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
% configure --with-mpi=/usr/local/LAM
\end{Verbatim}

This is if one runs {\bf configure} directly, but I ran it via R:

\begin{Verbatim}[fontsize=\relsize{-2}]
% R CMD INSTALL -l /home/matloff/R  Rmpi --configure-args=--with-mpi=/usr/local/LAM
\end{Verbatim}

Well, that seemed to work, in the sense that R did install the package,
but it also noted that it had a problem with the threads library on our
machines.  Sure enough, when I tried to load {\bf Rmpi}, I got a runtime
error, saying that a certain threads function wasn't there.

I knew that our threads library was fine, so I went into the {\bf configure}
file and commented out two lines:

\begin{Verbatim}[fontsize=\relsize{-2}]
# if test $ac_cv_lib_pthread_main = yes; then
     MPI_LIBS="$MPI_LIBS -lpthread"
# fi
\end{Verbatim}

In other words, I forced it to use what I knew (or was fairly sure)
would work.  I then reran ``R CMD INSTALL,'' and the package then loaded
with no problem.

\subsection{Documentation}

You can get a list of functions in a package by calling {\bf library()} 
with the {\bf help} argument, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(help=mvtnorm)
\end{Verbatim}

for help on the {\bf mvtnorm} package.

\subsection{Built-in Data Sets}

R includes a few real data sets, for use in teaching, research or in
testing software.  Type the following:

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(utils)
> help(data)
\end{Verbatim}

Here {\bf data} is contained within the {\bf utils} package. We load that
package, and use {\bf help()} to see what's in it, in this case various
data sets. We can load any of them by typing its name, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
> LakeHuron
\end{Verbatim}

\section{Handy Miscellaneous Features}

\subsection{Scrolling through the Command History}

During a session, you can scroll back and forth in your command history
by typing ctrl-p and ctrl-n. You can also use the {\bf history()}
command to list your more recent commands (which will be run through the
pager). Set the named argument {\bf max.show=Inf} if you want to see all
of them.

\subsection{The Pager}

This displays material one screen at a time. It is automatically invoked
by some R commands, such as {\bf help()}, and you can invoke it yourself
on lengthy output. For instance, if you have a long vector {\bf x} and
wish to display it one screen at a time, then instead of typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> x
\end{Verbatim}

type

\begin{Verbatim}[fontsize=\relsize{-2}]
> page(x)
\end{Verbatim}

Type ``/abc'' to search for the string ``abc'' in the pager output, hit q
to quit the pager, etc.

\subsection{Calculating Run Time}

If you are not sure which of several approaches to use to get the fastest R
code, you can time them with the function {\bf system.time()}.  An
example is shown in Section \ref{simulation}.
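
As a quick sketch of the idea, the code below times an explicit loop against the built-in {\bf sum()}.  The function {\bf loopsum()} and the vector length are made up for illustration, and the timings will of course vary from machine to machine.

\begin{Verbatim}[fontsize=\relsize{-2}]
x <- runif(200000)
# sum the elements of v with an explicit loop
loopsum <- function(v) {
   s <- 0
   for (vi in v) s <- s + vi
   return(s)
}
t1 <- system.time(s1 <- loopsum(x))  # loop version
t2 <- system.time(s2 <- sum(x))      # vectorized version
print(t1)  # user, system and elapsed times, in seconds
print(t2)
\end{Verbatim}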

\section{Writing C/C++ Functions to be Called from R}

You may wish to write your own C/C++ functions to be called from R, as
they may run much faster than if you wrote them in R.  (Recall, though,
that you may get a lot of speed out of R if you avoid using loops.  See
Section \ref{efficient}.)  The SWIG package can be used for this;
see \url{http://www.swig.org}.

\section{Parallel R}
\label{parr}

Since many R users have very large computational needs, various tools
for some kind of parallel operation of R have been devised.  These
involve parallel invocations of R, communicating through a network.

\subsection{Rmpi}

The Rmpi package provides an interface for R to MPI, the popular message-passing
system.  

\subsubsection{Installation}

Say you want to install in the directory {\bf /a/b/c/}.  The easiest way
to do so is

\begin{Verbatim}[fontsize=\relsize{-2}]
> install.packages("Rmpi","/a/b/c/")
\end{Verbatim}

This will install Rmpi in the directory {\bf /a/b/c/Rmpi}.

I found that in order to get it to work, I needed to edit the shell
script {\bf Rslaves.sh}, replacing

\begin{Verbatim}[fontsize=\relsize{-2}]
#       $R_HOME/bin/R --no-init-file --slave --no-save < $1 > $hn.$2.$$.log 2>&1
# else
#       $R_HOME/bin/R --no-init-file --slave --no-save < $1 > /dev/null 2>&1
\end{Verbatim}

by

\begin{Verbatim}[fontsize=\relsize{-2}]
        $R_HOME/bin/R --slave --no-save < $1 > $hn.$2.$$.log 2>&1
else
        $R_HOME/bin/R --slave --no-save < $1 > /dev/null 2>&1
\end{Verbatim}

You'll need to arrange for the directory {\bf /a/b/c} (not {\bf
/a/b/c/Rmpi}) to be added to your R library search path.  I recommend
placing a line

\begin{Verbatim}[fontsize=\relsize{-2}]
.libPaths("/a/b/c/")
\end{Verbatim}

in a file {\bf .Rprofile} in your home directory.

\subsubsection{Usage}

Fire up MPI, and then in R load in Rmpi, by typing

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(Rmpi)
\end{Verbatim}

Then start Rmpi:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mpi.spawn.Rslaves()
\end{Verbatim}

This will start R on all machines in the group you started MPI on.
Optionally, you can specify fewer machines via the named argument {\bf
nslaves}.

The first time you do this, try this test:

\begin{Verbatim}[fontsize=\relsize{-2}]
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
\end{Verbatim}

The available functions are similar to (and call) those of MPI, such as

\begin{itemize}

\item {\bf mpi.comm.size():}  

Returns the number of MPI processes, including the master that spawned
the other processes.  The master will be rank 0.

\item {\bf mpi.comm.rank():}  

Returns the rank of the process that executes it.

\item {\bf mpi.send(), mpi.recv()}:  

The usual send/receive operations.

\item {\bf mpi.bcast(), mpi.scatter(), mpi.gather():}  

The usual broadcast, scatter and gather operations.

\item Etc.

\end{itemize}

Details are available at:

\begin{itemize}

\item \url{http://cran.r-project.org/web/packages/Rmpi/index.html}

Site for download of package and manual.

\item \url{http://ace.acadiau.ca/math/ACMMaC/Rmpi/sample.html}

Nice tutorial.

\end{itemize}

But we forgo details here, as snow provides a nicer programmer
interface, to be described next.

\subsection{snow}
\label{snow}

snow runs on top of Rmpi or directly via sockets, allowing the
programmer to more conveniently express the parallel disposition of
work.

For instance, just as the ordinary R function {\bf apply()} applies the
same function to all rows of a matrix, the snow function {\bf
parApply()} does that in parallel, across multiple machines; different
machines will work on different rows.

\subsubsection{Installation}

Follow the same pattern as described above for Rmpi.  If you plan to
have snow run on top of Rmpi, you'll of course need the latter too.

Again, I needed to change a shell script, {\bf RSOCKnode.sh}, replacing

\begin{Verbatim}[fontsize=\relsize{-2}]
${RPROG:-R} --vanilla <<EOF > ${OUT:-/dev/null} 2>&1 &
\end{Verbatim}

by

\begin{Verbatim}[fontsize=\relsize{-2}]
${RPROG:-R} --no-save --no-restore <<EOF > ${OUT:-/dev/null} 2>&1 &
\end{Verbatim}

\subsubsection{Starting snow}

Make sure snow is in your library path (see material on Rmpi above).
Then load snow:

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(snow)
\end{Verbatim}

One then sets up a cluster, by calling the snow function {\bf
makeCluster()}.  The named argument {\bf type} of that function
indicates the networking platform, e.g. ``MPI,'' ``PVM'' or ``SOCK.'' The
last indicates that you wish snow to run on TCP/IP sockets that it
creates itself, rather than going through MPI.  You may prefer to use
MPI for this, as it provides more flexibility, since one's code could
include calls to both snow functions and MPI (i.e. Rmpi) functions.
However, you may not want to bother with MPI if snow itself is enough.
In the examples here, I used ``SOCK,'' on machines named {\bf pc48} and
{\bf pc49}, setting up the cluster this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
> cls <- makeCluster(type="SOCK",c("pc48","pc49"))
\end{Verbatim}

For MPI or PVM, one specifies the number of nodes to create, rather than
specifying the nodes themselves.

Note that the above R code sets up worker nodes at the machines named {\bf pc48}
and {\bf pc49}; these are in addition to the master node, which is the
machine on which that R code is executed.

There are various other optional arguments.  One you may find useful is
{\bf outfile}, which records the result of the call in the file {\bf
outfile}.  This can be helpful if the call fails.

\subsubsection{Overview of Available Functions}

Let's look at a simple example of multiplication of a vector by a
matrix.  We set up a test matrix:

\begin{Verbatim}[fontsize=\relsize{-2}]
> a <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=6)
> a
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9
[4,]    4   10
[5,]    5   11
[6,]    6   12
\end{Verbatim}

We will multiply the vector $(1,1)^{T}$ (T meaning transpose) by our
matrix {\bf a}, by defining a dot product function:

\begin{Verbatim}[fontsize=\relsize{-2}]
> dot <- function(x,y) {return(x%*%y)}
\end{Verbatim}

Let's test it using the ordinary {\bf apply()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> apply(a,1,dot,c(1,1))
[1]  8 10 12 14 16 18
\end{Verbatim}

To review your R, note that this applies the function {\bf dot()} to
each row of {\bf a} (rows being indicated by the 1, with 2 meaning
columns), with the row playing the role of the first argument to {\bf
dot()} and c(1,1) playing
the role of the second argument.
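
As a sanity check, the row-by-row result should agree with an ordinary matrix-times-vector product.  The names {\bf r1} and {\bf r2} below are just for illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
a <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=6)
dot <- function(x,y) {return(x%*%y)}
r1 <- apply(a,1,dot,c(1,1))    # dot() applied to each row of a
r2 <- as.vector(a %*% c(1,1))  # the same product, computed directly
agree <- all.equal(r1,r2)      # TRUE if the two results match
\end{Verbatim}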

Now let's do this in parallel, across our two machines in our cluster
{\bf cls}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> parApply(cls,a,1,dot,c(1,1))
[1]  8 10 12 14 16 18
\end{Verbatim}

The function {\bf clusterCall(cls,f,args)} applies the given function
{\bf f()} at each worker node in the cluster {\bf cls}, using the
arguments provided in {\bf args}.

The function {\bf clusterExport(cls,varlist)} copies the variables in
the list {\bf varlist} to each worker in the cluster {\bf cls}.  You can
use this to avoid constant shipping of large data sets from the master
to the workers; you just do so once, using {\bf clusterExport()} on the
corresponding variables, and then access those variables as global.
For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> z <- function() return(x)
> x <- 5
> y <- 12
> clusterExport(cls,list("x","y"))
> clusterCall(cls,z)
[[1]]
[1] 5

[[2]]
[1] 5
\end{Verbatim}

The function {\bf clusterEvalQ(cls,expression)} runs {\bf
expression} at each worker node in {\bf cls}.  Continuing the
above example, we have

\begin{Verbatim}[fontsize=\relsize{-2}]
> clusterEvalQ(cls,x <- x+1)
[[1]]
[1] 6

[[2]]
[1] 6

> clusterCall(cls,z)
[[1]]
[1] 6

[[2]]
[1] 6

> x
[1] 5
\end{Verbatim}

Note that {\bf x} still has its original version back at the master.

The function {\bf clusterApply(cls,individualargs,f,commonargsgohere)} runs {\bf
f()} at each worker node in {\bf cls}, with arguments as follows.  The
first argument to {\bf f()} for worker i is the i$^{th}$ element of the
list {\bf individualargs}, i.e. {\bf individualargs[[i]]}, and
optionally one can give additional arguments for {\bf f()} following
{\bf f()} in the argument list for {\bf clusterApply()}.

Here for instance is how we can assign an ID to each worker node, like
MPI {\bf rank}:\footnote{I don't see a provision in snow itself that
does this.}

\begin{Verbatim}[fontsize=\relsize{-2}]
> myid <- 0
> clusterExport(cls,"myid")
> setid <- function(i) {myid <<- i}  # note superassignment operator
> clusterApply(cls,1:2,setid)
[[1]]
[1] 1

[[2]]
[1] 2

> clusterCall(cls,function() {return(myid)})
[[1]]
[1] 1

[[2]]
[1] 2
\end{Verbatim}

Recall that the way snow works is to have a master node, the one from
which you invoke functions like {\bf parApply()}, and then a cluster of
worker nodes.  Suppose the function you specify as an argument to {\bf
parApply()} is {\bf f()}, and that {\bf f()} calls {\bf g()}.  Then {\bf
f()} itself (its code) is passed to the cluster nodes, but {\bf g()} is
not.  Therefore you must first pass {\bf g()} to the cluster nodes via a
call to {\bf clusterExport()}.

Don't forget to stop your clusters before exiting R, by calling
{\bf stopCluster(clustername)}.

There are various other useful snow functions.  See the user's manual for
details.

\subsubsection{More Snow Examples}

In the first example, we do a kind of one-level Quicksort, breaking the
array into piles, Quicksort-style, then having each cluster node work on
its pile, then consolidate.  We assume a two-node cluster.

\begin{Verbatim}[fontsize=\relsize{-2}]
# sorts the array x on the cluster cls
qs <- function(cls,x) {
   pvt <- x[1]  # pivot
   chunks <- list()
   chunks[[1]] <- x[x <= pvt]  # low pile
   chunks[[2]] <- x[x > pvt]  # high pile
   # now parcel out the piles to the clusters, which sort the piles
   rcvd <- clusterApply(cls,chunks,sort)
   lx <- length(x)
   lc1 <- length(rcvd[[1]])
   lc2 <- length(rcvd[[2]])
   y <- vector(length=lx)
   if (lc1 > 0) y[1:lc1] <- rcvd[[1]]
   if (lc2 > 0) y[(lc1+1):lx] <- rcvd[[2]]
   return(y)
}
\end{Verbatim}
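
If no cluster is at hand, the logic can be checked serially.  The sketch below is a stand-in of my own, not part of snow: {\bf lapply()} takes the place of {\bf clusterApply()}, handing each pile to {\bf sort()} in turn, and the consolidation step is shortened to a single call to {\bf c()}.

\begin{Verbatim}[fontsize=\relsize{-2}]
# same idea as qs(), but run serially, with no cluster involved
qsserial <- function(x) {
   pvt <- x[1]  # pivot
   chunks <- list(x[x <= pvt],x[x > pvt])  # low pile, high pile
   rcvd <- lapply(chunks,sort)  # serial analog of the cluster call
   return(c(rcvd[[1]],rcvd[[2]]))  # low pile first, then high pile
}
x <- c(5,12,13,3,8,5,1)  # made-up test data
y <- qsserial(x)
\end{Verbatim}

On a two-node cluster {\bf cls}, the corresponding call would be {\bf qs(cls,x)}.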

The second example implements the Shearsort sorting algorithm.  Here one
imagines the nodes laid out as an n x n matrix (they may really have
such a configuration).  Here is the pseudocode:

\begin{Verbatim}[fontsize=\relsize{-2}]
for i = 1 to log2(n^2) + 1
    if i is odd
       sort each even row in descending order
       sort each odd row in ascending order
    else
       sort each column in ascending order
\end{Verbatim}

And here is the Snow code:

\begin{Verbatim}[fontsize=\relsize{-2}]
is <- function(cls,dm) {
   n <- nrow(dm)
   numsteps <- ceiling(log2(n*n)) + 1
   for (step in 1:numsteps) {
      if (step %% 2 == 1) {
         # attach a row ID to each row, so will know odd/even
         augdm <- cbind(1:n,dm)
         # parcel out to the cluster members for sorting
         dm <- parApply(cls,augdm,1,augsort)
         dm <- t(dm)  # recall need to transpose after apply()
      } else dm <- parApply(cls,dm,2,sort)
   }
   return(dm)
}

augsort <- function(augdmrow) {
   nelt <- length(augdmrow)
   if (augdmrow[1] %% 2 == 0) {
      return(sort(augdmrow[2:nelt],decreasing=T))
   } else return(sort(augdmrow[2:nelt]))
}
\end{Verbatim}
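
Again the logic can be tried without a cluster.  In the sketch below, {\bf isserial()} is a hypothetical serial stand-in in which {\bf apply()} replaces {\bf parApply()}; the test matrix is made up.  After the final row phase the data should be sorted in ``snake'' order, i.e. reading odd rows left to right and even rows right to left.

\begin{Verbatim}[fontsize=\relsize{-2}]
augsort <- function(augdmrow) {  # as in the listing above
   nelt <- length(augdmrow)
   if (augdmrow[1] %% 2 == 0) {
      return(sort(augdmrow[2:nelt],decreasing=T))
   } else return(sort(augdmrow[2:nelt]))
}

# serial stand-in: apply() replaces parApply(), so no cluster is needed
isserial <- function(dm) {
   n <- nrow(dm)
   numsteps <- ceiling(log2(n*n)) + 1
   for (step in 1:numsteps) {
      if (step %% 2 == 1) {
         # row phase; transpose since apply() returns results as columns
         dm <- t(apply(cbind(1:n,dm),1,augsort))
      } else dm <- apply(dm,2,sort)  # column phase
   }
   return(dm)
}

set.seed(8888)
dm <- matrix(sample(1:16),nrow=4)  # made-up 4x4 test matrix
out <- isserial(dm)
# read the result in snake order
snake <- c()
for (i in 1:nrow(out))
   snake <- c(snake,if (i %% 2 == 0) rev(out[i,]) else out[i,])
\end{Verbatim}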

\subsubsection{Parallel Simulation, Including the Bootstrap}

If you wish to use snow on simulation code, including bootstrapping, you
need to make sure that the random number streams at each cluster node
are independent.  Indeed, just think of what would happen if you just
take the default random number seed---you'll get identical results at
all the nodes, which certainly would defeat the purpose of parallel
operation!

The careful way to do this is to install the R package {\bf rlecuyer}.
Then, before running a simulation, call {\bf clusterSetupRNG()}, which
in its simplest form consists simply of

\begin{Verbatim}[fontsize=\relsize{-2}]
clusterSetupRNG(cl)
\end{Verbatim}

for the cluster {\bf cl}.
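
As a hedged sketch of a complete session, the code below starts a
two-node socket cluster on the local machine, sets up the streams with
an explicit seed so that runs are reproducible, and draws a few uniform
variates at each node.  The cluster size, the seed value and the use of
{\bf runif()} are arbitrary choices for illustration; the code assumes
that both snow and rlecuyer are installed.

\begin{Verbatim}[fontsize=\relsize{-2}]
library(snow)   # rlecuyer must also be installed
cl <- makeCluster(2,type="SOCK")
# "RNGstream" selects the rlecuyer-based generator; the seed is a
# vector of six integers
clusterSetupRNG(cl,type="RNGstream",seed=rep(2008,6))
# each node now draws from its own stream
print(clusterCall(cl,runif,3))
stopCluster(cl)
\end{Verbatim}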

\subsubsection{Example}

Here we estimate the prediction error rate using ``leave-one-out''
cross-validation in logistic regression.

\begin{Verbatim}[fontsize=\relsize{-2}]
# working on the cluster cls, find prediction error rate for the data
# matrix dm, with the response variable having index resp and the
# predictors having indices prdids
prerr <- function(cls,dm,resp,prdids) {
   # set up an artificial matrix for parApply()
   nr <- nrow(dm)
   loopmat <- matrix(1:nr,nrow=nr)
   # parcel out the rows of loopmat to the various members of the
   # cluster cls; for each row, they will apply delone() with arguments
   # being that row, dm, resp and prdids; the return vector consists
   # of 1s and 0s, 1 meaning that our prediction was wrong
   errs <- parApply(cls,loopmat,1,delone,dm,resp,prdids)  
   return(sum(errs)/nr)
}

# temporarily delete row delrow from the data matrix dm, with the
# response variable having index resp and the predictors having indices
# prdids
delone <- function(delrow,dm,resp,prdids) {
   # get all indices except delrow
   therest <- ab1(1,delrow,nrow(dm))
   # fit the model
   lmout <- glm(dm[therest,resp] ~ dm[therest,prdids], family=binomial)
   # predict the deleted row
   cf <- as.numeric(lmout$coef)
   predvalue <- logit(dm[delrow,prdids],cf)
   if (predvalue > 0.5) {
      predvalue <- 1
   } else {
      predvalue <- 0
   }
   return(abs(dm[delrow,resp]-predvalue)) 
}

# "allbutone": returns c(a,a+1,...,b-1,b+1,...c)
ab1 <- function(a,b,c) {
   if (a == b) return((b+1):c)
   if (b == c) return(a:(b-1))
   return(c(a:(b-1),(b+1):c))
}

# finds the value of the logistic function at a given x for a given set
# of coefficients b
logit <- function(x,b) {
   lin <- c(1,x) %*% b
   return(1/(1+exp(-lin)))
}
\end{Verbatim}
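
Incidentally, R's negative subscripts produce the same ``all but one''
index set more directly.  The short check below compares the two
approaches; the values 1, 3 and 5 are arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
ab1 <- function(a,b,c) {
   if (a == b) return((b+1):c)
   if (b == c) return(a:(b-1))
   return(c(a:(b-1),(b+1):c))
}
therest <- ab1(1,3,5)
alt <- (1:5)[-3]   # negative subscript: drop element 3
print(therest)     # 1 2 4 5
print(alt)         # 1 2 4 5
\end{Verbatim}

So an expression such as {\bf dm[-delrow,resp]} could be used in place
of {\bf dm[therest,resp]}.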

\subsubsection{To Learn More about snow}

I recommend the following Web pages:

\begin{itemize}

\item \url{http://cran.cnr.berkeley.edu/web/packages/snow/index.html}

CRAN page for snow; the package and the manual are here.

\item
\url{http://www.bepress.com/cgi/viewcontent.cgi?article=1016&context=uwbiostat}

A research paper.

\item \url{http://www.cs.uiowa.edu/~luke/R/cluster/cluster.html}

Brief intro by the author.

\item \url{http://www.sfu.ca/~sblay/R/snow.html#clusterCall}

Examples, short but useful.

\end{itemize}

% \subsection{The papply() Function}
% \label{papply}
% 
% The {\bf papply()} function actually runs on top of Rmpi, but provides a
% higher-level interface than that library does.  The idea is to build on
% R's familiar {\bf apply()} function (or more precisely, {\bf lapply()}.
% 
% Here's an overview of how it works:
% 
% \begin{itemize}
% 
% \item One constructs a list of tasks to be performed.
% 
% \item One calls {\bf papply()} on that list, specifying a function {\bf
% f()} to be applied to each element of the list.
% 
% \item The action of {\bf papply()} is to distribute the elements of the
% list to the various MPI nodes.  Each node applies {\bf f()} to its share
% of elements, returning a list of the results to node 0.
% 
% \end{itemize}
% 
% Here's an example.  We wish to find the minimum value in an array.  We
% will use {\bf papply()} to distribute pieces of the array to the nodes.
% Each node will find the minimum in each of its pieces.  Those minima
% will be returned to node 0, which then finds the minimum of them, thus
% finding the minimum value in the original array.
% 
% \begin{Verbatim}[fontsize=\relsize{-2}]
% # use of papply to compute min(x) in parallel; the vector x is broken
% # into equal-sized chunks (for convenience here, it is assumed that
% # length(x) is a multiple of the number of MPI nodes), which are farmed
% # out to the nodes
% 
% # assumes mpi.spawn.Rslaves() has already been called
% 
% library(papply)
% 
% papplymin <- function(x) {
%    nnodes <- mpi.comm.size()  # all the Rmpi functions are available
%    nx <- length(x)
%    # get starting indices of the chunks
%    k <- 0:(nnodes-1)
%    startplaces <- nx * (k/nnodes) + 1
%    # now create a list, the basic structure on which papply() works
%    xlist <- list()
%    chunksize <- nx / nnodes
%    # each chunk will comprise one element of the list
%    for (i in 1:nnodes)
%       xlist[[i]] <- x[startplaces[i]:(startplaces[i]+ chunksize - 1)]
%    # each node will apply the min() function to its chunk, with the
%    # results then placed in the list mins at node 0
%    mins <- papply(xlist,min)
%    return(min(as.numeric(mins)))
% }
% \end{Verbatim}

\section{Using R from Python}

Python is an elegant and powerful language, but lacks built-in
facilities for statistical analysis and data manipulation, two areas in
which R excels.  Thus an interface between the two languages is highly
useful; RPy, a Python module that allows access to R from Python, is
probably the most popular of these.  For extra efficiency, it can be
used in conjunction with NumPy.

You can build the module from the source, available from
\url{http://rpy.sourceforge.net}, or download a prebuilt version.  If
you are running Ubuntu, simply type

\begin{Verbatim}[fontsize=\relsize{-2}]
sudo apt-get install python-rpy
\end{Verbatim}

To load RPy from Python (whether in Python interactive mode or from
code), execute

\begin{Verbatim}[fontsize=\relsize{-2}]
from rpy import *
\end{Verbatim}

This will load a variable {\bf r}, which is a Python class instance.  

Running R from Python is in principle quite simple.  For instance,
running 

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> r.hist(r.rnorm(100))
\end{Verbatim}

from the Python prompt will call the R function {\bf rnorm()} to produce
100 standard normal variates, and then input those values into R's
histogram function, {\bf hist()}.  As you can see, R names are prefixed
by {\bf r.}, reflecting the fact that Python wrappers for R functions 
are members of the class instance {\bf r}.\footnote{They are loaded
dynamically, as you use them.}

By the way, note that the above code will, if not refined, produce ugly
output, with your (possibly voluminous!) data appearing as the graph
title and the X-axis label.  You can avoid this by writing, for example, 

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> r.hist(r.rnorm(100),main='',xlab='')
\end{Verbatim}

RPy syntax is sometimes less simple than the above examples would lead
us to believe.  The problem is that there may be a clash of R and Python
syntax.  Consider for instance a call to the R linear model function
{\bf lm()}.  In our example, we will predict {\bf b} from {\bf a}:

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> a = [5,12,13]
>>> b = [10,28,30]
>>> lmout = r.lm('v2 ~ v1',data=r.data_frame(v1=a,v2=b))
\end{Verbatim}

This is somewhat more complex than it would have been if done directly
in R.  What are the issues here?

First, since Python has no formula syntax (there the tilde is only a
unary bitwise operator), we needed to specify the model formula via a
string.  Since R will also accept a string here, this is not a major
departure.

Second, we needed a data frame to contain our data.  We created one
using R's {\bf data.frame()} function, but note that again due to syntax
clash issues, RPy converts periods in function names to underscores, so
we need to call {\bf r.data\_frame()}.  Note that in this call, we named
the columns of our data frame {\bf v1} and {\bf v2}, and then used these
in our model formula.

The output object is a Python dictionary, as can be seen:

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> lmout
{'qr': {'pivot': [1, 2], 'qr': array([[ -1.73205081, -17.32050808],
       [  0.57735027,  -6.164414  ],
       [  0.57735027,   0.78355007]]), 'qraux': [1.5773502691896257, 1.6213286481208891], 'rank': 2, 'tol': 9.9999999999999995e-08}, 'terms': <Robj object at 0xb7db71d0>, 'effects': {'': -0.37463432463267887, 'v1': -15.573256428553197, '(Intercept)': -39.259818304894559}, 'xlevels': {}, 'rank': 2, 'df.residual': 1, 'fitted.values': {'1': 10.035087719298247, '3': 30.245614035087719, '2': 27.719298245614034}, 'call': <Robj object at 0xb7db71c0>, 'model': {'v1': [5, 12, 13], 'v2': [10, 28, 30]}, 'assign': [0, 1], 'coefficients': {'v1': 2.5263157894736841, '(Intercept)': -2.5964912280701729}, 'residuals': {'1': -0.035087719298245779, '3': -0.24561403508772012, '2': 0.2807017543859659}}
\end{Verbatim}

You should recognize the various attributes of {\bf lm()} objects there.
For example, the coefficients of the fitted regression line, which would
be contained in {\bf lmout\$coefficients} if this were done in R, are
here in Python as {\bf lmout['coefficients']}.  So, we can access those
coefficients accordingly, e.g. 

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> lmout['coefficients']
{'v1': 2.5263157894736841, '(Intercept)': -2.5964912280701729}
>>> lmout['coefficients']['v1']
2.5263157894736841
\end{Verbatim} 
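
For comparison, here is the same fit done directly in R, using the same
data and the same column names {\bf v1} and {\bf v2}.  The coefficients
should agree with those in the Python dictionary above.

\begin{Verbatim}[fontsize=\relsize{-2}]
a <- c(5,12,13)
b <- c(10,28,30)
lmout <- lm(v2 ~ v1,data=data.frame(v1=a,v2=b))
print(lmout$coefficients)
\end{Verbatim}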

One can also submit R commands to work on variables in R's namespace,
using the function {\bf r()}.  This is convenient if there are many
syntax clashes.  Here is how we could run the {\bf wireframe()} example
in Section \ref{3d} in RPy:

\begin{Verbatim}[fontsize=\relsize{-2}]
>>> r.library('lattice')
>>> r.assign('a',a)
>>> r.assign('b',b)
>>> r('g <- expand.grid(a,b)')
>>> r('g$Var3 <- g$Var1^2 + g$Var1 * g$Var2')
>>> r('wireframe(Var3 ~ Var1+Var2,g)')
>>> r('plot(wireframe(Var3 ~ Var1+Var2,g))')
\end{Verbatim}

We first used {\bf r.assign()} to copy a variable from Python's
namespace to R's.  We then ran {\bf expand.grid()} (with a period in the
name instead of an underscore, since we are running in R's namespace),
assigning the result to {\bf g}.  Again, the latter is in R's namespace.

Note that the call to {\bf wireframe()} did not automatically display
the plot, so we needed to call {\bf plot()}.

The official documentation for RPy is at
\url{http://rpy.sourceforge.net/rpy/doc/rpy.pdf}, with a useful
presentation available at
\url{http://www.daimi.au.dk/~besen/TBiB2007/lecture-notes/rpy.html}.

\section{Tools}

\subsection{Using R from emacs}

There is a very popular package which allows one to run R (and some other
statistical packages) from within emacs, ESS. I personally do not use it,
but it clearly has some powerful features for those who wish to put in a bit
of time to learn the package. As described in the R FAQ, ESS offers R users:

\begin{quote}

R support contains code for editing R source code (syntactic indentation
and highlighting of source code, partial evaluations of code, loading
and error-checking  of  code, and source code revision maintenance) and
documentation (syntactic indentation and highlighting of source code,
sending examples to running ESS process, and previewing), interacting
with an inferior R process from within Emacs (command-line editing,
searchable command history, command-line completion of R object and file
names, quick access to object and search lists, transcript recording,
and an interface to the help system), and transcript manipulation
(recording and saving transcript  files,  manipulating and editing saved
transcripts, and re-evaluating commands from transcript files).

\end{quote}

\subsection{GUIs for R}
\label{gui}

As seen above, one submits commands to R via text, rather than mouse
clicks in a Graphical User Interface (GUI).  If you can't live without
GUIs, you should consider using one of the free GUIs that have been
developed for R, e.g. R Commander
(\url{http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/}), JGR
(\url{http://stats.math.uni-augsburg.de/JGR/}), or the Eclipse plug-in
StatEt (\url{http://jgr.markushelbig.org/JGR.html}).

\section{Inference on Simple Means and Proportions}

The function {\bf t.test()} provides Student-t confidence intervals and
hypothesis tests in one- and two-mean situations.  It returns an object
of class {\bf htest}, which includes various components; type

\begin{Verbatim}[fontsize=\relsize{-2}]
> ?t.test
\end{Verbatim}

for details.

There is a similar function {\bf prop.test()} for proportions.
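
Here is a brief sketch of both functions in use.  The data are
simulated, and the hypothesized mean of 9 and the counts 45 and 100 are
arbitrary values chosen for illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
x <- rnorm(25,mean=10,sd=2)   # made-up sample
tt <- t.test(x,mu=9)          # test H0: population mean is 9
print(class(tt))              # "htest"
print(tt$conf.int)            # 95% confidence interval for the mean
print(tt$p.value)
pt <- prop.test(45,100)       # 45 successes in 100 trials
print(pt$conf.int)            # 95% confidence interval for the proportion
\end{Verbatim}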

\section{Linear and Generalized Linear Models}

R has a very rich set of facilities for linear models.  At least two
entire books have been written on this one aspect of R!  But we'll give
some of the basics here.

\subsection{Linear Regression Analysis}
\label{regress}

In statistical {\it regression analysis}, we try to predict a {\it
response variable} Y by one or more {\it predictor variables}, $X^{(1)},
X^{(2)},...,X^{(k)}$.  The regression function is defined to be

\begin{equation}
m(t) = E(Y | X^{(1)} = t_1,...,X^{(k)} = t_k)
\end{equation}

where $t$ is the vector $(t_1,...,t_k)$.

Typically we model $m(t)$ as linear in the $t_i$, i.e. we assume that

\begin{equation}
m(t_1,...,t_k) = \beta_0 + \beta_1 t_1 +...+ \beta_k t_k
\end{equation}

Here the $\beta_i$ are population values, to be estimated via our sample
data.

In our example in Section \ref{second}, let's try to predict the third
exam score from the first.  (Our goal here may not be prediction {\it
per se}, but rather to see whether the two exams are highly correlated.)

In R, we can use the {\bf lm()} (``linear model'') function to fit a
linear model here, meaning that we will find the values of c and d for
which c + d Exam1 best predicts Exam3: 

\begin{Verbatim}[fontsize=\relsize{-2}]
> fit1 <- lm(Exam3 ~ Exam1,data=testscores)
> fit1$coefficients
(Intercept)        Exam1
  3.7008841    0.7458898
\end{Verbatim}

We find that c = 3.7 and d = 0.75.

The notation 

\begin{Verbatim}[fontsize=\relsize{-2}]
Exam3 ~ Exam1 
\end{Verbatim}

means that we wish to predict {\bf Exam3} from {\bf Exam1}.  

\begin{Verbatim}[fontsize=\relsize{-2}]
Exam3 ~ Exam1 + Exam2
\end{Verbatim}

would mean to predict {\bf Exam3} from {\bf Exam1} and {\bf Exam2}.

\begin{Verbatim}[fontsize=\relsize{-2}]
Exam3 ~ Exam1 * Exam2
\end{Verbatim}

means to predict {\bf Exam3} from {\bf Exam1} and {\bf Exam2}, with an
interaction term, i.e.

\begin{Verbatim}[fontsize=\relsize{-2}]
E(Exam3 | Exam1, Exam2) = a + b Exam1 + c Exam2 + d Exam1*Exam2
\end{Verbatim}

The interaction term would be referred to as {\tt Exam1:Exam2} by R.

Note that {\bf coefficients} is a member variable within the {\bf lm} 
class, hence the \$ sign as we saw in Section \ref{list}.  
% Yes, lists are used to store class objects.

The regression model itself is specified by the {\bf formula} argument,
in this case predicting Exam3 from Exam1.  Note by the way that a
formula can also be built from a character string, via {\bf
as.formula()}, so we can program many models in a loop, using R's
string manipulation facilities (Section \ref{string}).
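
As a minimal sketch of fitting several models in a loop, the code below
pastes together a model string for each predictor and converts it to a
formula with {\bf as.formula()}.  The data frame {\bf d} is simulated
here, with column names merely chosen to resemble the exam data.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
d <- data.frame(Exam1=rnorm(30,70,10),Exam2=rnorm(30,70,10))
d$Exam3 <- 5 + 0.7*d$Exam1 + rnorm(30,0,5)
for (p in c("Exam1","Exam2")) {
   f <- as.formula(paste("Exam3 ~",p))   # build the formula from a string
   fit <- lm(f,data=d)
   print(fit$coefficients)
}
\end{Verbatim}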

Actually, the {\bf fit1} object contains a lot more information, such as
{\bf fit1\$residuals}, a vector consisting of the prediction errors,
i.e. how far off our predicted values are from the real ones.

If we simply call {\bf summary()}, as we did above, it prints out, but
the return value of that function is also an object, of class {\bf
summary.lm}.

% Though the exam data is concrete and meaningful, let's make this example
% even simpler, with some artificial data, consisting of the points (1,1),
% (2,3) and (3,8):
% 
% \begin{Verbatim}[fontsize=\relsize{-2}]
% > x <- c(1,2,3)
% > y <- c(1,3,8)
% > lmout <- lm(y ~ x)
% > lmout
% 
% Call:
% lm(formula = y ~ x)
% 
% Coefficients:
% (Intercept)            x
%        -3.0          3.5
% \end{Verbatim}

If we make sure to have our data in matrix form rather than as a data
frame (applying {\bf as.matrix()} if necessary), then we 
can conveniently specify multiple predictors by making use of R's 
indexing operations. For instance, if we have a data matrix {\bf z},

\begin{Verbatim}[fontsize=\relsize{-2}]
> lm(z[,1] ~ z[,2:3])
\end{Verbatim}

would be easier to type than

\begin{Verbatim}[fontsize=\relsize{-2}]
> lm(z[,1] ~ z[,2]+z[,3])
\end{Verbatim}
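
A quick check, on a made-up matrix {\bf z} of random numbers, that the
two forms give the same estimates:

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
z <- matrix(rnorm(60),ncol=3)   # made-up data matrix
f1 <- lm(z[,1] ~ z[,2:3])
f2 <- lm(z[,1] ~ z[,2]+z[,3])
# same estimates; only the coefficient labels differ
print(unname(f1$coefficients))
print(unname(f2$coefficients))
\end{Verbatim}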

Computation of the estimated coefficients requires a matrix
inversion (or equivalent operation), and this will fail if one predictor
is an exact linear combination of the others (or is so within roundoff
error).  If so, R will remove one of the predictors, and you will see NA
listed as the corresponding estimated coefficient.
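
The following sketch, on simulated data, shows this behavior.  The
variable {\bf x3} is deliberately constructed as the sum of the other
two predictors.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- x1 + x2              # exact linear combination of x1 and x2
y <- 1 + 2*x1 - x2 + rnorm(20)
fit <- lm(y ~ x1 + x2 + x3)
print(fit$coefficients)    # the entry for x3 is NA
\end{Verbatim}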


\subsection{Generalized Linear Models}

Here we assume that some function of the regression function m(t),
rather than m(t) itself, has a linear form.  

In R, the function that handles such models is {\bf glm()}.  The named
argument {\bf family} specifies which function of m(t) has a linear
form.

\subsubsection{Logistic Regression}

Probably the most famous generalized linear model is the {\it logistic
model}, which is applied when Y is a boolean quantity, coded by the
values 1 and 0.  Here the quantity $\ln[m(t)/(1-m(t))]$ is assumed to be
equal to $\beta_0 + \beta_1 t_1 +...+ \beta_k t_k$ for some population
values $\beta_i$.  This is equivalent to saying

\begin{equation}
m(t) = P(Y = 1 | X = t) = \frac{1}{1+e^{-(\beta_0+\beta_1
t_1+...+\beta_k t_k)}}
\end{equation}

This is an intuitively appealing model, as it produces values in (0,1),
appropriate for modeling a probability function, and is a monotonic
function in the linear form.

One calls {\bf glm()} to fit a generalized linear model.  In the
logistic case, we specify the argument {\bf family=binomial}, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
glm(x[,1] ~ x[,2], family=binomial)
\end{Verbatim}
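
To make this concrete, here is a sketch on simulated data, laid out as
in the call above, with the 0/1 response in column 1 of a matrix {\bf x}
and a single predictor in column 2.  The ``true'' coefficients 0.5 and
1.5 are arbitrary.  The last line plugs the estimated coefficients into
the logistic formula to estimate $P(Y = 1 | X = 1)$.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
n <- 200
x <- matrix(0,nrow=n,ncol=2)
x[,2] <- rnorm(n)                       # predictor
p <- 1/(1+exp(-(0.5 + 1.5*x[,2])))      # true P(Y = 1 | X)
x[,1] <- rbinom(n,1,p)                  # 0/1 response
g <- glm(x[,1] ~ x[,2],family=binomial)
b <- g$coefficients
# estimated P(Y = 1 | X = 1), computed from the logistic formula
print(1/(1+exp(-(b[1] + b[2]*1))))
\end{Verbatim}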

\subsubsection{The Log-Linear Model}
\label{loglin}

Consider the example {\bf ct} in Section \ref{factors}, and in
particular the data frame of counts {\bf ctdf}.  To perform a
log-linear model analysis, we can use {\bf glm()} to regress the
counts ({\bf ctdf\$Freq}) against the factors {\bf ctdf\$Vote.for.X}
and {\bf ctdf\$Vote.Last.Time}, setting {\bf family=poisson}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> g <- glm(Freq ~ Vote.for.X * Vote.Last.Time, family=poisson, data=ctdf)
\end{Verbatim}

Here we have chosen to fit the saturated model.
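
A saturated model has as many parameters as there are cells in the
table, so it reproduces the observed counts exactly.  The sketch below
illustrates this on an invented $2 \times 2$ table, named {\bf ctdf2}
here; its column names follow the call above but the counts are made up.

\begin{Verbatim}[fontsize=\relsize{-2}]
# made-up 2 x 2 table of counts, in the same layout as ctdf
ctdf2 <- data.frame(
   Vote.for.X=factor(c("No","Yes","No","Yes")),
   Vote.Last.Time=factor(c("No","No","Yes","Yes")),
   Freq=c(12,7,9,22))
g <- glm(Freq ~ Vote.for.X * Vote.Last.Time,family=poisson,data=ctdf2)
print(g$df.residual)       # 0: as many parameters as cells
print(unname(fitted(g)))   # reproduces the observed counts
\end{Verbatim}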

\subsubsection{The Return Value of glm()}

The function {\bf glm()} returns an object of class {\bf glm}, a subclass
of {\bf lm}.  The latter provides us with a number of member variables
and functions, such as those discussed in Section \ref{generic}.

In addition, there are various {\bf glm()}-specific variables and
functions, such as:

\begin{itemize}

\item {\bf iter} is the number of iterations used

\item {\bf converged} is a logical variable indicating whether
convergence occurred

\end{itemize}
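
For instance, on a small simulated logistic data set (the data here are
made up for illustration):

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
x <- rnorm(100)
y <- rbinom(100,1,1/(1+exp(-x)))
g <- glm(y ~ x,family=binomial)
print(class(g))      # "glm" "lm"
print(g$iter)        # number of iterations used
print(g$converged)   # TRUE if the fitting algorithm converged
\end{Verbatim}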

\subsection{Some Generic Functions for All Linear Models}
\label{generic} 

These will work on all the linear models: 

\begin{itemize}

\item {\bf vcov()} will return the estimated covariance matrix for our
estimated coefficients

\item {\bf fitted()} will return the fitted values, i.e. the predicted Y
values for the X's in our sample 

\item {\bf coef()} returns the estimated coefficients

\end{itemize}
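
Here is a short sketch of these functions applied to a simple linear
fit on simulated data.  The last line uses the fact that the standard
errors of the estimated coefficients are the square roots of the
diagonal elements of the matrix returned by {\bf vcov()}.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
x <- rnorm(50)
y <- 3 + 2*x + rnorm(50)
fit <- lm(y ~ x)
print(coef(fit))          # estimated coefficients
print(vcov(fit))          # 2 x 2 estimated covariance matrix
print(head(fitted(fit)))  # first few predicted Y values
# standard errors of the estimated coefficients
print(sqrt(diag(vcov(fit))))
\end{Verbatim}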

\section{Principal Components Analysis}

R includes two functions for principal components analysis, {\bf
princomp()} and {\bf prcomp()}.  The latter is preferred, as it has
better numerical accuracy, though the former is included for
compatibility with S.

In its simplest form, the call is

\begin{Verbatim}[fontsize=\relsize{-2}]
> prc <- prcomp(m)
\end{Verbatim}

where {\bf m} is a matrix or data frame.
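
As a brief sketch, using a made-up matrix of random numbers, the main
components of the return value can be inspected as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
m <- matrix(rnorm(300),ncol=3)   # made-up data: 100 observations, 3 variables
prc <- prcomp(m)
print(prc$sdev)       # standard deviations of the principal components
print(prc$rotation)   # loadings: one column per component
print(dim(prc$x))     # the data expressed in the new coordinates
\end{Verbatim}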

\section{Sampling Subsets}

\subsection{The sample() Function}

We can use the {\bf sample()} function to do sampling with or without
replacement from a finite set.  This is handy in simulations, for
example.
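
A few typical calls are sketched below; the set sizes and probabilities
are arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
set.seed(1)
s1 <- sample(1:10,3)                 # 3 distinct values from 1,...,10
s2 <- sample(1:10,15,replace=TRUE)   # with replacement, so repeats can occur
# unequal sampling probabilities
s3 <- sample(c("H","T"),5,replace=TRUE,prob=c(0.7,0.3))
print(s1)
print(s2)
print(s3)
\end{Verbatim}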

\subsection{The boot() Function}

The {\it bootstrap} is a resampling method for performing statistical
inference in analytically intractable situations. If we have an
estimator but no standard error, we can get one by resampling from our
sample data many times, calculating the estimator each time, and then
taking the standard deviation of all those generated values. You could
just use {\bf sample()} for this, but R has a package, {\bf boot}, that
automates the procedure for you.

To use the package, you must first load it:

\begin{Verbatim}[fontsize=\relsize{-2}]
> library(boot)
\end{Verbatim}

Inside that package is a function, {\bf boot()}, that will do the work
of bootstrapping.

Suppose for example we have a data vector {\bf y} of length 100, from
which we wish to estimate a population median and to obtain a standard
error for that estimate. We could do the following.

First we must define a function which computes the statistic we wish to
find, which in this case is the median. This function will be called
by {\bf boot()} (the corresponding formal parameter of {\bf boot()} is
named {\bf statistic}). We could define it as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
> mdn <- function(x,i) {
+    return(median(x[i]))
+ }
\end{Verbatim}

It may seem to you that all this does is call R's own {\bf median()}
function, and you may thus wonder why we need to define our own
new function.  But it is definitely needed, with the key being our
second parameter, {\bf i}.  When we call {\bf boot()}, then for each of
a specified number of resamplings (see below), the latter will generate
100 indices, sampled randomly with replacement from \{1,...,100\}
(recall 100 is our sample size here).  It will then set {\bf i} to this
set of random indices when it calls {\bf mdn()}. 

For example R might generate {\bf i} to be the vector
[3,22,59,3,14,6,...].  Of course, in {\bf boot()}'s call to {\bf mdn()},
the formal parameter {\bf x} is our vector {\bf y} here.  So, the
expression {\bf x[i]} means {\bf y[c(3,22,59,3,14,6,...)]}, in other
words the vector {\bf [y[3],y[22],y[59],y[3],y[14],y[6],...]}, which is
exactly the kind of thing the bootstrap is supposed to do.

To then call {\bf boot()}, we do something like

\begin{Verbatim}[fontsize=\relsize{-2}]
> b <- boot(y,mdn,R=200)
\end{Verbatim}

This tells R to apply the bootstrap to the data {\bf y}, calculating the
statistic {\bf mdn()} on each of 200 resamplings of {\bf y}. 

Normally, we would assign the result of {\bf boot()} to an object, as 
we did with {\bf b} above. Among the components of that object are 
{\bf b\$t}, which is a matrix whose $i^{th}$ row gives the value of 
the statistic as found on the $i^{th}$ bootstrap resampling, and {\bf b\$t0}, 
which is the value of our statistic on the original data set.  
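
Putting the pieces together, here is a complete sketch on simulated
data.  The exponential data are made up; the standard error is obtained
as described above, by taking the standard deviation of the 200
bootstrap values.

\begin{Verbatim}[fontsize=\relsize{-2}]
library(boot)
set.seed(1)
y <- rexp(100)                 # made-up data of length 100
mdn <- function(x,i) {
   return(median(x[i]))
}
b <- boot(y,mdn,R=200)
print(b$t0)          # median of the original data
print(dim(b$t))      # 200 x 1: one bootstrap value per resampling
print(sd(b$t[,1]))   # bootstrap standard error of the median
\end{Verbatim}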

A somewhat more sophisticated example (they can become quite complex)
would be that in which our data is stored in a data frame, say a frame
{\bf d} consisting of 100 rows of two columns. We might be doing
regression of the first column against the second.  (Let's assume that
both the predictor and response data are random, i.e. this is not a
``fixed-X regression'' situation.) An index {\bf i} here of
[3,22,59,3,14,6,...] could mean that our resampling would give us rows
3, 22, 59 and so on of {\bf d}.  We could set our {\bf statistic()}
function to 

\begin{Verbatim}[fontsize=\relsize{-2}]
dolm <- function(fulldata,i)  { 
   bootdata <- fulldata[i,] 
   lmout <- lm(bootdata[,1]~bootdata[,2]) 
   return(lmout$coef) 
}
\end{Verbatim}

(I've put in some extra steps for clarity.) 

Our call to {\bf boot()} could then be, for instance, 

\begin{Verbatim}[fontsize=\relsize{-2}]
> boot(d,dolm,R=500)
\end{Verbatim}
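
Here is a self-contained sketch of the regression case.  The data frame
{\bf d} is simulated, with the response in the first column and the
predictor in the second, and its column names are arbitrary.  The
bootstrap standard errors are the column standard deviations of the
matrix of bootstrap values.

\begin{Verbatim}[fontsize=\relsize{-2}]
library(boot)
set.seed(1)
x <- rnorm(100)
d <- data.frame(resp=2 + 3*x + rnorm(100),pred=x)   # made-up data
dolm <- function(fulldata,i) {
   bootdata <- fulldata[i,]
   lmout <- lm(bootdata[,1] ~ bootdata[,2])
   return(lmout$coef)
}
b <- boot(d,dolm,R=500)
print(b$t0)               # intercept and slope from the original data
print(apply(b$t,2,sd))    # bootstrap standard errors of the two estimates
\end{Verbatim}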

\section{To Learn More}

There is a plethora of resources one can draw upon to learn more about
R.

\subsection{R's Internal Help Facilities}

\subsubsection{The help() and example() Functions}

For online help, invoke {\bf help()}. For example, to get
information on the {\bf seq()} function, type

\begin{Verbatim}[fontsize=\relsize{-2}]
> help(seq)
\end{Verbatim}

or better,

\begin{Verbatim}[fontsize=\relsize{-2}]
> ?seq
\end{Verbatim}

Each of the help entries comes with examples.  One really nice feature
is that the {\bf example()} function will actually run those examples for
you.  For instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
> example(seq)

seq> seq(0, 1, length=11)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

seq> seq(rnorm(20))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq> seq(1, 9, by = 2) # match
[1] 1 3 5 7 9

seq> seq(1, 9, by = pi)# stay below
[1] 1.000000 4.141593 7.283185

seq> seq(1, 6, by = 3)
[1] 1 4
...
\end{Verbatim}

Imagine how useful this is for graphics!  To get a quick and very nice
example, the reader is urged to run the following {\bf RIGHT NOW}:

\begin{Verbatim}[fontsize=\relsize{-2}]
> example(persp)
\end{Verbatim}

\subsubsection{If You Don't Know Quite What You're Looking for}

You can use the function {\bf help.search()} to do a ``Google''-style
search through R's documentation in order to determine which function
will play a desired role. For instance, in Section \ref{packages} above,
we needed a function to generate random variates from multivariate
normal distributions. To determine what function, if any, does this, we
could type

\begin{Verbatim}[fontsize=\relsize{-2}]
> help.search("multivariate normal")
\end{Verbatim}

getting a response which contains this excerpt:

\begin{Verbatim}[fontsize=\relsize{-2}]
mvrnorm(MASS)           Simulate from a Multivariate Normal
                        Distribution
\end{Verbatim}

This tells us that the function {\bf mvrnorm()} will do the job, and it
is in the package {\bf MASS}.

You can also go to the place in your R directory tree where the base or
other package is stored. For Linux, for instance, that is likely {\bf
/usr/lib/R/library/base} or some similar location. The file {\bf CONTENTS} in
that directory gives brief descriptions of each entity.

\subsection{Help on the Web}

\subsubsection{General Introductions}

\begin{itemize}

\item \url{http://cran.r-project.org/doc/manuals/R-intro.html}: The R
Project's own introduction.

\item \url{http://personality-project.org/r/r.guide.html}: 
by Prof. Wm. Revelle of the Dept. of Psychology of
Northwestern University; especially good for material on
multivariate statistics and structural equation modeling.

\item \url{http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html}: 
a rough form of John Verzani's book, {\it simpleR}; nice coverage of
various statistical procedures.

\item \url{http://zoonek2.free.fr/UNIX/48_R/all.html}: A large general
reference by Vincent Zoonekynd; really excellent with as wide a
statistics coverage as I've seen anywhere. 

\item \url{http://wwwmaths.anu.edu.au/~johnm/r/usingR.pdf}: A draft of
John Maindonald's book; he also has scripts, data etc. on his full site
\url{http://wwwmaths.anu.edu.au/~johnm/r/}. 

\item \url{http://www2.uwindsor.ca/~hlynka/HlynkaIntroToR.pdf}:
A nice short first introduction by M. Hlynka of the University of
Windsor.

\item \url{http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html}:  A
general tutorial but with lots of graphics and good examples from real
data sets, very nice job, by Prof. Dong-Yun Kim of the Dept. of
Math. at Illinois State University. 

\item
\url{http://www.ling.uni-potsdam.de/~vasishth/VasishthFS/vasishthFS.pdf}:
A draft of an R-simulation based textbook on statistics by Shravan
Vasishth. 

\item
\url{http://www.medepi.net/epir/index.html}:  A set of tutorials by Dr.
Tomas Aragon.  Though they are aimed at an epidemiologist readership,
there is much material here.  Chapter 2, ``Working with R Data Objects,''
is definitely readable by general audiences.

\item
\url{http://cran.stat.ucla.edu/doc/contrib/Robinson-icebreaker.pdf}:
{\it icebreakeR}, a general tutorial by Prof. Andrew Robinson,
excellent.

\end{itemize}

\subsubsection{Especially for Reference}

\begin{itemize}

% Where did this go?
% \item \url{http://www.ugcs.caltech.edu/manuals/math/R-2.2.1-intro}:
% A reference at Cal Tech, complete with an index! 

\item \url{http://www.mayin.org/ajayshah/KB/R/index.html}: {\it R by
Example}, a quick handy chart on how to do various tasks in R, nicely
categorized.

\item \url{http://cran.r-project.org/doc/contrib/Short-refcard.pdf}:
R reference card, 4 pages, very handy.

\item \url{http://www.stats.uwo.ca/computing/R/mcleod/default.htm}:
A.I. McLeod's {\it R Lexicon}.

\end{itemize}

\subsubsection{Especially for Programmers}

\begin{itemize}

\item \url{http://zoonek2.free.fr/UNIX/48_R/02.html}: The programming
section of Zoonekynd's tutorial; includes some material on OOP.

% \item \url{http://www.public.iastate.edu/~mervyn/stat579}:  Some nice
% tutorial material by Prof. Marasinghe at Iowa State; see the sections
% titled ``Programming in R," ``More Programming in R," ``Writing R
% Functions," etc. 

\item \url{http://cran.r-project.org/doc/FAQ/R-FAQ.html}: R FAQ, mainly
aimed at programmers.

\item \url{http://bayes.math.montana.edu/Rweb/Rnotes/R.html}:  Reference
manual, by several prominent people in the R/S community.

\item \url{http://wiki.r-project.org/rwiki/doku.php?id=tips:tips}:  Tips
on miscellaneous R tasks that may not have immediately obvious solutions.

\end{itemize}

\subsubsection{Especially for Graphics}
\label{graphicshelp}

There are many extensive Web tutorials on this, including:

\begin{itemize}

\item
\url{http://www.sph.umich.edu/~nichols/biostat_bbag-march2001.pdf}: A
slide-show tutorial on R graphics, very nice, easy to follow, 
by M. Nelson of Esperion Therapeutics.

\item \url{http://zoonek2.free.fr/UNIX/48_R/02.html}: The graphics
section of Zoonekynd's tutorial.  Lots of stuff here.

\item 
\url{http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html}:  
Again by Prof. Dong-Yun Kim of the Dept. of Math. at Illinois State 
University.  Extensive material on use of color.

\item \url{http://wwwmaths.anu.edu.au/~johnm/r/usingR.pdf}: A draft of
John Maindonald's book; he also has scripts, data etc. on his full site
\url{http://wwwmaths.anu.edu.au/~johnm/r/}.  Graphics used heavily
throughout, but see especially Chapters 3 and 4, which are devoted to
this topic.

\item
\url{http://www.public.iastate.edu/\%7emervyn/stat579/r_class6_f05.pdf}.
Prof. Marasinghe's section on graphics.

\item \url{http://addictedtor.free.fr/graphiques/}:  The R Graphics
Gallery, a collection of graphics examples with the source code.

\item
\url{http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html}: Web
page for the book, {\it R Graphics}, by Paul Murrell, Chapman and Hall,
2005; Chapters 1, 4 and 5 are available there, plus all the figures from
the book and the R code which generated them.

\item
\url{http://www.biostat.harvard.edu/~carey/CompMeth/StatVis/dem.pdf}:
By Vince Carey of the Harvard Biostatistics Dept.  Lots of pictures, but
not much explanation.

\end{itemize}

\subsubsection{For Specific Statistical Topics}

\begin{itemize}

\item {\it Practical Regression and Anova using R}, by Julian Faraway
(online book!),
\url{http://www.cran.r-project.org/doc/contrib/Faraway-PRA.pdf}.

\end{itemize}

\subsubsection{Web Search for R Topics}

Lots of resources here.

\begin{itemize}

\item Various R search engines are listed on the R home page;
\url{http://www.r-project.org}.  Click on Search.  

\item You can search the R site itself by invoking the function {\bf
RSiteSearch()} from within R.  It will interact with you via your Web
browser.  

\item I use RSeek, \url{http://www.rseek.org}, a lot. 

\item I also use \url{finzi.psych.upenn.edu}.

\end{itemize}

\appendix

\section{Installing/Updating R}

\subsection{Installation}

There are precompiled binaries for Windows, Linux and MacOS X at the R
home page, \url{www.r-project.org}. Just click on Download (CRAN).

Or if you have Fedora Linux, just type

\begin{Verbatim}[fontsize=\relsize{-2}]
$ yum install R
\end{Verbatim}

In Ubuntu Linux, do:

\begin{Verbatim}[fontsize=\relsize{-2}]
$ sudo apt-get install r-base
\end{Verbatim}

On Linux machines, you can compile the source yourself by using the usual

\begin{Verbatim}[fontsize=\relsize{-2}]
configure
make
make install
\end{Verbatim}

sequence. If you want to install to a nonstandard directory, say {\bf
/a/b/c}, run {\bf configure} as 

\begin{Verbatim}[fontsize=\relsize{-2}]
configure --prefix=/a/b/c
\end{Verbatim}

\subsection{Updating}

Use {\bf update.packages()}, either for specific packages, or if no
argument is specified, then all R packages.

\end{document}

