
\documentclass[twocolumn]{article}

\setlength{\oddsidemargin}{-0.5in}
\setlength{\evensidemargin}{-0.5in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{7.0in}
\setlength{\textheight}{9.5in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.05in}
\setlength{\columnseprule}{0.3pt}
\usepackage{fancyvrb}
\usepackage{relsize}
\usepackage{hyperref}

\usepackage{listings}

\begin{document}

Name: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

Directions: {\bf Work only on this sheet} (on both sides, if needed); do
not turn in any supplementary sheets of paper. There is actually plenty
of room for your answers, as long as you organize yourself BEFORE
you start writing.

{\bf \large 10-POINT BONUS:}  See instructions on the blackboard.

You will write an R class, {\bf "tfile"}, whose basis is similar to the
contents of our handout {\bf text.R}.  

An object of this class will consist of three components:

\begin{itemize}

\item {\bf story}, a vector of character strings, consisting of the
words in the input file, in the same sequence as the file

\item {\bf distinctwords}, a vector of character strings, consisting of
all the distinct words in the file

\item {\bf places}, an R list, one element per distinct word in the
file, with that element being a vector of integers showing where the
word occurs in the file

\end{itemize}

You will write three functions:

\begin{itemize}

\item {\bf newtfile():}  the constructor, with the input file name as argument 

\item {\bf places():}  argument is an object of class {\bf "tfile"},
output is an R list as described in the {\bf places} component above; in
fact, you must have your {\bf newtfile()} call this function

\item {\bf ctxt():}  prints the context of a given word in a {\bf "tfile"}
object: for each occurrence of the word, it prints (on one line) the
preceding word, the word itself, and the following word (if you wish,
you can be sloppy and assume that the word never occurs at either end of
the file, which is why I got an NA below)

\end{itemize}

For convenience, assume that the class is case-insensitive and that
there is no punctuation in the file.  {\bf For full credit, your code
must be loop-free.}

For instance, say the input file {\bf infile} consists of

\begin{Verbatim}[fontsize=\relsize{-2}]
how much wood
could a woodchuck chuck
if a woodchuck could chuck wood
\end{Verbatim}

Here is usage on that file:

\begin{lstlisting}
> howmuch <- newtfile("infile")
> howmuch$story
 [1] "how"       "much"      "wood"      "could"     "a"         "woodchuck"
 [7] "chuck"     "if"        "a"         "woodchuck" "could"     "chuck"    
[13] "wood"     
> howmuch$distinctwords
[1] "how"       "much"      "wood"      "could"     "a"         "woodchuck"
[7] "chuck"     "if"       
> howmuch$places[["could"]]
[1]  4 11
> ctxt(howmuch,"wood")
[1] "much wood could"
[1] "chuck wood NA"
\end{lstlisting}

You may find the R function {\bf unique()} useful:

\begin{lstlisting}
> unique(c(5,12,13,12,5))
[1]  5 12 13
\end{lstlisting}

\onecolumn

{\bf Solutions:}

\begin{lstlisting}
# constructor
newtfile <- function(tfilename) {
   tmp <- list()
   # vector of all the words, in sequence of the original text
   tmp$story <- scan(tfilename,"")  
   # vector of the distinct words in the text
   tmp$distinctwords <- unique(tmp$story)
   # for each word, its various positions within the text
   tmp$places <- places(tmp)
   class(tmp) <- "tfile"
   return(tmp)
}

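# for each distinct word, the vector of its positions within the text
places <- function(tfileobj) {
   s <- tfileobj$story
   # seq_along(s) is 1,2,...,length(s), and is safe if s is empty
   split(seq_along(s),s)
}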

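# prints the context of a word: for each occurrence, the preceding word, the
# word itself and the following word, on one line; assumes the occurrence is
# not at either end of the file
ctxt <- function(tfileobj,word) {
   pts <- tfileobj$places[[word]]
   story <- tfileobj$story
   pct <- function(i) {
      # paste() joins the three words into one string, as in the sample output
      print(paste(story[(i-1):(i+1)],collapse=" "))
   }
   # invisible() suppresses the list that lapply() would otherwise auto-print
   invisible(lapply(pts,pct))
}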
\end{lstlisting}
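
{\bf Note on places():}  The following is a small sketch of what the
{\bf split()} call in {\bf places()} produces.  It assumes the sample
story is typed in directly rather than read from a file, and the names
{\bf story} and {\bf pl} are used only for this illustration.

\begin{lstlisting}
# the 13 words of the sample file, in order (typed in by hand here)
story <- c("how","much","wood","could","a","woodchuck","chuck",
           "if","a","woodchuck","could","chuck","wood")

# group the positions 1,...,13 by the word found at each position
pl <- split(seq_along(story),story)

pl[["could"]]   # 4 11
pl[["wood"]]    # 3 13

# split() orders its result by sorted word, not by first appearance
names(pl)

# if the order of the distinct words is wanted, reindex by name
pl2 <- pl[unique(story)]
names(pl2)      # same order as unique(story)
\end{lstlisting}

Lookups by name, such as {\bf pl[["could"]]}, do not depend on this
ordering, so it matters only if the list is traversed by position.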

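{\bf Note on ctxt():}  The solution takes the sloppy route that the
problem statement allows.  One possible, still loop-free, way to handle
a word at either end of the file is sketched below.  The name {\bf
ctxtsafe()} is hypothetical, the function returns the context strings
instead of printing them, and the object {\bf tf} is built by hand so
that the example is self-contained.

\begin{lstlisting}
# hypothetical variant of ctxt(); returns one string per occurrence of word
ctxtsafe <- function(tfileobj,word) {
   pts <- tfileobj$places[[word]]   # NULL if word is not in the text
   # pad both ends so that the first and last words have neighbors
   padded <- c("",tfileobj$story,"")
   # position i in the story is position i+1 in padded
   before <- padded[pts]
   at <- padded[pts+1]
   after <- padded[pts+2]
   # paste() is vectorized; trimws() drops the blank left by the padding
   trimws(paste(before,at,after))
}

# hand-built stand-in for the result of newtfile("infile")
story <- c("how","much","wood","could","a","woodchuck","chuck",
           "if","a","woodchuck","could","chuck","wood")
tf <- list(story=story,
           distinctwords=unique(story),
           places=split(seq_along(story),story))
class(tf) <- "tfile"

ctxtsafe(tf,"wood")   # "much wood could" "chuck wood"
ctxtsafe(tf,"how")    # "how much"
\end{lstlisting}

A word that does not occur in the text yields a zero-length character
vector here, since indexing with {\bf NULL} selects nothing.
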
\end{document}



