

\documentstyle{article}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.4in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.12in}

\begin{document}

\newcommand{\bfs}[1]{{\bf #1}}
\newcommand{\lnum}[1]{{\medskip \bf Line #1:}}
\newcommand{\lnums}[2]{{\medskip \bf Lines #1-#2:}}

\begin{center}
{\LARGE\bf 

Introductory C Program

}
\end{center}

\bigskip


The program below will serve as our introduction to C.  It implements
a subset of the Unix \bfs{wc} command, which reports the number of 
characters, words and lines in a text file.\footnote{Type {\bf man wc} to 
get a fuller account of this Unix command if you are curious.}

Since we have not covered the material on file manipulation in C yet,
we will read from what in Unix is called the {\bf standard input}.
By default, this means input from the keyboard, but by using the
Unix input-redirection symbol, $<$, we can have a file playing the
role of the standard input, and thus our \bfs{wc} program will be able
to read from real files (more on this below).

\section{Program Listing}

The {\bf source file}, i.e. the C-language file (as opposed to the
machine-language file which the compiler generates from this file)
for the program is as follows (note of course that the line numbers
have been added here for clarity,\footnote{The Unix \bfs{cat} command,
with the -n option, was used to get the line numbers.} and were not part 
of the source file):

\begin{verbatim}
  1	
  2	/* introductory C program 
  3	
  4	   implements (a subset of) the Unix wc command  --  reports character, 
  5	   word and line counts; in this version, the "file" is read from the 
  6	   standard input, since we have not covered C file manipulation yet, 
  7	   so that a real file can be read by using the Unix `<'
  8	   redirection feature */
  9	
 10	
 11	#define MaxLine 200  
 12	
 13	
 14	char Line[MaxLine];  /* one line from the file */
 15	
 16	
 17	int NChars = 0,  /* number of characters seen so far */
 18	    NWords = 0,  /* number of words seen so far */
 19	    NLines = 0,  /* number of lines seen so far */
 20	    LineLength;  /* length of the current line */ 
 21	
 22	
 23	PrintLine()  /* for debugging purposes only */
 24	
 25	{  int I;
 26	
 27	   for (I = 0; I < LineLength; I++) printf("%c",Line[I]);
 28	   printf("\n");
 29	}
 30	
 31	
 32	int WordCount()
 33	
 34	/* counts the number of words in the current line, which will be taken
 35	   to be the number of blanks in the line, plus 1 (except in the case 
 36	   in which the line is empty, i.e. consists only of the end-of-line 
 37	   character); this definition is not completely general, and will be
 38	   refined in another version of this function later on */
 39	
 40	{  int I,NBlanks = 0;  
 41	
 42	   for (I = 0; I < LineLength; I++)  
 43	   if (Line[I] == ' ') NBlanks++;
 44	
 45	   if (LineLength > 1) return NBlanks+1;
 46	   else return 0;
 47	}
 48	
 49	
 50	int ReadLine()
 51	
 52	/* reads one line of the file, returning also the number of characters
 53	   read (including the end-of-line character); that number will be 0
 54	   if the end of the file was reached */
 55	
 56	{  char C;  int I;
 57	
 58	   if (scanf("%c",&C) == -1) return 0;
 59	   Line[0] = C;
 60	   if (C == '\n') return 1; 
 61	   for (I = 1; ; I++) {
 62	      scanf("%c",&C);     
 63	      Line[I] = C;  
 64	      if (C == '\n') return I+1;
 65	   }  
 66	}
 67	
 68	
 69	UpdateCounts()
 70	
 71	{  NChars += LineLength;
 72	   NWords += WordCount();
 73	   NLines++;
 74	}
 75	 
 76	 
 77	main()  
 78	
 79	{  while (1)  {
 80	      LineLength = ReadLine();
 81	      if (LineLength == 0) break;
 82	      UpdateCounts();
 83	   }
 84	   printf("%d %d %d\n",NChars,NWords,NLines);
 85	}
 86	
\end{verbatim}

\medskip


\section{Compiling, Running and Testing the Program} 

To compile the program, whose source file I named WC.c, I gave
the command 

\begin{verbatim}
cc -g WC.c 
\end{verbatim} 

This produces a file a.out, which is the machine-language, executable 
file.  The -g option tells the compiler not to discard the {\bf symbol 
table}, i.e. the variable names; these now will be retained in the 
a.out file, which will make debugging much easier, as we will see later.

To execute the program, just type 

\begin{verbatim}
a.out 
\end{verbatim} 

(if inputting from the keyboard) or 

\begin{verbatim} 
a.out < filename 
\end{verbatim} 

(if inputting from some file).  In the latter case, we are tricking
the program; it thinks it is reading from the keyboard, but we are
arranging things so that it reads from the given file instead.\footnote{In
a similar manner, we can redirect the output to a file.  The program
here thinks it is writing to our terminal screen, but if we give the
command

\medskip
a.out $<$ filename1 $>$ filename2 
\medskip

then the output will go to filename2 instead.}

To test the program, I typed 

\begin{verbatim} 
a.out < z
\end{verbatim} 

taking input from the file z.  The latter, which I created using the 
\bfs{emacs} editor, consisted first of a line ``ABC'', then a line 
``DE F'', then an empty line, then a line ``G''.  I can use \bfs{cat} 
to get a quick look at the file:

\begin{verbatim}
heather% cat z
ABC
DE F

G
\end{verbatim}

This is a total of 12 characters\protect\footnote{This includes
the end-of-line characters, one for each line.  We can check this by
typing 

\medskip
ls -l z 
\medskip

(type \bfs{man ls} for more information)},
four ``words,'' and four lines.  Sure enough, my test worked
(after some debugging first!):

\begin{verbatim}
heather% a.out < z
12 4 4
\end{verbatim}
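
The expected counts can also be tallied by hand, using the program's own
definition of a word (the number of blanks in the line plus 1, or 0 for
an empty line):

\begin{center}
\begin{tabular}{lcc}
line & characters & words \\ \hline
ABC & 4 & 1 \\
DE F & 5 & 2 \\
(empty) & 1 & 0 \\
G & 2 & 1 \\ \hline
total & 12 & 4 \\
\end{tabular}
\end{center}

Each character count here includes that line's end-of-line character.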

\section{Analysis of the Program}

As in writing a program, one should read someone else's program in a
top-down manner, by first taking a look at the program's global variables,
and then reading the main program.  Let's do so:

{\bf Declarations of Global Variables, Lines 14-20:}

On Line 14, we see an array named Line.  It is an array of characters.
We will be reading one line of the input file at a time, storing the
current line in this array Line.  The length of that line will be
stored in an integer variable LineLength (Line 20).

Since we are supposed to be counting the total numbers of characters,
words and lines in the file, we have variables set up to do that
(Lines 17-19).

{\bf Main Program, Lines 77-85:}  

Every C program is required to have a function main(), which serves as
the analog of the main program in Pascal, i.e. execution starts here.

The bulk of the function main() in this example consists of a \bfs{while}
loop, in Lines 79-83.  C-language \bfs{while} loops are very similar to
those in Pascal, but differ in some details.

First of all, in C there is no formal data type corresponding to Pascal's
\bfs{boolean}.  Instead, any nonzero value is treated as `true', while
zero is considered `false'.  Thus in Line 79 we have something which would
correspond to ``while (true)'' in Pascal.

Second, the C analogs of Pascal's \bfs{begin} and \bfs{end} are the
characters `\{' and `\}', i.e. left- and right-braces.  So, the loop
extends from Line 79 to Line 83, as indicated earlier.

Third, C has a leave-the-loop statement, \bfs{break}.  Thus in this example,
the point at which the loop is exited is Line 81.

On Line 80 we have a call to a function ReadLine().  As its name implies,
it will read a line from the input file; it will also return the number of
characters in that line (as stated in the comments at the definition of
the function, Lines 52-54); we are recording this returned value in the
variable LineLength.

Then on Line 81, we are asking whether ReadLine() returned 0, i.e. whether
no characters at all were read, implying that we have reached the end of
the file, in which case we will leave the loop.
But otherwise we will (on Line 82) make a call to a function UpdateCounts(),
which as its name implies, will update the number of characters, words and
lines we have seen so far.\footnote{Note the phrasing here, ``make a call
to a function.''  We did not say ``make a call to a procedure.''  In C,
all subprograms---and for that matter, main() too---are called functions,
whether they return values or not.}

Note very, \underline{VERY} carefully that in C, tests of equality are
made with the symbol ==, not =.  If for example we had written Line 81
as 

\begin{verbatim}
if (LineLength = 0) break;
\end{verbatim}

this would have been legal, and accepted by the compiler, but would have
produced very wrong results:  The value 0 would be assigned to LineLength,
resulting in the whole expression being 0, thus being considered `false',
and the break would never be executed; it would be an infinite loop.
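
The two facts at work here (any nonzero value counts as `true', and an
assignment is itself an expression with a value) can be seen in isolation
in the small sketch below.  The function names TreatedAsTrue() and
ValueOfAssignment() are made up for this illustration and are not part of
the WC program; they also take parameters, a topic we come to in a later
unit.

```c
/* Returns 1 if C would treat X as `true' in a test, and 0 otherwise. */
int TreatedAsTrue(int X)
{
   if (X) return 1;
   else return 0;
}

/* Shows the value of the assignment expression used in the mistaken
   test "if (LineLength = 0)":  the value of the whole expression is
   the value that was just assigned. */
int ValueOfAssignment(int NewValue)
{
   int LineLength = 5;  /* stand-in for the global variable in WC.c */

   return (LineLength = NewValue);
}
```

Here TreatedAsTrue(-7) returns 1 and TreatedAsTrue(0) returns 0, while
ValueOfAssignment(0) returns 0, which is why the mistaken test never
triggers the \bfs{break}.  Many compilers can be asked to warn about an
assignment used as a test, so it pays to read the warnings.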

So, the \bfs{while} loop is pretty simple:  Read a line, update counts,
read a line, update counts, ..., until you reach the end of the file.

On Line 84, we print out the results, by making a call to printf(), a
function in the C library.  It is very similar to the Pascal write()
procedure, but in C there is more flexibility, and consequently more
details to attend to.  

For example, one must state the \bfs{format} which is to be used in
printing out each item.  Here we have specified the \%d (``decimal'')
format, used to print out items in integer form, in printing out all
three variables.  We have also asked for exactly one space between
each pair of consecutive items, and for a new-line character $\backslash$n
to be written after the third item, forcing the cursor to go to the next
line on the screen.

We will note in passing that even though printf() is a library function,
i.e. a function that we didn't write ourselves, it \underline{is} of
course a function, and it has parameters, in this particular instance
four parameters---a string (the quoted part) and three integers (NChars,
NWords, NLines).  This was true in Pascal too---Pascal's write() 
\underline{is} a procedure, with parameters---but you will find that in
C one needs to pay more attention to such things, so we have mentioned
it here.
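
To see exactly which characters a format string produces, one can have
them placed in a character array instead of on the screen.  The sketch
below does this with snprintf(), a relative of printf() in the C library
(in compilers following the 1999 C standard or later) which writes into
an array.  The function name FormatCounts() and its parameters are
inventions for this illustration only, not part of the WC program.

```c
#include <stdio.h>

/* Places the three counts in Buffer, using the same format string as
   the printf() call in the WC program.  Size is the number of
   characters Buffer can hold.  Returns the number of characters
   produced, not counting the null character that ends the string. */
int FormatCounts(char *Buffer, int Size, int NChars, int NWords, int NLines)
{
   return snprintf(Buffer, Size, "%d %d %d\n", NChars, NWords, NLines);
}
```

With the counts 12, 4 and 4 from our test file, Buffer ends up holding the
six characters ``12 4 4'' followed by the new-line character, and the
value returned is 7.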

Now continuing in a top-down fashion, let's look at the other functions,
and the other miscellaneous components of the program: 

\lnums{2}{8}  This is a comment.  As you can see, comments in C are
delineated by /* and */.

\lnum{11}  C's \bfs{\#define} is like Pascal's \bfs{const}, but much more
powerful, as we will see later.

\lnum{14}  This declares an array of MaxLine characters.  In C,
array subscripts start at 0, so this declaration here would be
\bfs{Line: array[0..MaxLine-1] of char} in Pascal.

The type \bfs{char} in C is treated as a special case of the
integer type, \bfs{int}.  Thus, since integers can be compared,
e.g. in

\begin{verbatim}
int X,Y;
...
...
if (X < Y) ...
\end{verbatim}

then so can characters, e.g. in

\begin{verbatim}
char C;
...
...
if (C > 'g') ...
\end{verbatim}
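
A slightly fuller sketch of the same idea follows.  The function names
ComesAfterG() and NextChar() are hypothetical, chosen just for this
illustration, and the comparison assumes a character code such as ASCII
in which the lower-case letters are numbered in alphabetical order.

```c
/* Returns 1 if C comes after 'g' in the machine's character code,
   and 0 otherwise. */
int ComesAfterG(char C)
{
   if (C > 'g') return 1;
   else return 0;
}

/* Since a char is a small integer, it can take part in arithmetic
   too:  this returns the character whose code is one more than C's. */
char NextChar(char C)
{
   return C + 1;
}
```

For example, ComesAfterG('h') returns 1 while ComesAfterG('g') returns 0,
and NextChar('0') returns '1'.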

\lnums{17}{19}  One can initialize variables in C 
declarations.\footnote{Concerning variables whose initial values you wish
to be 0:  C sets global variables such as these to 0 automatically if you
don't specify otherwise, but it does not do so for local variables
declared inside a function, which start out with unpredictable values.
It is good practice to explicitly initialize all such variables to 0
anyway, for two reasons.  First, relying on the automatic setting may
produce a hard-to-find bug if the variable is later made local.
Secondly, it makes your program clearer.}

\lnums{23}{29}  Here the function PrintLine() is being defined.  It has
no parameters, but the parentheses are required anyway.  

\lnum{25} The integer variable I is declared here as a local variable.

\lnum{27}  Here is a \bfs{for} loop.  In C, \bfs{for} loops are like those
in Pascal, but are much more powerful.

In this loop, the three fields say that:  I will be initialized to 0 (note
that in C the assignment operator is =, not :=);
the loop will iterate until the condition I $<$ LineLength is violated;
and the variable I will be incremented by 1 (``++'') at the end of each
iteration.\footnote{Instead of I++ we could use I = I + 1.  This would
work, but may be less efficient, depending on the machine and the compiler
used.  Constructs like this were included in the original design of the C
language so as to give hints to the compiler, so that the latter could produce
more efficient machine code.  Of course, it also makes typing easier for
the programmer.}

The body of the loop here consists of a single statement, a call to
the printf() function.  Here we are printing to the screen the Ith
character of our input line, doing the printing using the character format
\%c.

The second field in a \bfs{for} loop can include any boolean expression;
e.g. it could be I $<$ LineLength \&\& J $>$ 12, where \&\& is
the analog of Pascal's \bfs{and} operator.
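
One way to keep the three fields straight is to compare a \bfs{for} loop
with the \bfs{while} loop that does the same job.  The sketch below does
so, and also shows a compound condition in the second field.  The
function names (SumWithFor(), SumWithWhile(), FirstBlank()) are made up
for this illustration, and the functions take parameters, which we
discuss properly in a later unit.

```c
/* Returns 0 + 1 + ... + (N-1), computed with a for loop. */
int SumWithFor(int N)
{
   int I, Sum = 0;

   for (I = 0; I < N; I++) Sum = Sum + I;
   return Sum;
}

/* The same computation with a while loop; the comments show where
   each of the three fields of the for statement ends up. */
int SumWithWhile(int N)
{
   int I, Sum = 0;

   I = 0;                 /* first field: initialization */
   while (I < N)  {       /* second field: condition for continuing */
      Sum = Sum + I;      /* body of the loop */
      I++;                /* third field: done at the end of each iteration */
   }
   return Sum;
}

/* A compound condition in the second field:  returns the position of
   the first blank among the first LineLength characters of Line, or
   LineLength if there is no blank among them. */
int FirstBlank(const char Line[], int LineLength)
{
   int I;

   for (I = 0; I < LineLength && Line[I] != ' '; I++)
      ;  /* empty body; all the work is done in the three fields */
   return I;
}
```

For the line ``DE F'' from our test file (five characters, counting the
end-of-line character), FirstBlank() returns 2; for ``ABC'' it returns 4,
the line's length, since that line contains no blank.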

\lnum{32}  This is the start of the function WordCount().  It counts
the number of words in the current line, and returns that value.  The
type of the returned value is integer.\footnote{As mentioned earlier,
all C subprograms are called ``functions,'' whether or not they return
values.  If we don't state the type of the return value, it is by
default of type integer.  We can, however, declare the return value
to be of type \bfs{void}, for the sake of clarity; it is a way of
saying explicitly that the function is not intended to return a value.} 

\lnum{40}  Local variables.

\lnums{42}{43}  Here we are counting the number of blank characters.

\lnum{43}  Note again that in C we use a single = for assignment statements,
but a double one == for testing equality.  Use of the former when the
latter is needed is a common mistake for learners of C.

Note the use of the increment operator ++ again.

\lnums{45}{46}  C's if-then-else construct takes the form

\begin{verbatim}
if (boolean expression) statement1
else statement2
\end{verbatim}

In C, all statements must be terminated by a semicolon (except compound
statements, discussed below).  So, the semicolon in Line 45 is part
of the ``statement1'' here, not part of the if-then-else itself.

The \bfs{return} construct is what is used to give the function its return
value.  By contrast, in Pascal, the analog of return 0 in Line 46
would be WordCount:=0\footnote{\bfs{return} can always be used without
a value; this results in leaving the function and going back to the
point of the call, but without returning a value.  This is useful in
functions which are like Pascal procedures instead of Pascal functions.
You may also find it useful to quit a program at some line in the
middle of the source code, instead of at the last line; you can call
the \bfs{exit()} function for this.}.

\lnum{50}  Here is the ReadLine() function.  Note again that it returns
an integer value, so we have written ``int'' on this line, just before
the function name.

\lnum{58}  The function scanf() is like Pascal's read().  Again, it
requires that formats be specified, as with printf().  We are reading
a single character (note the \%c format) into the variable C.  But why
do we need the ampersand (\&) in front of the C?  Here is the reason:

As mentioned earlier for printf(), the function scanf() does have 
parameters, in this instance two.  Recall that in Pascal there are two
types of parameters, pass-by-value and pass-by-reference.  The latter
type is denoted by the keyword \bfs{var}, and is used for any parameter
whose value will be changed by the subprogram.  The situation in C is
somewhat similar, in that the two cases must be distinguished, and in
the pass-by-reference case we must take special action.  That special
action is to write the ampersand, as we have done here for the parameter
C (note that C does get changed by the function; whatever value it had
before the call will now be replaced by the new character just read in).

Pass-by-reference is a bit more delicate in C than in Pascal, and thus
here in this introductory C program we have avoided using parameters
except where absolutely necessary, i.e. except in the calls to printf()
and scanf().  We will see how to deal with parameters in a later unit.
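
As a small preview of that later unit, here is a minimal sketch of the
two kinds of parameters.  The function names TryToSetToSeven(),
SetToSeven() and Demo() are hypothetical, invented for this illustration;
the asterisk notation is explained only in the comments here and will be
covered in detail later.

```c
/* X is a pass-by-value parameter:  the function works on a copy, so
   the caller's variable is not changed. */
void TryToSetToSeven(int X)
{
   X = 7;
}

/* P plays the role of a Pascal var parameter:  it receives the address
   of the caller's variable, and *P refers to that variable itself. */
void SetToSeven(int *P)
{
   *P = 7;
}

/* Returns 1 if the two calls behave as just described, 0 otherwise. */
int Demo(void)
{
   int A = 1, B = 1;

   TryToSetToSeven(A);   /* A is still 1 afterward */
   SetToSeven(&B);       /* B is now 7; note the ampersand */
   return (A == 1 && B == 7);
}
```

The ampersand in the call SetToSeven(\&B) does the same job as the one in
our call to scanf():  it hands the function the location of the variable,
so that the function can store a new value there.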

On Line 58, we are also taking advantage of the fact that, like many of
the C library functions, scanf() returns a value which serves as an
error code.  If the value returned is -1, for example, this means that
the attempt to read failed because we reached the end of the file; we
check for that here.  On Line 62, though, we don't need to make this
check, since we are in the middle of reading a line there (we are
counting on there being a newline character), so we just ignore the
value returned by scanf() here.

On Line 61, note that the second of the three fields in the \bfs{for}
statement is blank.  This illustrates the flexibility of C's \bfs{for}
relative to Pascal's.  On the other hand, it is somewhat dangerous in
this case, if for example a line's length exceeds MaxLine, the length
of our array Line.\footnote{It is important to note that most C compilers
will not produce code to check for ``array index out of bounds'' errors
like this.  The program will \underline{not} be killed if this occurs,
and erratic results may occur.}  
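
One possible way to guard against both dangers (a missing newline and an
overlong line) is sketched below.  This is not the version used in the
WC program, just an illustration under some assumptions of my own:  the
name ReadLineChecked() is made up, the function takes parameters, and it
uses the library function getc() and the type \bfs{FILE} from the file
manipulation material we have not covered yet, so that it can be tried on
something other than the keyboard.  Passing \bfs{stdin} as the first
argument makes it read from the standard input.  The decision to split an
overlong line into pieces of MaxLine characters is also just one choice
among several.

```c
#include <stdio.h>

#define MaxLine 200

/* Reads one line from Stream into Line, storing at most MaxLine
   characters.  Returns the number of characters stored (including the
   end-of-line character, if one was stored), or 0 if the end of the
   file was reached before any character could be read. */
int ReadLineChecked(FILE *Stream, char Line[])
{
   int C,      /* int rather than char, so that the special value EOF
                  can be told apart from every real character */
       I = 0;  /* number of characters stored so far */

   while (I < MaxLine)  {  /* never store past the end of the array */
      C = getc(Stream);
      if (C == EOF) break;   /* end of file, possibly in mid-line */
      Line[I] = C;
      I++;
      if (C == '\n') break;  /* end of the line */
   }
   return I;
}
```

A line longer than MaxLine characters is returned in several pieces by
successive calls, so a caller counting lines would need to check whether
the last character stored is a newline before adding 1 to its line count.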

\lnums{71}{72}
Here we are using the += operator.  Line 71, for example, is functionally
equivalent to

\begin{verbatim}
NChars = NChars + LineLength;
\end{verbatim}

However, as with ++, the += operator is used to give a hint to the
compiler; depending on the machine, the compiler may make use of
the knowledge that NChars is both a source and a destination operand
here, and thus produce more efficient machine code.


\bigskip
{\bf Remarks on Style:}
I have definitely used a {\bf top-down} design here, keeping all
functions very short, and having the functions themselves contain a 
number of calls to further functions, with meaningful names, so 
that one can glance through a function and get a quick overview of 
what any module does.  

Similarly, I have used indenting to clarify the program.  Lines
61-65 exemplify this, with the body of the \bfs{for} loop being
indented.  Also, I have followed the usual C convention of using
braces \{  \} in a ``triangular'' form, with the ``for'', the \{ and
the \} forming a triangle:

\begin{verbatim}
for (...)  {
   ...
   ...
}
\end{verbatim}

And of course I have included lots of comments, especially to describe
what roles the variables play (see the comments in Lines 17-20).

Employ these devices---top-down style, indenting and comments---in
all your programs, {\bf FROM THE MOMENT YOU START WRITING THEM, NOT JUST
WHEN YOU ARE DONE WRITING AND DEBUGGING!}  You will save yourself a lot of
time, both in writing and in debugging.  Use of top-down style is especially
important.  It will help your thinking process tremendously during the time
you are writing the program.  You may have been told in the past that this
is to help {\it other} people, i.e. to help other people read your program;
that is true, but it is also to help \underline{yourself}!  It \underline{will}
save you time!

\section{A More Sophisticated Function WordCount()}

The WC program used above as an introduction to C is not fully general. 
It does not cover the case in which two words are separated by two or more
blanks, or the case in which a blank begins or ends a line.

Below is another version of the function WordCount() from that program.
It is more general, covering these exceptional cases.  The strategy is
outlined in the comments:  We keep looping, alternately skipping over
blanks until a word is found, recording that word in our word count, 
and then skipping over that word.

The version of WordCount() here is important because it uses \bfs{for} 
and \bfs{while} loops in a more sophisticated fashion.  Look at Line
17, for example:

\begin{verbatim} 
   for (I = 0; I <= LLMinus2; ) {
\end{verbatim}

Notice the third field, which normally would contain something like
\bfs{I++}, is blank; in other words, this field says, ``At the end of
each loop iteration, do nothing,'' as opposed to, say, ``At the end of
each loop, increment I.''  Instead, the incrementing of I is done within
the loop itself, at Lines 19 and 27, and I may actually be incremented
several times within \underline{one} iteration of the loop.  So here
is an example of how C's \bfs{for} loops are more flexible/powerful
than Pascal's.

The two \bfs{while} loops here are like Pascal's, but pay attention to
the notation in Line 27:  ``!='' means ``not equal to,'' like Pascal's
`$<>$'.  ``\&\&'' means ``and.''

\begin{verbatim} 
     1	
     2	int WordCount()
     3	
     4	{  int I,
     5	       LLMinus2,  /* position of the character just before the
     6	                     end-of-line character */
     7	       Count;  /* number of words we have encountered so far in
     8	                  this line */
     9	
    10	   /* if the line is empty, i.e. consists only of the end-of-line
    11	      character, then there are no words in this line */
    12	   if (LineLength == 1) return 0;
    13	
    14	   Count = 0;
    15	   LLMinus2 = LineLength - 2;
    16	
    17	   for (I = 0; I <= LLMinus2; ) {
    18	      /* scan until reach nonblank */
    19	      while (Line[I] == ' ') I++;
    20	      /* if not yet at end of line, we have found a word;
    21	         otherwise leave */
    22	      if (I <= LLMinus2) Count++; else break;
    23	      /* scan through this word, until we get past it; at that
    24	         time we will either be at a blank or the end of the
    25	         line; in the latter case we will leave, but otherwise
    26	         will continue with the loop */
    27	      while (Line[I] != ' ' && I <= LLMinus2) I++;
    28	   }
    29	
    30	   return Count;
    31	}
    32	
\end{verbatim} 
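
To compare the two definitions of a word on the troublesome cases, one
could recast both versions as functions which are handed the line and its
length, as in the sketch below.  The names SimpleWordCount() and
BetterWordCount() and the use of parameters are my own additions for this
illustration; the logic is intended to match the two versions of
WordCount() shown in this handout.

```c
/* The first definition:  number of blanks plus 1, or 0 if the line
   consists only of the end-of-line character. */
int SimpleWordCount(const char Line[], int LineLength)
{
   int I, NBlanks = 0;

   for (I = 0; I < LineLength; I++)
      if (Line[I] == ' ') NBlanks++;

   if (LineLength > 1) return NBlanks+1;
   else return 0;
}

/* The refined definition:  alternately skip blanks and skip words,
   counting each word as it is found. */
int BetterWordCount(const char Line[], int LineLength)
{
   int I,
       LLMinus2,  /* position of the character just before the
                     end-of-line character */
       Count;     /* number of words found so far in this line */

   if (LineLength == 1) return 0;

   Count = 0;
   LLMinus2 = LineLength - 2;

   for (I = 0; I <= LLMinus2; ) {
      /* scan until we reach a nonblank */
      while (Line[I] == ' ') I++;
      /* if not yet at the end of the line, we have found a word */
      if (I <= LLMinus2) Count++; else break;
      /* scan through this word, until we get past it */
      while (Line[I] != ' ' && I <= LLMinus2) I++;
   }

   return Count;
}
```

On an ordinary line such as ``DE F'' the two functions agree, both
returning 2.  On a line consisting of a blank, ``DE'', two blanks, ``F''
and a final blank (8 characters with the end-of-line character), the
first returns 5 while the second returns 2; on a line of three blanks,
the first returns 4 and the second returns 0.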

\end{document}





