
\documentclass[11pt]{article}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.0in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.1in}

\usepackage{times}
\usepackage{hyperref}
\usepackage{fancyvrb}
\usepackage{relsize}

\begin{document}

\title{A Quick, Painless Introduction to the Perl Scripting Language}

\author{Norman Matloff\\
 University of California, Davis\\
        \copyright{}2002-2005, N. Matloff}

\date{May 23, 2004}   

\maketitle
\tableofcontents{}

\newpage

\section{What Are Scripting Languages?}

Languages like C and C++ allow a programmer to write code at a very
detailed level which has good execution speed.  But in many applications
one would prefer to write at a higher level.  For example, for
text-manipulation applications, the basic unit in C/C++ is a character,
while for languages like Perl and Python the basic units are lines of
text and words within lines.  One can work with lines and words in
C/C++, but one must go to greater effort to accomplish the same thing.
C/C++ might give better speed, but if speed is not an issue, the
convenience of a scripting language is very attractive.

The term {\it scripting language} has never been formally defined, but
here are the typical characteristics:

\begin{itemize}

\item Used often for system administration and ``rapid prototyping.''

\item Very casual with regard to typing of variables, e.g. no
distinction between integer, floating-point or string variables.
Functions can return nonscalars, e.g. arrays, nonscalars can be used as
loop indexes, etc.

\item Lots of high-level operations intrinsic to the language, e.g.
stack push/pop.

\item Interpreted, rather than being compiled to the instruction set of
the host machine.

\end{itemize}

Today the most popular scripting language is probably Perl.  However,
many people, including me, strongly prefer Python, as it is much cleaner
and more elegant.

Our introduction here assumes knowledge of C/C++ programming.  There
will be a couple of places in which we describe things briefly in a UNIX
context, so some UNIX knowledge would be helpful.\footnote{But certainly
not required.  Perl is used on Windows and Macintosh platforms
too, not just UNIX.}

\section{Goals of This Tutorial}

Perl is a very feature-rich language, which clearly cannot be discussed
in full detail here.  Instead, our goals here are to (a) enable the
reader to quickly become proficient at writing simple Perl programs and
(b) prepare the reader to consult full Perl books (or Perl tutorials on
the Web) for further details of whatever Perl constructs he/she needs
for a particular application.

Our approach here is different from that of most Perl books, or even
most Perl Web tutorials.  The usual approach is to painfully go over all
details from the beginning.  For example, the usual approach would be to
state all possible forms that a Perl literal can take on.  

We avoid this here.  Again, the aim is to enable the reader to quickly
acquire a Perl foundation.  He/she should then be able to delve directly
into some special topic, with little or no further learning of
foundations.

\section{A 1-Minute Introductory Example}

This program reads a text file and prints out the number of lines and
words in the file:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
# comments begin with the sharp sign

# open the file whose name is given in the first argument on the command
# line, assigning to a file handle INFILE (it is customary to choose
# all-caps names for file handles in Perl) 
open(INFILE,$ARGV[0]);  

# names of scalar variables must begin with $
$line_count = 0;
$word_count = 0;

# <> construct means read one line; undefined response signals EOF
while ($line = <INFILE>) {
   $line_count++;
   # break $line into an array of tokens separated by " ", using split()
   # (array names must begin with @)
   @words_on_this_line = split(" ",$line);
   # scalar() gives the length of any array
   $word_count += scalar(@words_on_this_line);
}  

print "the file contains ",$line_count," lines and ",
   $word_count, " words\n";
\end{Verbatim}

Note that as in C, statements in Perl end in semicolons, and blocks are
defined via braces.

\section{Variables}

\subsection{Types}

Type is not declared in Perl, but rather is inferred from a variable's
name (see below), and is only loosely adhered to.

Note that a possible value of a variable is {\bf undef} (i.e.
undefined), which may be tested for, using a call to {\bf defined()}. 
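
As a minimal sketch of such a test (the variable names here are made up
for illustration):

\begin{Verbatim}[fontsize=\relsize{-2}]
$u = undef;  # explicitly undefined
$v = 0;      # defined, even though its value counts as "false"
if (!defined($u)) { print "u is undefined\n"; }  # this is printed
if (defined($v)) { print "v is defined\n"; }     # this is printed too
\end{Verbatim}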

Here are the main types:

\subsubsection{Scalars}

Names of {\bf scalar} variables begin with \$.

Scalars are integers, floating-point numbers and strings.  For the most
part, no distinction is made between these.

There are various exceptions, though.  One class of exceptions involves
tests of equality or inequality.  For example, use {\bf eq} to test
equality of strings but use {\bf ==} for numbers.
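
The following small sketch (with made-up variable names) shows why the
choice of operator matters:

\begin{Verbatim}[fontsize=\relsize{-2}]
$s = "1.0";
$t = "1";
if ($s == $t) { print "numerically equal\n"; }  # printed, since 1.0 equals 1
if ($s eq $t) { print "equal as strings\n"; }   # not printed, "1.0" differs from "1"
\end{Verbatim}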

\subsubsection{Arrays}

Array names begin with @.  Indices are integers beginning at 0,
and the array elements are scalars (and thus array elements begin with
\$, not @).

Arrays are referenced for the most part as in C, but in a more flexible
manner.  Their lengths are not declared, and they grow or shrink
dynamically, without ``warning,'' i.e. the programmer does not ``ask for
permission'' in growing an array.  For example, if the array {\bf @x}
currently has 7 elements, i.e. ends at {\bf \$x[6]}, then the statement 

\begin{Verbatim}[fontsize=\relsize{-2}]
$x[7] = 12;
\end{Verbatim}

changes the array length to 8.  For that matter, we could have assigned
to element 99 instead of to element 7, resulting in an array length of
100.
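
Here is a short sketch of what happens when we skip over elements in
this way; the skipped elements exist but are undefined:

\begin{Verbatim}[fontsize=\relsize{-2}]
@x = (1,2,3);
$x[5] = 12;              # skips over elements 3 and 4
print scalar(@x), "\n";  # prints 6
if (!defined($x[4])) { print "element 4 is undefined\n"; }  # this is printed
\end{Verbatim}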

The programmer can treat an array as a queue data structure, using the
Perl operations {\bf push} and {\bf shift} (usage of the latter is
especially common in the Perl idiom), or treat it as a stack by using
{\bf push} and {\bf pop}.
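
As a rough illustration, here is the same array used both ways (the
array name is arbitrary):

\begin{Verbatim}[fontsize=\relsize{-2}]
@q = ();
push(@q,"a");  push(@q,"b");  push(@q,"c");
print shift(@q), "\n";   # queue behavior: prints a, the oldest element
print pop(@q), "\n";     # stack behavior: prints c, the newest element
print scalar(@q), "\n";  # prints 1, since only b remains
\end{Verbatim}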

An array without a name is called a {\bf list}.  For example, in

\begin{Verbatim}[fontsize=\relsize{-2}]
@x = (88,12,"abc");
\end{Verbatim}

we assign the list (88,12,"abc") to the array {\bf @x}.  We will
then have {\bf \$x[0]} = 88, etc. 

One of the big uses of lists and arrays is in loops,
e.g.:\footnote{C-style {\bf for} loops can be done too.}

\begin{Verbatim}[fontsize=\relsize{-2}]
# prints out 1, 2 and 4
for $i ((1,2,4))  {
   print $i, "\n";
}
\end{Verbatim}

The length of an array or list is obtained by calling {\bf scalar()},
or by simply using the array name in a scalar context.

\begin{Verbatim}[fontsize=\relsize{-2}]
$x[0] = 15;
$x[1] = 16;
$y = shift @x;  # "output" of shift is the element shifted out
print $y, "\n";  # prints 15
print $x[0], "\n";  # prints 16
push(@x,9);  # sets $x[1] to 9
print scalar(@x), "\n";  # prints 2
print @x, "\n";  # prints 169 (16 and 9 with no space)
$k = @x;
print $k, "\n";  # prints 2
@x = ();  # @x will now be empty
print scalar(@x), "\n";  # prints 0
\end{Verbatim}

\subsubsection{Hashes}

As a first look, you can think of {\bf hashes} or {\bf associative
arrays} as arrays indexed by strings instead of by integers.  Their
names begin with \%, and their elements are indexed using braces, as in

\begin{Verbatim}[fontsize=\relsize{-2}]
$h{"abc"} = 12;
$h{"defg"} = "San Francisco";
print $h{"abc"}, "\n";  # prints 12
print $h{"defg"}, "\n";  # prints "San Francisco"
\end{Verbatim}

However, a closer look at hashes reveals them to essentially be like
C {\bf struct}s.  In the above example, for instance, we have set up a
hash named \%h which is analogous to a C {\bf struct} with {\bf int} and
{\bf char []} fields, whose values here are 12 and ``San Francisco'',
respectively.  This correspondence is more clear in the equivalent (and
more commonly used) alternative code

\begin{Verbatim}[fontsize=\relsize{-2}]
%h = (abc => 12,
      defg => "San Francisco");
print $h{"abc"}, "\n";  # prints 12
print $h{"defg"}, "\n";  # prints "San Francisco"
\end{Verbatim} 

Here the first two lines look rather like the declaration of a C {\bf
struct}, as in

\begin{Verbatim}[fontsize=\relsize{-2}]
struct ht {
   int abc;
   char defg[20];
};

struct ht h;

h.abc = 12;
strcpy(h.defg,"San Francisco");
\end{Verbatim}

Note, however, that there is no analog of {\bf ht} in our Perl example
above.  In fact, there are lots of other differences.  For example,
unlike C {\bf struct}s, hashes actually store their field names.  In the
C example above, the number 12 and the string ``San Francisco'' are
stored, but not the field names abc and defg.  By contrast, Perl stores
both!  In the Perl code above, if we add the line

\begin{Verbatim}[fontsize=\relsize{-2}]
print %h, "\n";
\end{Verbatim}

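the output of that statement will be the following (possibly with the
two key-value pairs in the opposite order, since Perl does not guarantee
the order of hash entries):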

\begin{Verbatim}[fontsize=\relsize{-2}]
abc12defgSan Francisco
\end{Verbatim}
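
Because the field names are stored, a program can retrieve them.  The
following sketch uses the built-in functions {\bf keys()}, {\bf sort()}
and {\bf exists()}, which have not been introduced above:

\begin{Verbatim}[fontsize=\relsize{-2}]
%h = (abc => 12,
      defg => "San Francisco");
# keys() returns the field names; sorting them gives a predictable order
for $k (sort(keys(%h))) {
   print $k, ": ", $h{$k}, "\n";  # prints abc: 12, then defg: San Francisco
}
if (!exists($h{"xyz"})) { print "no field xyz\n"; }  # this is printed
\end{Verbatim}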

\subsubsection{References}

{\bf References} are like C pointers.  They are considered scalar
variables, and thus have names beginning with \$.  They are dereferenced
by prepending the symbol for the variable type, e.g. prepending a \$ for
a scalar, a @ for an array, etc.:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
# set up a reference to a scalar
$r = \3;  # \ means "reference to," like & means "pointer to" in C
# now print it; $r is a reference to a scalar, so $$r denotes that scalar
print $$r, "\n";  # prints 3

@x = (1,2,4,8,16);
$s = \@x;
# an array element is a scalar, so prepend a $
print $$s[3], "\n";  # prints 8
# for the whole array, prepend a @
print scalar(@$s), "\n";  # prints 5
\end{Verbatim}


In Line 4, for example, you should view {\bf \$\$r} as {\bf \$(\$r)},
meaning take the reference {\bf \$r} and dereference it.  Since the
result of dereferencing is a scalar, we get another dollar sign on the
left.
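
The same rule applies when we assign through a reference, and when the
reference points to a hash.  A brief sketch, with arbitrary variable
names:

\begin{Verbatim}[fontsize=\relsize{-2}]
@x = (1,2,4);
$s = \@x;
$$s[0] = 100;            # changes $x[0], since $s points to @x
print $x[0], "\n";       # prints 100

%h = (abc => 12);
$t = \%h;
print $$t{"abc"}, "\n";  # a hash element is a scalar, so prepend a $; prints 12
print scalar(keys(%$t)), "\n";  # for the whole hash, prepend a %; prints 1
\end{Verbatim}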

\subsection{Anonymous Data}

{\bf Anonymous} data is somewhat analogous to data set up using {\bf malloc()}
in C.  One sets up a data structure without a name, and then points a
reference variable to it.

A major use of anonymous data is to set up object-oriented programming,
if you wish to use OOP.  (Covered in Section \ref{oop}.)

Anonymous arrays use brackets, and anonymous hashes use braces, instead
of parentheses.  The $->$ operator is used for dereferencing. 

Example:

\begin{Verbatim}[fontsize=\relsize{-2}]
# $x will be a reference to an anonymous array
$x = [5, 12, 13];
print $x->[1], "\n";  # prints 12 

# $y will be a reference to an anonymous hash (due to braces)
$y = {name => "penelope", age=>105};
print $y->{age}, "\n";  # prints 105
\end{Verbatim}

Note the difference between 

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = [5, 12, 13];
\end{Verbatim}

and

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = (5, 12, 13);
\end{Verbatim}

The former sets {\bf \$x} as a reference to the anonymous array
[5,12,13].  The latter does not create an array at all: in a scalar
assignment the commas act as C-style comma operators, so {\bf \$x} is
set to the last value, 13.  So the brackets or parentheses, as the case
may be, tell the Perl interpreter what we want.
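
Anonymous data can be modified through its reference, and can be nested.
The following is only a sketch; the names {\bf \$people} and ``ulysses''
are invented for the example:

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = [5, 12, 13];
push(@$x, 8);             # dereference with @ to treat it as an array
print scalar(@$x), "\n";  # prints 4
print $x->[3], "\n";      # prints 8

# an anonymous array whose elements are references to anonymous hashes
$people = [ {name => "penelope", age => 105},
            {name => "ulysses", age => 108} ];
print $people->[1]->{name}, "\n";  # prints ulysses
\end{Verbatim}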

\subsection{Declaration of Variables}

A variable need not be explicitly declared; its ``declaration'' 
consists of its first usage.  For example, if the statement

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = 5;  
\end{Verbatim}

were the first reference to {\bf \$x}, then this would both declare {\bf
\$x} and assign 5 to it.

If you wish to make a separate declaration, you can do so with the
{\bf my} construct (discussed in the next subsection), e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
my $x;
...
$x = 5;
\end{Verbatim}

If you wish to have protection against accidentally using a variable
which has not been defined, say due to a misspelling, include a line

\begin{Verbatim}[fontsize=\relsize{-2}]
use strict;
\end{Verbatim} 

at the top of your source code.  Every variable must then be declared,
e.g. with {\bf my}, before it is used.
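
Here is a minimal sketch of the effect.  It assumes that variables are
declared with {\bf my} (see the subsection on scope), and the misspelled
name is of course hypothetical:

\begin{Verbatim}[fontsize=\relsize{-2}]
use strict;

my $count = 0;       # declared with my, so strict is satisfied
$count++;
print $count, "\n";  # prints 1
# $cuont++;          # if uncommented, compilation fails: $cuont was never declared
\end{Verbatim}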

\subsection{Scope of Variables}

Variables in Perl are global by default.  To make a variable local to a
subroutine or block,\footnote{This includes, for example, a block within
an {\bf if} statement.} the {\bf my} construct is used.\footnote{There
are many other scope possibilities, e.g. namespaces of packages.}  
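
A small sketch of the difference, using a made-up subroutine {\bf f()}:

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = 5;           # global
sub f {
   my $x = 10;    # local to f(); the global $x is untouched
   $y = 20;       # no my, so this $y is global
   return $x;
}
$z = f();
print $z, "\n";   # prints 10
print $x, "\n";   # prints 5
print $y, "\n";   # prints 20
\end{Verbatim}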

\section{Subroutines}

\subsection{Arguments, Return Values}

Arguments for a subroutine are passed via an array {\bf @\_}.  Note once
again that the @ sign tells us this is an array; we can think of
the array name as being {\bf \_}, with the @ sign then telling us it is
an array.  

Here are some examples:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
# read in two numbers from the command line (note: the duality of
# numbers and strings in Perl means no need for atoi()!)
$x = $ARGV[0];
$y = $ARGV[1];
# call subroutine which finds the minimum and print the latter
$z = min($x,$y);
print $z, "\n";

sub min {
   if ($_[0] < $_[1]) {return $_[0];}
   else {return $_[1];}
}  
\end{Verbatim}

A common Perl idiom is to have a subroutine use {\bf shift} on @\_ to
get the arguments and assign them to local variables.

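Arguments are passed as a flat list of scalars, and copying them into
local variables as just described gives pass-by-value
behavior.\footnote{The elements of {\bf @\_} themselves are aliases for
the caller's values, so assigning directly to, say, {\bf \$\_[0]}
changes the caller's variable.}  This small restriction is more than
compensated by the facts that (a) arguments can be references, and (b)
the return value can also be a list.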

Here is an example illustrating all this:

\begin{Verbatim}[fontsize=\relsize{-2}]
$x = $ARGV[0];
$y = $ARGV[1];
($mn,$mx) = minmax($x,$y);
print $mn, " ", $mx, "\n";

sub minmax {
   my $s = shift @_;  # get first argument
   my $t = shift @_;  # get second argument
   if ($s < $t) {return ($s,$t);}  # return a list
   else {return ($t,$s);}
}
\end{Verbatim}
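
Passing a reference lets a subroutine work on the caller's own data.
The following is a sketch only; the subroutine name {\bf
sum\_and\_clear()} is invented for the example:

\begin{Verbatim}[fontsize=\relsize{-2}]
sub sum_and_clear {
   my $aref = shift;   # the argument is a reference to an array
   my $total = 0;
   for my $v (@$aref) { $total += $v; }
   @$aref = ();        # empties the caller's array via the reference
   return $total;
}

@nums = (3,4,5);
$s = sum_and_clear(\@nums);
print $s, "\n";             # prints 12
print scalar(@nums), "\n";  # prints 0
\end{Verbatim}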

\subsection{Alternative Notation}  
\label{altnot}

Instead of enclosing arguments within parentheses, as in C, one can
simply write them in ``command-line arguments'' fashion.  For example,
the call

\begin{Verbatim}[fontsize=\relsize{-2}]
($mn,$mx) = minmax($x,$y);
\end{Verbatim}

can be written as

\begin{Verbatim}[fontsize=\relsize{-2}]
($mn,$mx) = minmax $x,$y;
\end{Verbatim}

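In fact, we've been doing this in all our previous examples, in our
calls to {\bf print()}.  This style is often clearer.  Note, though,
that for a subroutine you have written yourself, the parenthesis-free
form works only if the subroutine is defined (or declared) earlier in
the file than the call.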

On the other hand, if the subroutine, say {\bf x()}, has no arguments
make sure to use the parentheses in your call:

\begin{Verbatim}[fontsize=\relsize{-2}]
x();
\end{Verbatim}

rather than

\begin{Verbatim}[fontsize=\relsize{-2}]
x;
\end{Verbatim}

In the latter case, unless {\bf x()} has already been defined at that
point in the file, the Perl interpreter will treat {\bf x} as a mere
bareword (in effect a string), not as a call to {\bf x()}.

\subsection{Passing Subroutines As Arguments}

Older versions of Perl required that subroutines be referenced through
an ampersand preceding the name, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
($mn,$mx) = &minmax($x,$y);
\end{Verbatim}

In some cases we must still do so, such as when we need to pass a
subroutine name to a subroutine.  The reason this need arises is that we
may write a packaged program which calls a user-written subroutine.

Here is an example of how to do it:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
sub x  {
   print "this is x\n";
}
            
sub y  {
   print "this is y\n";
}
            
sub w {
   $r = shift;
   &$r();
}
            
w \&x;  # prints "this is x"
w \&y;  # prints "this is y"
\end{Verbatim}
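
To suggest how a ``packaged'' routine might call a user-written
subroutine with arguments, here is a sketch; the names {\bf
apply\_to\_all()} and {\bf double()} are hypothetical:

\begin{Verbatim}[fontsize=\relsize{-2}]
# apply the subroutine referenced by $f to each remaining argument
sub apply_to_all {
   my $f = shift;
   my @results = ();
   for my $v (@_) { push(@results, &$f($v)); }
   return @results;
}

# the user-written subroutine
sub double { return 2 * $_[0]; }

@d = apply_to_all(\&double, 1, 2, 3);
print join(" ", @d), "\n";  # prints 2 4 6
\end{Verbatim}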

\section{Confusing Defaults}

In many cases in Perl, the operands of operators have defaults if they are
not explicitly specified.  Within a subroutine, for example, the array
of arguments @\_ can be left implicit.  The code

\begin{Verbatim}[fontsize=\relsize{-2}]
sub uuu {
   $a = shift;  # get first argument
   ...
}
\end{Verbatim}

will have the same effect as

\begin{Verbatim}[fontsize=\relsize{-2}]
sub uuu {
   $a = shift @_;  # get first argument
   ...
}
\end{Verbatim}

This is handy for experienced Perl programmers but a source of
confusion for beginners.

Similarly,

\begin{Verbatim}[fontsize=\relsize{-2}]
$line = <>;
\end{Verbatim}

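reads a line from the standard input (i.e. keyboard) when no file names
are given on the command line, just as the more explicit

\begin{Verbatim}[fontsize=\relsize{-2}]
$line = <STDIN>;
\end{Verbatim}

would.  (If file names are given on the command line, the lines are read
from those files instead.)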

\section{String Manipulation in Perl}

One major category of Perl string constructs involves searching and
possibly replacing strings.  For example, the following program acts
like the UNIX {\bf grep} command, reporting all lines found in a given
file which contain a given string (the file name and the string are
given on the command line):

\begin{Verbatim}[fontsize=\relsize{-2}]
open(INFILE,$ARGV[0]);
while ($line = <INFILE>) {
   if ($line =~ /$ARGV[1]/)  {
      print $line;
   }
}
\end{Verbatim}

Here the Perl expression

\begin{Verbatim}[fontsize=\relsize{-2}]
   ($line =~ /$ARGV[1]/)  
\end{Verbatim}

checks \$line for the given string, resulting in a {\bf true} value if
the string is found.

In this string-matching operation Perl allows many different types of
{\bf regular expression} conditions.\footnote{If you are a UNIX user, you may be
used to this notion already.}  For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
open(INFILE,$ARGV[0]);
while ($line = <INFILE>) {
   if ($line =~ /us[ei]/)  {
      print $line;
   }
}
\end{Verbatim}

would print out all the lines in the file which contain {\it either} the
string ``use'' {\it or} ``usi''.
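
To try the same pattern without a file, one can match against strings
held in an array.  The test strings below are made up:

\begin{Verbatim}[fontsize=\relsize{-2}]
@lines = ("do not use this", "she is using it", "users beware", "dust");
for $line (@lines) {
   # prints the first three strings, but not "dust"
   if ($line =~ /us[ei]/)  {
      print $line, "\n";
   }
}
\end{Verbatim}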

Substitution is another common operation.  For example, the code 

\begin{Verbatim}[fontsize=\relsize{-2}]
open(INFILE,$ARGV[0]);
while ($line = <INFILE>) {
   if ($line =~ s/abc/xyz/)  {
      print $line;
   }
}
\end{Verbatim}

would cull out all lines in the file which contain the string ``abc'',
replace the first instance of that string in the line by ``xyz'', and
then print out those changed lines.
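
If every instance should be replaced rather than just the first, Perl's
{\bf g} modifier (not used above) can be appended.  A short sketch on a
made-up string:

\begin{Verbatim}[fontsize=\relsize{-2}]
$line = "abc and abc again";
$line =~ s/abc/xyz/;   # replaces only the first instance
print $line, "\n";     # prints xyz and abc again

$line = "abc and abc again";
$line =~ s/abc/xyz/g;  # the g modifier replaces every instance
print $line, "\n";     # prints xyz and xyz again
\end{Verbatim}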

There are many more string operations in the Perl repertoire.

As mentioned earlier, Perl uses {\bf eq} to test string equality; it
uses {\bf ne} to test string inequality.

\section{Perl Packages/Modules}
\label{packmod}

Object-oriented programming (OOP, see Section \ref{oop}) came late to
Perl, as an add-on, so many of the Perl programs being used in the world
do not make use of OOP.  However, you are likely to encounter it anyway,
in the form of modules which you will call from your own code, even
though the latter may not be OOP in nature.

For example, if you do network programming (see Section \ref{net}), you
will probably need to include a line

\begin{Verbatim}[fontsize=\relsize{-2}]
use IO::Socket;
\end{Verbatim}

in your code.  Let's look at this closely.

First, part of your Perl program's environment is the Perl search path,
in which the interpreter looks for packages that your code uses.  This
path has a default value, but you can change it by using the {\bf -I}
option when you invoke Perl on the command line.
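
The search path is kept in the built-in array {\bf @INC}, so a quick way
to inspect it is the following sketch.  The output depends on your
installation.

\begin{Verbatim}[fontsize=\relsize{-2}]
# print each directory in the search path, one per line;
# directories given with -I on the command line are included
for $dir (@INC) {
   print $dir, "\n";
}
\end{Verbatim}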

In the above example, the interpreter will look in the directories of
your search path for a subdirectory {\bf IO}.  Package code can then be
found in two kinds of places:\footnote{If you know Java, you may notice
that this is similar to the setup for Java packages.}  

\begin{itemize}

\item a file {\bf IO/Socket.pm}, which is the file that the {\bf use}
line above loads, or

\item a directory {\bf IO/Socket}, within which there are
various {\bf .pm} files which contain the code of subpackages such as
{\bf IO::Socket::INET}

\end{itemize}

In our case here, much of the code is in the latter.  For example, on my
Linux machine, the directory {\bf /usr/lib/perl5/5.8.0/IO/Socket}
contains the files {\bf INET.pm} and {\bf UNIX.pm}, and the socket code
is in those files.

Each package is typically in a separate file.  The {\bf package} keyword
begins the file.  For use as a module, the file name should begin with a
capital letter.  For instance, in our example above, the first
non-comment line of {\bf INET.pm} is

\begin{Verbatim}[fontsize=\relsize{-2}]
package IO::Socket::INET;
\end{Verbatim}

Any file which is loaded as a package via {\bf use} must return a true
value, to indicate that it loaded successfully.  Typically
one just includes a line

\begin{Verbatim}[fontsize=\relsize{-2}]
1;
\end{Verbatim}

at the very end, which produces a dummy return value of 1.
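
Putting these pieces together, a module file might look like the
following sketch.  The package name {\bf MyUtil} and the subroutine {\bf
triple()} are hypothetical, chosen only for illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
# contents of a hypothetical file MyUtil.pm
package MyUtil;

sub triple {
   return 3 * $_[0];
}

1;  # dummy true return value
\end{Verbatim}

Assuming the directory containing {\bf MyUtil.pm} is in the search path
(e.g. via the {\bf -I} option), a program could then use it as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
use MyUtil;
$n = MyUtil::triple(4);  # call the subroutine by its full package name
print $n, "\n";          # prints 12
\end{Verbatim}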

There are many freely available Perl modules in CPAN, the
Comprehensive Perl Archive Network, which is available at several sites
on the Web.  Moreover, the process of downloading and installing them
has been automated!

For example, suppose you wish to write (or even just run) Perl programs
with Tk-based GUIs.  If the Perl Tk module is not already on your
machine, just type

\begin{Verbatim}[fontsize=\relsize{-2}]
perl -MCPAN -e "install Tk"
\end{Verbatim}

You will be asked some questions, but just taking the defaults will
probably be enough.


\end{document}

