\chapter{Introduction to Linux Intel Assembly Language}
\label{chap:asm}

\section{Overview of Intel CPUs}

\subsection{Computer Organization}

Computer programs execute in the computer's {\bf central processing
unit} (CPU).  Examples of CPUs are the Pentiums in PCs and the PowerPCs
in Macs.\footnote{Slated to be replaced by Intel chips.}  The program
itself, consisting of both instructions and data, is stored in {\bf
memory} (RAM) during execution.  If you created the program by compiling
a C/C++ source file (as opposed to writing in assembly language
directly), the instructions are the machine language operations (add,
subtract, copy, etc.) generated from your C/C++ code, while the data are
your C/C++ variables.  The CPU must fetch instructions and data from
memory (as well as store data to memory) when needed; this is done via a
{\bf bus}, a set of parallel wires connecting the CPU to memory. 

The components within the CPU include {\bf registers}.  A register
consists of bits, with as many bits as the word size of the machine.
Thus a register is similar to one word of memory, but with the key
difference that a register is inside the CPU, making it much faster to
access than memory.  Accordingly, when programming in assembly language,
we try to store as many of our variables as possible in registers instead
of in memory.  Similarly, if one invokes a compiler with an ``optimize''
command, the compiler will also attempt to store variables in registers
instead of in memory (and will try other speedup tricks as well).

\subsection{CPU Architecture}

There are many different types of CPU chips.  The most
commonly-known to the public is the Intel Pentium, but other very common
ones are the PowerPC chip used in Macintosh computers and IBM UNIX
workstations, the MIPS chip used in SGI workstations and so on.

We speak of the {\bf architecture} of a CPU.  This means its instruction
set and registers, the latter being units of bit storage similar to
memory words but inside the CPU.  

It is highly important that you keep at the forefront of your mind that
the term {\bf machine language} does not connote ``yet another
programming language'' like C or C++; instead, it refers to code
consisting of instructions for a specific CPU type.  A program in Intel
machine language will be rejected as nonsense if one attempts to run it
on, say, a MIPS machine.  For instance, on Intel machines,
0111010111111000 means to jump back 8 bytes, while on MIPS it would
either mean something completely different or would mean nothing at all.

\subsection{The Intel Architecture}
\label{intelarch}

Intel CPUs can run in several different modes.  On Linux the CPU runs in
{\bf protected, flat 32-bit mode}, or the same for 64 bits on machines
with such CPUs.  For our purposes here, we will not need to define the
terms {\it protected} and {\it flat}, but do keep in mind that in this
mode word size is 32 bits.

{\fbox {\parbox{6.5in}{

{\bf For convenience, we will assume from this point onward that we are
discussing 32-bit CPUs.}  

}
}}

Intel instructions are of variable length.  Some are only one byte long,
some two bytes and so on.

The main registers of interest to us here are named EAX, EBX, ECX, EDX,
ESI, EDI, EBP and ESP.\footnote{We will use the all-caps notation, EAX,
EBX, etc. to discuss registers in the text, but in program code
write them as {\bf \%eax}, {\bf \%ebx}, ...

Similarly, we will use all-caps notation when referring to instruction
families, such as the MOV family, even though in program code they
appear as {\bf movl}, {\bf movb}, etc.}  

These are all 32-bit registers, but the registers EAX through EDX are
also accessible in smaller portions.  The less significant 16-bit
portion of EAX is called AX, and the high and low 8 bits of AX are
called AH and AL, respectively.  Similar statements hold for EBX, ECX
and EDX.
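
As a small illustration of these sub-registers, consider the following
fragment.  It is a sketch only; the constant 0x12345678 and the choice
of BX, CL and CH as destinations are ours, picked so that each piece is
easy to recognize.  The {\bf movw} and {\bf movb} forms of MOV copy 16
and 8 bits, respectively.

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $0x12345678, %eax  # EAX = 0x12345678
movw %ax, %bx           # BX = 0x5678, the low 16 bits of EAX
movb %ah, %cl           # CL = 0x56, the high 8 bits of AX
movb %al, %ch           # CH = 0x78, the low 8 bits of AX
\end{Verbatim}

Note that the upper 16 bits of EAX (0x1234 here) have no name of their
own.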

In the 64-bit chips, the names become RAX, RBX and so on.

Note that EBP and ESP may not be appropriate for general use in a given
program.  In particular, we should avoid use of ESP in programs with
subroutines, for reasons to be explained in Chapter \ref{chap:sub}.  If
we are writing some assembly language which is to be combined with some
C/C++ code, we won't be able to use EBP either.  

Also, there is another register, the program counter (PC), which for Intel is named EIP
(Extended Instruction Pointer).  We can never use it to store data,
since it is used to specify which instruction is currently executing.
Similarly, the flags register EFLAGS, to be discussed below, is reserved
as well.

\section{What Is Assembly Language?}

Machine language consists of long strings of 0s and 1s.  Programming
at that level would be extremely tedious, so the idea of assembly
language was invented.  Just as hex notation is a set of abbreviations
for various bit strings in general, assembly language is another set of
abbreviations for various bit strings which show up in machine language.

For example, in Intel machine language, the bit string
011001101000100111000011 (hex 66 89 C3) codes the operation in which the contents of the AX
register are copied to the BX register.  Assembly language notation is
much clearer:

\begin{Verbatim}[fontsize=\relsize{-2}]
mov %ax, %bx
\end{Verbatim}

The abbreviation ``mov'' stands for ``move,'' which actually means ``copy.''

An {\bf assembler} is a program which translates assembly language to
machine language.  Unix custom is that assembly-language file names end
with a {\bf .s} suffix.  So, the assembler will input a file, say {\bf
x.s}, written by the programmer, and produce from it an {\bf object
file} named {\bf x.o}.  The latter consists of the compiled machine
language, e.g. the bit string 0111010111111000 we mentioned earlier for
a jump-back-8-bytes instruction.  In the Windows world, assembly
language source code file names end with {\bf .asm}, from which the
assembler produces machine code files with names ending with {\bf .obj}.

You probably noticed that an assembler is similar to a compiler.  This
is true in some respects, but somewhat misleading.  If we program in
assembly language, we are specifying exactly what machine-language
instructions we want, whereas if we program in, say, C/C++, we let the
compiler choose what machine-language instructions to produce.  

For instance, in the case of the assembly-language line above, we know
for sure which machine instruction will be generated by the
assembler---we know it will be a 16-bit MOV instruction, that the
registers AX and BX will be used, etc.  And we know a single machine
instruction will be produced.  By contrast, with the C statement

 \begin{Verbatim}[fontsize=\relsize{-2}]
z = x + y;
\end{Verbatim}

we don't know which---or even how many---machine instructions will be
generated by the compiler.  In fact, different compilers might generate
different sets of instructions.
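
For concreteness, here is one sequence that a compiler could plausibly
generate for that statement.  This is only a sketch, under the
assumption that {\bf x}, {\bf y} and {\bf z} are 32-bit global variables
and that EAX is free for use; a real compiler may well choose different
registers or instructions.

\begin{Verbatim}[fontsize=\relsize{-2}]
movl x, %eax   # copy the contents of x to EAX
addl y, %eax   # add the contents of y to EAX
movl %eax, z   # store the result in z
\end{Verbatim}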

\section{Different Assemblers}

Our emphasis will be on the GNU assembler AS, whose executable file is
named {\bf as}.  This is part of the GCC package. Its syntax is commonly
referred to as the {}``AT\&T syntax,{}'' alluding to Unix's AT\&T Bell
Labs origins.

However, we will also be occasionally referring to another commonly-used
assembler, NASM. It uses Intel's syntax, which is similar to that of
\textbf{as} but does differ in some ways. For example, for two-operand
instructions, \textbf{as} and NASM have us specify the operands in
orders which are the reverse of each other, as you will see below.

It is very important to note, though, that the two assemblers will
produce the same machine code. Unlike a compiler, whose output is
unpredictable, we know ahead of time what machine code an assembler will
produce, because the assembly-language \textbf{mnemonics} are merely
handy abbreviations for specific machine-language bit fields.

Suppose for instance we wish to copy the contents of the AX register to the
BX register. In \textbf{as} we would write

\begin{Verbatim}[fontsize=\relsize{-2}]
mov %ax,%bx
\end{Verbatim}

while in NASM it would be

\begin{Verbatim}[fontsize=\relsize{-2}]
mov bx,ax
\end{Verbatim}

but the same machine language will be produced in both cases,
011001101000100111000011 (hex 66 89 C3), the code for the AX-to-BX copy
discussed earlier.

\section{Sample Program}
\label{sample1}

In this very simple example, we find the sum of the elements in a 4-word
array, {\bf x}.

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
# introductory example; finds the sum of the elements of an array
 
.data  # start of data section

x:    
      .long   1
      .long   5
      .long   2
      .long   18

sum:
      .long 0

.text  # start of code section

.globl _start
_start:
      movl $4, %eax  # EAX will serve as a counter for 
                     # the number of words left to be summed 
      movl $0, %ebx  # EBX will store the sum
      movl $x, %ecx  # ECX will point to the current 
                     # element to be summed
top:  addl (%ecx), %ebx
      addl $4, %ecx  # move pointer to next element
      decl %eax  # decrement counter
      jnz top  # if counter not 0, then loop again
done: movl %ebx, sum  # done, store result in "sum"
\end{Verbatim}

\subsection{Analysis}
\label{analysis}

A source file is divided into {\bf data} and {\bf text} sections, which
contain data and machine instructions, respectively.  Data consists of
variables, as you are accustomed to using in C/C++.

First, we have the line

\begin{Verbatim}[fontsize=\relsize{-2}]
.data   # start of data section
\end{Verbatim}

The fact that this begins with `.' signals the assembler that this will
be a \textbf{directive} (also known as a {\bf pseudoinstruction}),
meaning a command to the assembler rather than something the assembler
will translate into a machine instruction.  It is rather analogous to a
note one might put in the margin, say ``Please double-space here,'' of a
handwritten draft to be given to a secretary for typing; one does NOT
want the secretary to type ``Please double-space here,'' but one does
want the secretary to take some action there.

This directive here is indicating that what follows will be data rather
than code.

The \# character means that it and the remainder of the line are to be
treated as a comment. 

Next,

\begin{Verbatim}[fontsize=\relsize{-2}]
x:

      .long   1
      .long   5
      .long   2
      .long   18
\end{Verbatim}

tells the assembler to make a note in {\bf x.o} saying that later, when
the operating system (OS) loads this program into memory for execution,
the OS should set up four consecutive ``long'' areas in memory, set with
initial contents 1, 5, 2 and 18 (decimal).  ``Long'' means 32-bit size,
so we are asking the assembler to arrange for four words to be set up in
memory, with contents 1, 5, 2 and 18.\footnote{The term \textbf{long}
here is a historical vestige from the old days of 16-bit Intel CPUs.
A 32-bit word size, for instance, is ``long'' in comparison to the old 16-bit
size.  Note that in the Intel syntax the corresponding term is
\textbf{doubleword} (\textbf{dword}).  } Moreover, we are telling the assembler that in our
assembly code below, the first of these four long words will be referred
to as {\bf x}. We say that {\bf x} is a \textbf{label} for this
word.\footnote{Note that {\bf x} is simply a name for the first word in
the array, not the set of 4 words.  Knowing this, you should now have
some insight into why in C or C++, an array name is synonymous with a
pointer to the first element of the array.  

Keep in mind that whenever we are referring to {\bf x}, both here in the
program and also when we run the program via a debugger (see below),
{\bf x}
will never refer to the entire array; it simply is a name for that first
word in the array.  Remember, we are working at the machine level, and
there is no such thing as a data type here, thus no such thing as an
array!  Arrays exist only in our imaginations.}  Similarly,
immediately following those four long words in memory will be a long
word which we will refer to in our assembly code below as {\bf sum}.

By the way, what if {\bf x} had been an array of 1,000 long words instead of
four, with all words to be initialized to, say, 8?  Would we need 1,000
lines?  No, we could do it this way:

\begin{Verbatim}[fontsize=\relsize{-2}]
x:
      .rept 1000
      .long 8
      .endr
\end{Verbatim}

The {\bf .rept} directive tells the assembler to act as if the lines following
{\bf .rept}, up to the one just before {\bf .endr}, are repeated the specified
number of times.

What about an array of characters?  We can ask the assembler to leave
space for this using the {\bf .space} directive, e.g. for a 6-character array:

\begin{Verbatim}[fontsize=\relsize{-2}]
y:  .space 6  # reserve 6 bytes of space
\end{Verbatim}

Or if we wish to have that space initialized to some string:

\begin{Verbatim}[fontsize=\relsize{-2}]
y:  .string "hello"
\end{Verbatim}

This will take up six bytes, including a null byte at the end; the
developers of the {\bf .string} directive decided on such a policy in
order to be consistent with C.  Note carefully, though, that they did
NOT have to do this; there is nothing sacrosanct about having null bytes
at the ends of character strings.
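
To see that the null byte is merely a convention, consider the {\bf
.ascii} directive of {\bf as}, which stores the characters without a
terminating null.  The sketch below (reusing the label {\bf y} just for
illustration) should lay out the same six bytes as the {\bf .string}
version:

\begin{Verbatim}[fontsize=\relsize{-2}]
y:  .ascii "hello"  # 5 bytes, no null byte appended
    .byte 0         # we add the null byte ourselves
\end{Verbatim}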

\checkpoint

Getting back to our example here, we next have a directive signalling
the start of the \textbf{text} section, meaning actual program code.
Look at the first two lines:

\begin{Verbatim}[fontsize=\relsize{-2}]
_start:
      movl $4, %eax
\end{Verbatim}

Here {\bf \_start} is another label, in this case for the location in memory
at which execution of the program is to begin, called the {\bf entry
point}, in this case that \textbf{movl} instruction. We did not choose
the name for this label arbitrarily, in contrast to all the others; the
Unix linker takes this as the default.\footnote{The previous directive,
{\bf .globl}, was needed for the linker too.  More on this in Chapter
\ref{chap:sub}.}

The \textbf{movl} instruction copies the constant 4 to the EAX
register.\footnote{We will usually use the present tense in remarks like
this, but it should be kept in mind that the action will not actually
occur until the program is executed.  So, a more precise though rather
unwieldy phrasing would be, {}``When it is later executed, the
\textbf{movl} instruction will copy...{}''} You can infer from this
example that AS denotes constants by dollar signs.  

The `l' in {}``movl{}'' means {}``long.{}'' The corresponding Intel
syntax,

\begin{Verbatim}[fontsize=\relsize{-2}]
      mov eax,4
\end{Verbatim}

has no such distinction, relying on the fact that EAX is a 32-bit
register to implicitly give the same message to the assembler, i.e. to
tell the assembler that we mean a 32-bit 4, not, say, a 16-bit 4.

Keep in mind that {\bf movl} is just one member of the move
\underline{family} of instructions.  Later, for example, you will see
another member of that family, {\bf movb}, which does the same thing
except that it copies a byte instead of a word.

As noted earlier, we will generally use all-caps notation to refer to
instruction families, for example using MOV to refer to any of the
instructions {\bf movl}, {\bf movb}, etc.
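
For instance, the following three instructions are all members of the
MOV family, differing only in how many bits are copied.  (This is a
sketch; the constant 4 and the EAX register are simply carried over from
the example above.)

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $4, %eax  # copies a 32-bit 4 to EAX
movw $4, %ax   # copies a 16-bit 4 to AX, the low half of EAX
movb $4, %al   # copies an 8-bit 4 to AL, the low byte of AX
\end{Verbatim}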

By the way, integer constants are taken to be base-10 by default.  If
you wish to state a constant in hex instead, use the C ``0x''
notation.\footnote{Note, though, that the same issues of endian-ness
will apply in using the assembler as those related to the C compiler.}
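
For example, the following two lines should assemble to the same machine
instruction (the value 16 is an arbitrary choice for illustration):

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $16, %eax    # decimal constant
movl $0x10, %eax  # the same constant, stated in hex
\end{Verbatim}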

The second instruction is similar, but there is something noteworthy in
the third:

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $x, %ecx
\end{Verbatim}

In the token \$4 in the first instruction, the dollar sign meant a constant,
and the same is true for \$x. The constant here is the address of {\bf
x}. Thus the
instruction places the address of {\bf x} in the ECX register, so that ECX serves
as a pointer. A later instruction,

\begin{Verbatim}[fontsize=\relsize{-2}]
      addl $4, %ecx
\end{Verbatim}

increments that pointer by 4 bytes, i.e. 1 word, each time we go around the
loop, so that it points to each word of the array in turn and we
eventually have the sum of all the words.

Note that \$x has a completely different meaning than {\bf x} by itself. The
instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl x, %ecx
\end{Verbatim}

would copy the contents of the memory location {\bf x}, rather than its
address, to ECX.\footnote{The Intel syntax is quite different. Under
that syntax, x would mean the address of {\bf x}, and the contents of
the word {\bf x} would be denoted as {[}x{]}.  }
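
The following fragment, a sketch which assumes the {\bf .data} section
of the sample program above, contrasts the forms.  The choice of EDX and
ESI as destinations is arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $x, %ecx      # ECX = address of x
movl x, %edx       # EDX = contents of the word at x, i.e. 1
movl (%ecx), %esi  # ESI = contents of the word ECX points to, also 1
\end{Verbatim}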

The next line begins the loop:

\begin{Verbatim}[fontsize=\relsize{-2}]
top:  addl (%ecx), %ebx
\end{Verbatim}

Here we have another label, {\bf top}, a name which we've chosen to remind us
that this is the top of the loop.  This instruction takes the word
pointed to by ECX and adds it to EBX. The latter is where I am keeping
the total. 

Recall that eventually we will copy the final sum to the memory location
labeled {\bf sum}.  {\bf We don't want to do so within the loop, though,
because memory access is much slower than register access (since we must
leave the CPU to go to memory), and we thus want to avoid it.  So, we
keep our sum in a register, and copy to memory only when we are
done.\footnote{This presumes that we need it in memory for some other
reason.  If not, we would not do so.}}

If we were not worried about memory access speed, we might store
directly to the variable {\bf sum}, as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $sum,%edx  # use %edx as a pointer to "sum"
      movl $0,%ebx
top:  addl (%ecx), %ebx  # old sum is still in %ebx
      movl %ebx,(%edx)
\end{Verbatim}

Note carefully that we could NOT write

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $sum,%edx  # use %edx as a pointer to "sum"
top:  addl (%ecx),(%edx) 
\end{Verbatim}

because there is no such instruction in the Intel architecture.  Intel
chips (like most CPUs) do not allow an instruction to have both its
operands in memory.\footnote{There are actually a couple of exceptions
to this on Intel chips, as will be seen in Section \ref{stringops}.}
{\bf Note that this is a constraint placed on us by the hardware, not by
the assembler.}
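
The usual workaround is to go through a register.  Here is a minimal
sketch, assuming as above that ECX points to the current array element
and EDX points to {\bf sum}; the use of ESI as a scratch register is our
own choice.

\begin{Verbatim}[fontsize=\relsize{-2}]
top:  movl (%ecx), %esi  # fetch the array element into a register
      addl %esi, (%edx)  # add that register to the word in memory
\end{Verbatim}

Each of the two instructions has only one operand in memory, so both are
legal.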

\checkpoint

The bottom part of the loop is:

\begin{Verbatim}[fontsize=\relsize{-2}]
      decl %eax     
      jnz top     
\end{Verbatim}

The DEC instruction, \textbf{decl} (``decrement long''), in AT\&T
syntax, subtracts 1 from EAX.  This instruction, together with the JNZ
following it, provides our first illustration of the operation of the
EFLAGS register in the CPU:

When the hardware does almost any arithmetic operation, it also records
whether the result of the operation is 0:  It sets the Zero flag, ZF, in the
EFLAGS register to 1 if the result of the operation was 0, and sets that
flag to 0 if not.\footnote{The reason why the hardware designers chose 1
and 0 as codes this way is that they want us to think of 1 as meaning
yes and 0 as meaning no.  So, if the Zero flag is 1, the interpretation
is ``Yes, the result was zero.''}  Similarly, the hardware sets the Sign
flag to 1 or 0, according to whether the result of the operation was
negative or not.

Most arithmetic operations do affect the flags, but for instance MOV
does not.  For a given instruction, you can check by writing a short
test program, or look it up in the official Intel CPU
manual.\footnote{This is on the Intel Web site, but also available on
our class Web page at
\url{http://heather.cs.ucdavis.edu/~matloff/50/IntelManual.PDF}.
Go to Index, then look up the page number for your instruction.  Once
you reach the instruction, look under the subheading Flags Affected.}

Now, here is how we make use of the Zero flag.  The JNZ (``jump if not
zero'') instruction says, {}``If the result of the last arithmetic
operation was not 0, then jump to the instruction labeled {\bf top}.''
The circuitry for JNZ implements this by jumping if the Zero flag is 0.
(The complementary instruction, JZ, jumps if the Zero flag is 1, i.e. if
the result of the last instruction was zero.)
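
Here is a small sketch of the flag mechanism at work.  The label {\bf
waszero} and the constants used are hypothetical, chosen only for
illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $1, %eax  # MOV does not affect the flags
      decl %eax      # result is 0, so the Zero flag is set to 1
      jz waszero     # the jump is taken, since the Zero flag is 1
      movl $7, %ebx  # skipped
waszero:
      movl $8, %ebx  # execution continues here
\end{Verbatim}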

\checkpoint

So, the net effect is that we will go around the loop four times, until
EAX reaches 0, then exit the loop (where {}``exiting{}'' the loop merely
means going to the next instruction, rather than jumping to the line
labeled {\bf top}).

Even though our example jump here is backwards---i.e. to a
lower-addressed memory location---forward jumps are just as common.  The
reader should think about why forward jumps occur often in
``if-then-else'' situations, for example.
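
As a hint, here is a sketch of how the C code {\bf if (a == 0) b = 1;
else b = 2;} might be written, under the assumption that EAX plays the
role of {\bf a} and EBX the role of {\bf b}.  The labels are
hypothetical.  The CMP instruction compares its operands by subtraction,
setting the flags, and JMP is an unconditional jump.

\begin{Verbatim}[fontsize=\relsize{-2}]
      cmpl $0, %eax   # compute EAX - 0, setting the flags; EAX is unchanged
      jnz elsepart    # a != 0, so jump forward to the "else" code
      movl $1, %ebx   # "then" part:  b = 1
      jmp afterif     # jump forward, past the "else" code
elsepart:
      movl $2, %ebx   # "else" part:  b = 2
afterif:
\end{Verbatim}

Both jumps here go to higher-addressed locations, i.e. forward.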

By the way, in computer architecture terminology, the word ``branch'' is
a synonym for ``jump.''  In many architectures, the names of the jump
instructions begin with `B' for this reason.

The EFLAGS register is 32 bits wide (numbered 31, the most significant,
to 0, the least significant), like the other registers.  Here are
some of the flag and enable bits it includes:\footnote{The meanings of
these should be intuitive, except for the Interrupt Enable bit, which
will be described in Chapter \ref{chap:io}.}

\begin{tabular}{|c|c|}
\hline
Flag or Enable Bit & Position \\ \hline
\hline 
Overflow Flag & Bit 11 \\ \hline
Interrupt Enable & Bit 9 \\ \hline
Sign Flag & Bit 7 \\ \hline
Zero Flag & Bit 6 \\ \hline
Carry Flag & Bit 0 \\ \hline
\end{tabular}

So, for example there are instructions like JC (``jump if carry'') and JNC
(``jump if no carry''), which jump if the Carry Flag is 1 or 0, respectively.
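
For example, in the following sketch the addition produces a carry out
of bit 31, so the jump is taken.  The label {\bf carried} is
hypothetical.

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $0xffffffff, %eax  # largest 32-bit unsigned value
      addl $1, %eax           # EAX wraps around to 0; Carry Flag = 1
      jc carried              # taken, since the Carry Flag is 1
      movl $0, %ebx           # skipped
carried:
      movl $1, %ebx           # execution continues here
\end{Verbatim}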

Note that the label {\bf done} was my choice, not a requirement of {\bf
as}, and I didn't need a label for that line at all, since it is not
referenced elsewhere in the program.  I included it only for the purpose
of debugging, as seen later.

Just as there is a DEC instruction for decrementing, there is INC for
incrementing. 
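
For example, the loop in our sample program could have counted up rather
than down.  The following is a sketch only, not taken from the program
above; it uses CMP, which compares its two operands by subtraction and
sets the flags accordingly.

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $0, %eax   # EAX counts the words summed so far
      movl $0, %ebx   # EBX will store the sum
      movl $x, %ecx   # ECX points to the current element
top:  addl (%ecx), %ebx
      addl $4, %ecx   # move pointer to next element
      incl %eax       # increment counter
      cmpl $4, %eax   # compute EAX - 4, setting the flags
      jnz top         # if counter not yet 4, then loop again
\end{Verbatim}

Counting down saves an instruction, since DEC itself sets the Zero flag
when the counter reaches 0.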

\subsection{Source and Destination Operands}

In a two-operand instruction, the operand which changes is called the
{\bf destination} operand, and the other is called the {\bf source}
operand.  For example, in the instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
movl %eax, %ebx
\end{Verbatim}

EAX is the source and EBX is the destination.
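
The same terminology applies to arithmetic instructions.  In the
following line (an illustrative sketch), EAX is again the source and EBX
the destination:

\begin{Verbatim}[fontsize=\relsize{-2}]
addl %eax, %ebx  # EBX = EBX + EAX; EAX is unchanged
\end{Verbatim}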

\subsection{Remember:  No Names, No Types at the Machine Level}
\label{notypes}

Keep in mind that, just as our variable names in a C/C++ source file do
not appear in the compiled machine language, in an assembly language
source file labels---in this case, {\bf x}, {\bf sum}, {\bf \_start},
{\bf top} and {\bf done}---are just temporary conveniences for us
humans.  We use them only in order to conveniently refer AS to
certain locations in memory, in both the {\bf .data} and {\bf
.text} sections.  These labels do NOT appear in the machine language
produced by AS; only numeric memory addresses will appear there.

For instance, the instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
jnz top
\end{Verbatim}

mentions the label {\bf top} here in assembly language, but the actual
machine language which comes from this is 0111010111111000.  You will
find in Chapter \ref{chap:machlang} that the first 8 bits, 01110101,
code a jump-if-not-zero operation, and the second 8 bits, 11111000, code
that the jump target is 8 bytes backward.  Don't worry about those codes
for now, but the point at hand now is that neither of those bit strings
makes any mention of {\bf top}.

Again, there are no types at the machine level.  So for example there
would be no difference between writing

\begin{Verbatim}[fontsize=\relsize{-2}]
z:
   .long 0
w:
   .byte 0
\end{Verbatim}

and

\begin{Verbatim}[fontsize=\relsize{-2}]
z:
   .rept 5
   .byte 0
   .endr
\end{Verbatim}

Both of these would simply tell AS to arrange for 5 bytes of
(zeroed-out) space at that point in the data section.  Make sure to
avoid the temptation of viewing the first version as ``declaring'' an
integer variable {\bf z} and a character variable {\bf w}.  True, the
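programmer may be interpreting things that way, but the two versions
would produce exactly the same five bytes in the data section of the
{\bf .o} file; the only difference is the extra symbol-table entry for
{\bf w}.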
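
Along the same lines, nothing stops us from accessing the same location
with instructions of different sizes.  In this sketch, which assumes
either of the definitions of {\bf z} above, the choice of registers is
arbitrary.

\begin{Verbatim}[fontsize=\relsize{-2}]
movl z, %eax  # treat the 4 bytes starting at z as one 32-bit word
movb z, %bl   # treat the first byte at z as an 8-bit quantity
\end{Verbatim}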

\checkpoint

\subsection{Dynamic Memory Is Just an Illusion}

One thing to note about our sample program above is that all memory was
allocated statically, i.e. at assembly time.\footnote{Again, be careful
here.  The memory is not actually assigned to the program until the
program is actually run.  Our word {\it allocated} here refers to the
fact that the assembler had already planned the memory usage.}  So, we
have statically allocated five words in the {\bf .data} section, and a
certain number of bytes in the {\bf .text} section (the number of which
you'll see in Chapter \ref{chap:machlang}).

Since C/C++ programs are compiled into machine language, and since
assembly language is really just machine language, you might wonder how
it can be that memory can be dynamically allocated in C/C++ yet not in
assembly language.  The answer to this question is that one cannot truly
do dynamic allocation in C/C++ either.  Here is what occurs:

Suppose you call {\bf malloc()} in C or invoke {\bf new} in C++.  The
latter, at least for G++, in turn calls {\bf malloc()}, so let's focus
on that function.  When you run a program whose source code is in C/C++,
some memory is set up called the {\bf heap}.  Any call to {\bf malloc()}
returns a pointer to some memory in the heap; {\bf malloc()}'s internal
data structures then record that that portion of memory is in use.
Later, if the program calls {\bf free()}, those data structures now mark
the area as available for use.

In other words, the heap memory itself was in fact allocated when the
program was loaded, i.e. statically.  However, {\bf malloc()} and its
sister functions such as {\bf free()} manage that memory, recording what
is available for use now and what isn't.  So the notion of dynamic
memory allocation is actually an illusion.  (You can, however,
run the Linux commands {\bf limit} (C shell) and {\bf ulimit} (Bash
shell) to change the initial amount of heap space.)

An assembly language programmer could link the C library into his/her
program, and thus be able to call {\bf malloc()} if that were useful.
But again, it would not actually be dynamic allocation, just like it is
not in the C/C++ case.


\section{Use of Registers Versus Memory}

Recall that registers are located inside the CPU, whereas memory is
outside it.  Thus, register access is much faster than memory access.

Accordingly, when you do assembly language programming, you should try
to minimize your usage of memory, i.e. of items in the {\bf .data} section
(and later, of items on the stack), especially if your goal is program
speed, which is a common reason for resorting to assembly language.

However, most CPU architectures have only a few registers.  Thus in some
cases you may run out of registers, and need to store at least some
items in memory.\footnote{Recall that a list of usable registers for
Intel was given in Section \ref{intelarch}.}

\section{Another Example}
\label{anotherexample}

The following example does a type of sort.  See the comments for an
outline of the algorithm.  (This program is merely intended as an example
for learning assembly language, not for algorithmic efficiency.)

One of the main new ideas here is that of a {\bf subroutine}, which is
similar to the concept of a function in C/C++.\footnote{In fact, the
compiler translates C/C++ functions to subroutines at the
machine-language level.}  In this program, we have subroutines {\bf
init()}, {\bf findmin()} and {\bf swap()}. 

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
# sample program; does a (not very efficient) sort of the array x, using
# the algorithm (expressed in pseudo-C code notation):

# for each element x[i]
#    find the smallest element x[j] among x[i+1], x[i+2], ...
#    if new min found, swap x[i] and x[j]

.equ xlength, 7  # number of elements to be sorted

.data  
x:
      .long   1
      .long   5
      .long   2
      .long   18
      .long   25
      .long   22
      .long   4

.text  
      # register usage in "main()":
      #    EAX points to next place in sorted array to be determined, 
      #       i.e. "x[i]"
      #    ECX is "x[i]"
      #    EBX is our loop counter (number of remaining iterations)
      #    ESI points to the smallest element found via findmin
      #    EDI contains the value of that element
.globl _start
_start:
      call init  # initialize needed registers
top:  
      movl (%eax), %ecx
      call findmin  
      # need to swap?
      cmpl %ecx, %edi
      jge nexti
      call swap
nexti:
      decl %ebx
      jz done
      addl $4, %eax
      jmp top

done: movl %eax, %eax  # dummy, just for running in debugger  

init:
      # initialize EAX to point to "x[0]"
      movl $x, %eax  
      # we will have xlength-1 iterations
      movl $xlength, %ebx  
      decl %ebx
      ret

findmin:
      # does the operation described in our pseudocode above:
      #    find the smallest element x[j], j = i+1, i+2, ... 
      # register usage:
      #    EDX points to the current element to be compared, i.e. "x[j]" 
      #    EBP serves as our loop counter (number of remaining iterations)
      #    EDI contains the smallest value found so far
      #    ESI contains the address of the smallest value found so far
      # haven't started yet, so set min found so far to "infinity" (taken
      #    here to be 999999; for simplicity, assume all elements will be 
      #    <= 999999)
      movl $999999, %edi   
      # start EDX at "x[i+1]"
      movl %eax, %edx
      addl $4, %edx
      # initialize our loop counter (nice coincidence:  number of
      #    iterations here = number of iterations remaining in "main()")
      movl %ebx, %ebp
      # start of loop
findminloop:
      # is this "x[j]" smaller than the smallest we've seen so far?
      cmpl (%edx), %edi  # compute destination - source, set EFLAGS
      js nextj
      # we've found a new minimum, so update EDI and ESI
      movl (%edx), %edi
      movl %edx, %esi
nextj:  # do next value of "j" in the loop in the pseudocode
      # if done with loop, leave it
      decl %ebp
      jz donefindmin
      # point EDX to the new "x[j]"
      addl $4, %edx
      jmp findminloop
donefindmin:
      ret

swap:
      # copy "x[j]" to "x[i]"
      movl %edi, (%eax)
      # copy "x[i]" to "x[j]"
      movl %ecx, (%esi)
      ret
\end{Verbatim}  

Note that there are several new instructions used here, as well as a new
pseudoinstruction, {\bf .equ}. 

Let's deal with the latter first:

\begin{Verbatim}[fontsize=\relsize{-2}]
.equ xlength, 7
\end{Verbatim}

This tells the assembler that, in every line in which it sees {\bf
xlength}, the assembler should assemble that line as if we had typed 7
there instead of {\bf xlength}.  In other words {\bf .equ} works like
{\bf \#define} in C/C++.  Note carefully that {\bf xlength} is definitely not
the same as a label in the {\bf .data} section, which is the name we've given
to some memory location; in other words, {\bf xlength} is not the name
of a memory location.
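
For instance, given the {\bf .equ} line above, the two instructions in
this sketch should produce identical machine code:

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $xlength, %ebx  # assembled as if we had typed $7
movl $7, %ebx
\end{Verbatim}

By contrast, omitting the dollar sign, as in {\bf movl xlength, \%ebx},
would ask for the contents of memory location 7, which is almost
certainly not what we want.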

At the beginning of the {\bf .text} section, we see the instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
call init
\end{Verbatim}

The CALL instruction is a type of jump, with the extra feature that it
records the place the jump was made from.  That record is made on the
{\bf stack}, which we will discuss in detail in Chapter \ref{chap:sub}.
Here we will jump to the instruction labeled {\bf init} further down in
the source file:

\begin{Verbatim}[fontsize=\relsize{-2}]
init:
      movl $x, %eax  
      movl $xlength, %ebx  
      decl %ebx
      ret
\end{Verbatim}

After the CALL instruction brings us to {\bf init}, the instructions
there, i.e.

\begin{Verbatim}[fontsize=\relsize{-2}]
      movl $x, %eax  
      ...
\end{Verbatim}

will be executed, just as we've seen before.  But the RET (``return'')
instruction {\bf ret} is another kind of jump; it jumps back to the
instruction immediately following the place at which the call to {\bf
init} was made.  Recall that that place had been recorded at the time of
the call, so the machine does know it, by looking at the stack.  Keep in
mind that execution of a RET will result in a return to the instruction
{\it following} the CALL.  That instruction is the one labeled {\bf
top}:

\begin{Verbatim}[fontsize=\relsize{-2}]
top:  
      mov (%eax), %ecx
\end{Verbatim}

Again, you will find out how all this works later, in Chapter
\ref{chap:sub}.  But it is being introduced here, to encourage you to
start using subroutines from the beginning of your learning of assembly
language programming.  It facilitates a top-down approach.  (See Section
\ref{topdown}.)
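
As a minimal sketch of the CALL/RET flow, separate from our sorting
program, consider the following.  The labels {\bf sub1} and {\bf back}
and the register values are made up purely for illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
.text
.globl _start
_start:
      call sub1         # record the address of the next instruction, jump to sub1
back:
      movl %eax, %ebx   # execution resumes here after the ret
done:
      movl %ebx, %ebx   # placeholder instruction, a place to set a breakpoint

sub1:
      movl $5, %eax
      ret               # jump back to the recorded address, i.e. to back
\end{Verbatim}

If we stop at {\bf done}, EBX will contain 5.  If we do not stop there,
execution will fall through into {\bf sub1} and reach the {\bf ret} with
no matching call, which is exactly the kind of accident discussed next.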

As always, remember that the CPU is just a ``dumb machine.''  Suppose
for example that we accidentally put a RET instruction somewhere in our
code where we don't intend to have one.  If execution reaches that line,
the RET will indeed be executed, no questions asked.  The CPU has no way
of knowing that the RET shouldn't be there.  It does not know that we
are actually not in the midst of a subroutine.  

Also, though you might think that the CPU will balk when it tries to
execute the RET but finds no record of call location on the stack, 
the fact is that there is always {\it something} on the stack even if it
is garbage.  So, the CPU will return to a garbage point.  This is likely
to cause various problems, for instance possibly a seg fault, but the
point is that the CPU will definitely {\bf not} say, ``Whoa, I refuse to
do this RET.''

For that matter, even the assembler would not balk at our accidentally
putting a RET instruction at some random place in our code.  Remember,
the assembler is just a clerk, so for example it does not care that that
RET was not paired with a CALL instruction.

Note by the way that the code beginning at {\bf \_start} is analogous
to {\bf main()} in C/C++, and the comments in the code use this
metaphor.

The {\bf init()} subroutine does what it says, i.e. initialize the various
registers to the desired values.

The next instruction is another subroutine call, to {\bf findmin()}.  As
described in the pseudocode, it finds the smallest element in the
remaining portion of the array.  Let's not go into the details of how it
does this yet---not only should one {\it write} code in a top-down
manner, but one should also {\it read} code that way.  We then check to
see whether a swap should be done, and if so, we do it.

Another new instruction is CMP (``compare''), {\bf cmpl}, which is used
within {\bf findmin()}.  As noted in the comment in the code, this
instruction subtracts the source from the destination; the result, i.e.
the difference, is not stored anywhere, but the key point is that the
EFLAGS register is affected.

Since there is an instruction for addition, it shouldn't be a surprise
to know there is one for subtraction too.  For example,

\begin{Verbatim}[fontsize=\relsize{-2}]
subl %ebx, %eax
\end{Verbatim}

subtracts c(EBX) from c(EAX), and places the difference back into EAX.
This instruction does the same thing as CMP, except that the latter
does not store the difference back anywhere; the computation for the
{\bf cmpl} is done only for the purpose of setting the
flags.

Following the CMP instruction is JS (``jump if the Sign flag is 1''), meaning
that we jump if the result of the last arithmetic computation was
``signed,'' i.e. negative.  (The complementary instruction, JNS, jumps
if the result of the last computation was not negative.)

In other words, the combined effect of the CMP and JS instructions here
is that we jump to {\bf nextj} if the contents of EDI is less than that
of the memory word pointed to by EDX.  In this case, we have not found a
new minimum, so we just go to the next iteration of the loop.
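
Here is a small worked example of the CMP/JS pair, using made-up values
and a hypothetical label {\bf smaller}:

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $3, %edi
movl $8, %eax
cmpl %eax, %edi     # computes c(EDI) - c(EAX) = 3 - 8 = -5; EDI is unchanged
js smaller          # the Sign flag is 1, so the jump is taken
\end{Verbatim}

Had the values been swapped, the difference would have been 8 - 3 = 5,
the Sign flag would have been 0, and execution would have continued with
the instruction following the {\bf js}.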

The instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
jmp top
\end{Verbatim}

is conceptually new.  All the jump instructions we've seen so far have
been {\bf conditional}, i.e. the jump is made only if a certain
condition holds.  But JMP means to jump unconditionally.

Note my comments on the usage I intend for the various registers, e.g.
in {\bf ``main()''}:

\begin{Verbatim}[fontsize=\relsize{-2}]
      # register usage in "main()":
      #    EAX points to next place in sorted array to be determined,
      #       i.e. "x[i]"
      #    ECX is "x[i]"
      #    EBX is our loop counter (number of remaining iterations)
      #    ESI points to the smallest element found via findmin
      #    EDI contains the value of that element
\end{Verbatim}

I put these in {\bf BEFORE} I started writing the program, to help keep
myself organized, and I found myself repeatedly referring to them during
the writing process.  {\bf YOU SHOULD DO THE SAME}.

Also note again that we absolutely needed to avoid using ESP as
storage here, due to the fact that we are using {\bf call} and {\bf
ret}.  The reason for this will be explained in Chapter \ref{chap:sub}.

\section{Addressing Modes}

Consider the examples

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $9, %ebx
movl $9, (%ebx)
movl $9, x
\end{Verbatim}

As you can see, the circuitry which implements {\bf movl} allows for
several different versions of the instruction.  In the first example
above, we copy 9 to a register.  In the second and third examples, we
copy to places in memory, but even then, we do it via different ways of
specifying the memory location---using a register as a pointer to
memory in the second example, versus directly stating the given memory
location in the third example. 

The manner in which an instruction specifies an operand is called the
{\bf addressing mode} for that operand.  So, above we see three distinct
addressing modes for the second operand of the instruction.

For example, let's look at the instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $9, %ebx
\end{Verbatim}

Let's look at the destination operand first.  Since the operand is in a
register, we say that this operand is expressed in {\bf register mode}.
We say that the source operand is accessed in {\bf immediate} mode,
which means that the operand, in this case the number 9, is right there
(i.e. ``immediately within'') in the instruction itself.\footnote{You'll
see this more explicitly when we discuss machine language.}

Now consider the instruction 

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $9, (%ebx)
\end{Verbatim}

Here the destination operand is the memory word pointed to by EBX.  This
is called {\bf indirect mode}; we ``indirectly'' state the location, by
saying, ``It's whatever EBX points to.''

By contrast, the destination operand in

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $9, x
\end{Verbatim}

is accessed in {\bf direct mode}; we have directly stated where in
memory the operand is.
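
As a short illustration, the following sketch (assuming, as before, that
{\bf x} is a label in the {\bf .data} section) reaches the same memory
word in two different modes:

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $x, %ebx      # immediate source: EBX now holds the address of x
movl $9, (%ebx)    # indirect destination: the word at x becomes 9
movl $9, x         # direct destination: same effect as the previous line
\end{Verbatim}

The last two instructions store to the same place; they differ only in
how the location is specified.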

Other addressing modes will be discussed in Section \ref{moreaddr}.

\section{Assembling and Linking into an Executable File}

\subsection{Assembler Command-Line Syntax}

To assemble an AT\&T-syntax source file, say {\bf x.s},
we will type

\begin{Verbatim}[fontsize=\relsize{-2}]
as -a --gstabs -o x.o x.s
\end{Verbatim}

The {\bf -o} option specifies what to call the object file, the
machine-language file produced by the assembler.  Here we are telling
the assembler, ``Please call our object file x.o.''\footnote{{\bf
Important note on disaster avoidance:}  Suppose we are assembling more
than one file, say

as -a --gstabs -o z.o x.s y.s

Suppose we inadvertently forgot to include the z.o on the command line
here.  Then we would be telling the assembler, ``Please call our object
file {\bf x.s}.''  This would result in the assembler overwriting {\bf
x.s} with the object file, trashing {\bf x.s}!  Be careful to avoid
this.}

The {\bf -a} option tells the assembler to display to the screen the
source (i.e. assembly) code, machine code and section
offsets\footnote{These are essentially addresses, as will be explained
later.} side-by-side, for easier viewing by us humans.

The {\bf --gstabs} option (note that there are two hyphens, not one)
tells the assembler to retain in {\bf x.o} the \textbf{symbol table}, a
list of the locations of whatever labels are in {\bf x.s}, in a form
usable by symbolic debuggers, in our case GDB or DDD.\footnote{The {\bf
--gstabs} option is similar to {\bf -g} when running GCC.}

Things are similar under other operating systems.  The Microsoft
assembler, MASM, has similar command-line options, though of course with
different names and some difference in functionality.\footnote{By the
way, NASM is available for both Unix and MS Windows.  For that matter,
even AS can be used under Windows, since it is part of the GCC package
that is available for Windows under the name Cygwin.}

\subsection{Linking}

Say we have an assembly language source file {\bf x.s}, and then
assemble it to produce an object file {\bf x.o}.  We would then type,
say,

\begin{Verbatim}[fontsize=\relsize{-2}]
ld -o x x.o
\end{Verbatim} 

Here {\bf ld} is the linker, LD, which would take our object file {\bf
x.o} and produce an executable file {\bf x}. 

We will discuss more on linking in Section \ref{linking}.
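
The commands above assume a 32-bit Linux system.  As an aside, and as an
assumption about your setup rather than something this chapter requires:
on a 64-bit x86 Linux machine the GNU tools produce 64-bit output by
default.  In that case a variant along the following lines should
work, where {\bf --32} and {\bf -m elf\_i386} ask AS and LD,
respectively, for 32-bit output:

\begin{Verbatim}[fontsize=\relsize{-2}]
as -a --gstabs --32 -o x.o x.s
ld -m elf_i386 -o x x.o
\end{Verbatim}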

\subsection{Makefiles}

By the way, you can automate the assembly and linkage process using
Makefiles, just as you would for C/C++.  Keep in mind that makefiles
have nothing to do with C or C++.  They simply state which files depend
on which other files, and how to generate the former files from the
latter files.

So for example the form

\begin{Verbatim}[fontsize=\relsize{-2}]
x: y
<TAB> z
\end{Verbatim}

simply says, ``The file {\bf x} depends on {\bf y}.  If we need to make {\bf
x} (or re-make it if we have changed {\bf y}), then we do {\bf z}.''

So, for our source file {\bf x.s} above, our Makefile might look like this:

\begin{Verbatim}[fontsize=\relsize{-2}]
x: x.o
<TAB> ld -o x x.o

x.o: x.s
<TAB> as --gstabs -o x.o x.s
\end{Verbatim}
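
If you have several source files, you may prefer a pattern rule.  The
following is a sketch assuming GNU Make, in which {\bf \$@} stands for
the target of a rule and {\bf \$<} for its first dependency:

\begin{Verbatim}[fontsize=\relsize{-2}]
x: x.o
<TAB> ld -o $@ $<

%.o: %.s
<TAB> as --gstabs -o $@ $<
\end{Verbatim}

The second rule says that any {\bf .o} file can be made from the {\bf
.s} file of the same name, so we need not write a separate rule for each
source file.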

\section{How to Execute Those Sample Programs}

\subsection{{}``Normal{}'' Execution Won't Work}

Suppose in our sum-up-4-words example above we name the source file {\bf
Total.s}, and then assemble and link it, with the final executable file
named, say, \textbf{tot}.  We could not simply type

\begin{Verbatim}[fontsize=\relsize{-2}]
% tot
\end{Verbatim}

at the Unix command line. The program would crash with a segmentation
fault. Why is this?

The basic problem is that after the last instruction of the program is
executed, the processor will attempt to execute the {}``instruction{}''
at the next location of memory. There is no such instruction, but the
CPU won't know that.  All the CPU knows is to keep executing
instructions, one after the other.  So, when your program marches right
past its last real instruction, the CPU will try to execute the garbage
there.  I call this ``going off the end of the earth.''

Actually, in Linux the linker will arrange for our {\bf .data} section
to follow our {\bf .text} section in memory, almost immediately after
the end of the latter.  It's ``almost'' because we would like the {\bf
.data} section to begin on a word boundary, i.e. at an address which is
a multiple of 4.  The area in between is padding, consisting of 0s, in
this case three bytes of 0s.\footnote{This can be determined using
information presented in Chapter \ref{chap:machlang}.}  Thus the
``garbage'' which we execute when we ``go off the end of the earth''
is our own data!\footnote{Possibly preceded by one to three bytes of 0s.}

\checkpoint

This doesn't happen with your compiled C/C++ program, because the
compiler inserts a \textbf{system call}, i.e. a call to a function in
the operating system, which in this case is the {\bf exit()} call.
(Actually, good programming practice would be to insert this call in
your C/C++ programs yourself.)  This results in a graceful transition
from your program to the OS, after which the OS prints out your familiar
command-line prompt. 

We could have inserted system calls in our sample assembly language
programs above too, but have not done so because that is a topic to be
covered later in the course.  Note that that also means we cannot do
input and output, which is done via system calls too---so, not only does
our program crash if we run it in the straightforward manner above, but
also we have no way of knowing whether it ran correctly before crashing!

So, in our initial learning environment here, we will execute our
programs via a debugger, either DDD or GDB, which will allow us to have
the program stop when it is done and to see the results.


\subsection{Running Our Assembly Programs Using GDB/DDD}

Since a debugger allows us to set breakpoints or single-step through
programs, we won't {}``go off the end of the earth{}'' and cause a seg
fault as we would by running our programs directly.\footnote{Keep in
mind that if we don't set a breakpoint when we run a program within the
debugger, we will still ``go off the end of the earth.''} 

Moreover, since the debuggers allow us to inspect registers and memory
contents, we can check the ``output'' of our program.  In our first
array-summing program in Section \ref{sample1}, for example, the sum was
in EBX, so the debugger would enable us to check the program's operation
by checking whether EBX contained the correct sum.

\subsubsection{Using DDD for Executing Our Assembly Programs}

{\bf Starting the Debugger:} 

For our array-summing example, we would start by assembling and linking
the source code, and then typing

\begin{Verbatim}[fontsize=\relsize{-2}]
% ddd tot
\end{Verbatim}

Your source file {\bf Total.s} should appear in the DDD Source Window.  

{\bf Making Sure You Don't ``Go Off the End of the World'':}

You would set a breakpoint at the line labeled {\bf done} by clicking on
that line and then on the red stop sign icon at the top of the window.
This arranges for your program to stop when it is done.

{\bf Running the Program:}

You would then run the program by clicking on Run.  The program would
stop at {\bf done}. 

{\bf Checking the ``Output'' of the Program:}

You may have written your program so that its ``output'' is in one or
more registers.  If so, you can inspect the register contents when you
reach {\bf done}.  For example, recall that in the {\bf tot} program,
the final sum (26) will be stored in the EBX register, so you would
inspect the contents of this register in order to check whether the
program ran correctly.

To do this, click on Status then Registers. 

On the other hand, our program's output may be in memory, as is the case
for instance for the program in Section \ref{anotherexample}.  Here we
check output by inspecting memory, as follows:

Hit Data, then Memory.  An Examine Memory window will pop up, asking you
to state which data items you wish to inspect:

\begin{itemize}

\item Fill in the blank on the left with the number of items you want to
inspect.

\item In the second field, labeled ``octal'' by default, state how you
want the contents of the items described---in decimal, hex or whatever.

\item In the third field, state the size of each item---byte, word or
whatever.

\item In the last field, give the address of the first item.

\end{itemize}

For the purpose of merely checking a program's output we would choose
Print here rather than Display.  The former shows the items just once,
while the latter does so continuously; the latter is useful for actual
debugging of the program, while the former is all we need for merely
checking the program's output.

For instance, again consider the example in Section
\ref{anotherexample}.  We wish to inspect all seven array elements, so
we fill in the Examine Memory window to indicate that we wish to Print 7
Decimal Words starting at {\bf \&x}.

{\bf Note on endian-ness, etc.:}  If you ask the debugger to show a
word-sized item (i.e. a memory word or a full register), it will be
shown most-significant byte first.  Within a byte, the most-significant
bit will be shown first.
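
For example, suppose our {\bf .data} section contained the following
line, with {\bf w} being a hypothetical label used only for this
illustration:

\begin{Verbatim}[fontsize=\relsize{-2}]
w:    .long 0x12345678
\end{Verbatim}

Intel machines are little-endian, so the four bytes of {\bf w} sit in
memory, from lowest address to highest, as 0x78, 0x56, 0x34, 0x12.  If
we ask the debugger for one word at {\bf \&w}, it will show 0x12345678,
most-significant byte first.  If we instead ask for four individual
bytes at {\bf \&w}, it will show them in address order, i.e. 0x78 first
and 0x12 last.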

\subsubsection{Using GDB for Executing Our Assembly Programs}

In some cases, you might find it more convenient to use GDB directly,
rather than via the DDD interface.  For example, you might be using {\bf
telnet}.

(Note:  It is assumed here that you have already read the material on
using DDD above.)

{\bf Starting the Debugger:} 

For the {\bf tot} program example here, you would start by assembling
and linking, and then typing

\begin{Verbatim}[fontsize=\relsize{-2}]
gdb tot
\end{Verbatim}

{\bf Making Sure You Don't ``Go Off the End of the World'':}

Set the breakpoint at {\bf done}:

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) b done
Breakpoint 1 at 0x804808b: file sum.s, line 28.
\end{Verbatim}

{\bf Running the Program:}

Issue the {\bf r} (``run'') command to GDB.

{\bf Checking the ``Output'' of the Program:}

To check the value of a register, e.g. EBX, use the {\bf info registers}
command:

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) info registers ebx
ebx            0x1a     26
\end{Verbatim}

To check the value of a memory location, use the {\bf x} (``examine'')
command.  In the example in Section \ref{anotherexample}, for instance:

\begin{Verbatim}[fontsize=\relsize{-2}]
x/7w &x
\end{Verbatim}

This says to inspect 7 words, beginning at {\bf x}, printing out the
contents in hex.  In some cases, e.g. if you are working with code
translated from C to assembly language, GDB will switch to printing
individual bytes or to some other format.  In that case, state both the
format and the unit size explicitly, e.g. {\bf x/7xw} instead of {\bf
x/7w}.

There are other ways to print out the contents, e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
x/12c &x
\end{Verbatim}

would treat the 12 bytes starting at {\bf x} as characters and print
them out.
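
In general, the letters after the count give the display format and the
size of each item.  As a sketch, here are three ways of looking at the
same seven-word array {\bf x}:

\begin{Verbatim}[fontsize=\relsize{-2}]
x/7dw &x
x/7xw &x
x/28xb &x
\end{Verbatim}

The first shows seven words in decimal, the second shows the same words
in hex, and the third shows the same region of memory as 28 individual
bytes in hex (seven words at four bytes each).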

\section{How to Debug Assembly Language Programs}

\subsection{Use a Debugging Tool for ALL of Your Programming, in
EVERY Class}

{\fbox {\parbox{6.5in}{ I've found that many students are {\bf shooting
themselves in the foot} by not making use of debugging tools. They learn
such a tool in their beginning programming class, but treat it as
something that was only to be learned for the final exam, rather than
for their own benefit. Subsequently they debug their programs with calls
to printf() or cout, which is really a slow, painful way to debug.
\textbf{You should make use of a debugging tool in all of your
programming work -- for} \textbf{\underbar{your}} \textbf{benefit, not
your professors'.} (See my debugging-tutorial slide show, at
\url{http://heather.cs.ucdavis.edu/~matloff/debug.html}.) }}}

For C/C++ programming on Unix machines, many debugging tools exist, some
of them commercial products, but the most commonly-used one is GDB.
Actually, many people use GDB only indirectly, using DDD as their
interface to GDB; DDD provides a very nice GUI to GDB.

\subsection{General Principles}

\subsubsection{The Principle of Confirmation}

Remember (as emphasized in the debugging slide show cited above) {\bf
the central principle of debugging is \underline{confirmation}.}  We
need to step through our program, at each step \underline{confirming}
that the various registers and memory locations contain what we think
they ought to contain, and \underline{confirming} that jumps occur when
we think they ought to occur, and so on.  Eventually we will reach a
place where something fails to be confirmed, and then that will give us
a big hint as to where our bug is.

\subsubsection{Don't Just \underline{Write} Top-Down, But
\underline{Debug} That Way Too}

Consider code like, say,

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $12, %eax
call xyz
addl %ebx, %ecx
\end{Verbatim}

When you are single-stepping through your program with the debugging
tool and reach the line with the {\bf call} instruction, the tool will
give you a choice of going in to the subroutine (e.g. the Step command
in DDD) or skipping over it (the Next command in DDD).  Choose the
latter at first.  When you get to the next line (with {\bf addl}), you
can check whether the ``output'' of the subroutine {\bf xyz()} is
correct; if so, you will have saved a lot of time and distraction by
skipping the detailed line-by-line execution of {\bf xyz()}.  If on the
other hand you find that the output of the subroutine is wrong, then you
have narrowed down the location of your bug; you can then re-run the
program, but in this case opt for Step instead of Next when you get to
the call.

\subsection{Assembly Language-Specific Tips}

\subsubsection{Know Where Your Data Is}

The first thing you should do during a debugging session is write down
the addresses of the labeled items in your {\bf .data} section.  And
then \underline{use} those addresses to help you debug.

Consider the program in Section \ref{anotherexample}, which included a
data label {\bf x}.  We should first determine where {\bf x} is.  We can
do this in GDB as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) p/x &x
\end{Verbatim}

In DDD, we might as well do the same thing, issuing a command directly
to GDB via DDD's Console window.

Knowing these addresses is extremely important to the debugging process.
Again using the program in Section \ref{anotherexample} for
illustration, on the line labeled {\bf top} the EAX register is serving
as a pointer to our current position in {\bf x}.  In order to verify
that it is doing so, we need to know the address of {\bf x}.
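
A GDB session for this kind of check might go as follows.  This is only
a sketch: it assumes that EAX advances by one array element on each
pass through {\bf top}.

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) p/x &x
(gdb) b top
(gdb) r
(gdb) p/x $eax
(gdb) c
(gdb) p/x $eax
\end{Verbatim}

If, say, the first command reported 0x80490a4 (a made-up address for
illustration), then the successive values of EAX at {\bf top} should be
0x80490a4, 0x80490a8, 0x80490ac and so on, going up by 4 each time since
each array element is 4 bytes long.  Any other value means EAX is not
pointing where we think it is.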

\subsubsection{Seg Faults}

Many bugs in assembly language programs lead to seg faults.  These
typically occur because the programmer has inadvertently used the wrong
addressing mode or the wrong register.

For example, consider the instruction 

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $39, %edx
\end{Verbatim}

which copies the number 39 to the EDX register.

Suppose we accidentally write

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $39, (%edx)
\end{Verbatim}

This copies 39 to the memory location pointed to by EDX.  If EDX
contains, say, 12, then the CPU will attempt to copy 39 to memory
location 12, which probably will not be in the memory space allocated to
this program when the OS loaded it into memory.  In other words, we
used the wrong addressing mode, which will cause a seg fault.

When we run the program in the debugger, the latter will tell us exactly
where---i.e. at what instruction---the seg fault occurred.  This is of
enormous value, as it pinpoints exactly where our bug is.  Of course,
the debugger won't tell us, ``You dummy, you used the wrong addressing
mode''---the programmer then must determine \underline{why} that
particular instruction caused the seg fault---but at least the debugger
tells us where the fault occurred.

Suppose on the other hand, we really did need to use indirect addressing
mode, but accidentally specified ECX instead of EDX.  In other words,
we intended to write

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $39, (%edx)
\end{Verbatim}

but accidentally wrote

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $39, (%ecx)
\end{Verbatim}

Say c(EDX) = 0x10402226 but c(ECX) = 12,\footnote{Recall that c() means
``contents of.''} and that 0x10402226 is in a part of memory which was
allocated to our program but 12 is not.  Again we will get a seg fault,
as above.  Again, the debugger won't tell us, ``You dummy, you should
have written `\%edx', not `\%ecx'\,,'' but at least it will pinpoint
which instruction caused the seg fault, and we then can think about what
we might have done wrong in that particular instruction.

\subsection{Use of DDD for Debugging Assembly Programs}

To examine register contents, hit Status and then Registers.   All
register contents will be displayed.  

The display includes EFLAGS, the flags register.  The Carry Flag is bit
0, i.e. the least-significant bit.  The other bits we've studied are the
Zero Flag, bit 6, the Sign Flag, bit 7, and the Overflow Flag, bit 11.
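
As a worked example, suppose the debugger shows EFLAGS as 0x286 (a value
chosen here just for illustration).  We can pick out the flags we've
studied as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
EFLAGS = 0x286 = 0010 1000 0110 (binary)

bit 0   (mask 0x001)  Carry Flag     0
bit 6   (mask 0x040)  Zero Flag      0
bit 7   (mask 0x080)  Sign Flag      1
bit 11  (mask 0x800)  Overflow Flag  0
\end{Verbatim}

So the last arithmetic result was negative and nonzero, with no carry
and no overflow.  The other 1 bits in this value (bits 1, 2 and 9) are
flags we have not studied.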

To display, say, an array {\bf z}, hit Data, then Memory.  State how many
cells you want displayed, what kind of cells (whole words, individual
bytes, etc.), and where to start, such as {\bf \&z}.   It will also ask
whether you want the information ``printed,'' which means displayed just
once, or ``displayed,'' which means a continuing display which reflects
the changes as the program progresses.

Breakpoints may be set and cleared using the stop sign
icons.\footnote{For some reason, it will not work if we set a breakpoint
at the very first instruction of a program, though any other instruction
works.} 

You can step through your code line by line in the usual debugging-tool
manner.  However, in assembly language, make sure that you use Stepi
instead of Next or Step, since we are working at the machine instruction
level (the `i' stands for ``instruction.'')  

Note that when you modify your assembly-language source file and
reassemble and link, DDD will not automatically reload your source and
executable files.  To reload, click on File in DDD, then Open Program
and click on the executable file.

\subsection{Use of GDB for Debugging Assembly Programs} 

\subsubsection{Assembly-Language Commands}

Assuming you already know GDB (see the link to my Web tutorial),
here are the new commands you should learn.

\begin{itemize}

\item To view all register contents, type

\begin{Verbatim}[fontsize=\relsize{-2}]
info registers
\end{Verbatim}

You can view specific registers with the {\bf p} (``print'') command,
e.g.

\begin{Verbatim}[fontsize=\relsize{-2}]
p/x $pc
p/x $esp
p/x $eax
\end{Verbatim}

\item To view memory, use the {\bf x} (``examine'') command.  If for
example you wish to examine the first four words starting at a
data-section label {\bf z}, type

\begin{Verbatim}[fontsize=\relsize{-2}]
x/4w &z
\end{Verbatim}

Do not include the ampersand in the case of a text-section label.  Note
that the {\bf x} command differs greatly from the {\bf p} command, in
that the latter prints out the contents of only one word.

Note too that you can do indirection.  For example

\begin{Verbatim}[fontsize=\relsize{-2}]
x/4w $ebx
\end{Verbatim}

would display the four words of memory beginning at the word pointed to
by EBX.

\item As in the DDD case, use the Stepi mode of single-stepping
through code;\footnote{The Nexti mode is apparently unreliable.  Of
course, you can still hop through the code using breakpoints.} the
command is

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) stepi
\end{Verbatim}

or just

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) si
\end{Verbatim}

\end{itemize}

Unlike DDD, GDB automatically reloads the program's executable file the
next time you run it after you have changed the source and reassembled
and relinked.

An obvious drawback of GDB is the amount of typing required.  But
this can be greatly mitigated by using the ``define'' command, which
allows one to make abbreviations.  For example, we can shorten the
typing needed to print the contents of EAX as follows:

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) define pa
Type commands for definition of "pa".
End with a line saying just "end".
>p/x $eax
>end
\end{Verbatim}

From then on, whenever we type {\bf pa} in this {\bf gdb} session,
the contents of EAX will be printed out.

Moreover, if we want these abbreviations to carry over from one session
to another for this program, we can put them in the file {\bf .gdbinit}
in the directory containing the program, e.g. by placing these lines

\begin{Verbatim}[fontsize=\relsize{-2}]
define pa
p/x $eax
end
\end{Verbatim}

in {\bf .gdbinit}, {\bf pa} will automatically be defined in each debugging
session for this program.
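
Abbreviations are not limited to registers.  As a further sketch, the
following lines in {\bf .gdbinit} would define a hypothetical command
{\bf px} that prints the seven-word array {\bf x} of Section
\ref{anotherexample} in decimal:

\begin{Verbatim}[fontsize=\relsize{-2}]
define px
x/7dw &x
end
\end{Verbatim}

Note that newer versions of GDB may decline, for security reasons, to
read a {\bf .gdbinit} file from the current directory.  In that case GDB
prints a message explaining how to allow it.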

Use {\bf gdb}'s online help facility to get further details; just type
``help'' at the prompt.

\subsubsection{TUI Mode}

As mentioned earlier, it is much preferable to use a GUI for debugging,
and thus the DDD interface to GDB is highly recommended.  As a middle
ground, though, you may try GDB's new TUI mode.  You will need a
relatively new version of GDB for this, and it will need to have been
built to include TUI.\footnote{If your present version of GDB does not
include TUI (i.e. GDB fails when you invoke it with the {\bf -tui}
option), you can build your own version of GDB.  Download it from
\url{www.gnu.org}, run {\bf configure} with the option {\bf --enable-tui},
etc.}

TUI may be invoked with the {\bf -tui} option on the GDB command line.  
While running GDB, you toggle TUI mode on or off using {\bf ctrl-x
a}.

If your source file is purely in assembly language, i.e. you have no
{\bf main()}, first issue GDB's {\bf l} (``list'') command, and hit
Enter an extra time or two.  That will make the source-code subwindow
appear.

Then, say, set a breakpoint and issue the {\bf r} (``run'') command
to GDB as usual.

In the subwindow, breakpoints will be marked with asterisks, and your
current instruction will be indicated by a $>$ sign.

In addition to displaying a source code subwindow, TUI will also
display a register subwindow if you type

\begin{Verbatim}[fontsize=\relsize{-2}]
(gdb) layout reg
\end{Verbatim}

This way you can watch the register values and the source code at the
same time.  TUI even highlights a register when it changes values.

Of course, since TUI just adds an interface to GDB, you can use all the
GDB commands with TUI.

\subsubsection{CGDB}

Recall that the goal of TUI in our last subsection is to get some of the
functionality of a GUI like DDD while staying within the text-only
realm.  If you are simply Telnetting into the machine where you are
debugging a program, TUI is a big improvement over ordinary GDB.  CGDB
is another effort in this direction.

Whereas TUI is an integral part of GDB, CGDB is a separate front end to
GDB, not developed by the GDB team.  (Recall that DDD is also like this,
but as a GUI rather than being text-based.)  You can download it from
\url{http://cgdb.sourceforge.net/}.

Like TUI, CGDB will break the original GDB window into subwindows, one
of which is for GDB prompts and another for viewing
the debuggee's source code.  CGDB goes a bit further, by allowing easy
navigation through the source-code subwindow, and by using a nice
colorful interface.

To get into the source-code subwindow, hit Esc.  You can then move
through that subwindow using the {\bf vi}-like commands, e.g. {\bf j}
and {\bf k} to move down or up a line, {\bf /} to search for text, etc.

To set a breakpoint at the line currently highlighted by the cursor,
just hit the space bar.  Breakpoints are highlighted in
red,\footnote{When you build CGDB, make sure you do {\bf make install},
not just {\bf make}.  As of this early version of CGDB, in March 2003,
this feature does not seem to work for assembly-language source code.}
and the current instruction in green.

Use the {\bf i} command to get to the GDB command subwindow.

CGDB's startup file is {\bf cgdbrc} in a directory named {\bf .cgdb} 
in your home directory.  One setting you should make sure to have there
is

\begin{Verbatim}[fontsize=\relsize{-2}]
set autosourcereload
\end{Verbatim}

which will have CGDB automatically update your source window when you
recompile.

\section{Some More Operand Sizes}
\label{smalloperands}

Recall the following instruction from our example above:

\begin{Verbatim}[fontsize=\relsize{-2}]
addl (%ecx), %ebx
\end{Verbatim}

How would this change if we had been storing our numbers in 16-bit
memory chunks?

In order to do that, we would probably find it
convenient\footnote{Though not mandatory; recall Section \ref{notypes}.}
to use {\bf .word} instead of {\bf .long} for initialization in the {\bf
.data} section.  Also, the above instruction would become

\begin{Verbatim}[fontsize=\relsize{-2}]
addw (%ecx), %bx
\end{Verbatim}

with the w meaning ``word,'' an allusion to the fact that earlier
Intel chips had word size as 16 bits.  

The changes here are self-explanatory, but the non-change may seem odd
at first:  Why are we still using ECX, not CX?  The answer is that even
though we are accessing a 16-bit item, its address is still 32 bits.

The corresponding items for 8-bit operations are {\bf .byte} in place of
{\bf .long}, {\bf movb} instead of {\bf movl}, {\bf \%ah} or {\bf \%al}
(high and low bytes of AX) in place of EAX, etc.

If you wish to have conditional jumps based on 16-bit or 8-bit
quantities, be sure to use {\bf cmpw} or {\bf cmpb}, respectively.
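
Putting these pieces together, here is a sketch of what a 16-bit version
of a four-item summing program might look like.  The layout (pointer in
ECX, loop counter in EAX, running sum in BX) is my own choice for
illustration.

\begin{Verbatim}[fontsize=\relsize{-2}]
.data
x:    .word 1
      .word 5
      .word 2
      .word 18
sum:  .word 0

.text
.globl _start
_start:
      movl $4, %eax       # EAX is the loop counter
      movw $0, %bx        # BX holds the running 16-bit sum
      movl $x, %ecx       # ECX points to the current item; addresses are 32 bits
top:  addw (%ecx), %bx    # add the current 16-bit item to the sum
      addl $2, %ecx       # each item is now 2 bytes long, not 4
      decl %eax
      jnz top
done: movw %bx, sum
\end{Verbatim}

Note in particular that the pointer must now be advanced by 2 rather
than 4 on each iteration.  Forgetting this is an easy mistake to make
when converting code from one operand size to another.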

One might expect that if the destination operand for a byte instruction
is of word size, the source byte would be ``expanded'' to a word.  For
example, one might hope that

\begin{Verbatim}[fontsize=\relsize{-2}]
movb $-1, %eax
\end{Verbatim}

would take the byte -1, i.e. 11111111, and convert it to the word -1,
i.e. 11111111111111111111111111111111, placing the result in EAX.
Copying the sign bit into all the higher bits in this way is known as
{\bf sign extension}.  However, MOV itself does not do this.  The Intel
architecture has separate instructions for the purpose, such as {\bf
movsbl}, which sign-extends a byte from a register or memory into a
32-bit register.

What happens to the line above depends on the version of the assembler.
Depending on its version, {\bf as} will either reject the line or give a
warning message and quietly substitute AL for EAX, in which case only
the low byte of EAX is changed.  And the assembler will
give an error message if you write something like, say,

\begin{Verbatim}[fontsize=\relsize{-2}]
movb %eax, %ebx
\end{Verbatim}

with its source operand being word-sized.  The assembler has no choice
but to give you the error message, since the Intel architecture has no
machine instruction of this type; there is nothing the assembler can
assemble the above line to.
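
When you do want a byte widened to a full 32-bit word, the explicit way
to ask for it is the MOVSX/MOVZX pair, written {\bf movsbl} and {\bf
movzbl} in AT\&T syntax for the byte-to-long case.  Here is a small
sketch; the label {\bf b} and the choice of registers are hypothetical.

\begin{Verbatim}[fontsize=\relsize{-2}]
.data
b:      .byte -1               # the byte 11111111

.text
.globl _start
_start:
        movsbl b, %eax         # sign-extend: EAX = 0xffffffff, i.e. -1
        movzbl b, %ebx         # zero-extend: EBX = 0x000000ff, i.e. 255
        movb $-1, %cl
        movsbl %cl, %ecx       # a register source works too: ECX = -1
done:   movl %eax, %eax        # no-op; handy place for a breakpoint
\end{Verbatim}

Use the sign-extending form for signed quantities and the zero-extending
form for unsigned ones, such as characters.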

\section{Some More Addressing Modes}
\label{moreaddr}

Following are examples of various addressing modes, using decrement and
move for our example operations.  The first four are ones we've seen
earlier, and the rest are new.  Here {\bf x} is a label in the {\bf
.data} section.

\begin{Verbatim}[fontsize=\relsize{-2}]
decl %ebx           # register mode
decl (%ebx)         # indirect mode
decl x              # direct mode
movl $8888, %ebx    # source operand is in immediate mode

decl x(%ebx)        # indexed mode
decl 8(%ebx)        # based mode (really same as indexed)
decl (%eax,%ebx,4)  # scale-factor (my own name for it)
decl x(%eax,%ebx,4) # based scale-factor (my own name for it)
\end{Verbatim}

Let's look at the indexed mode first.

The expression {\bf x(\%ebx)} means that the operand is c(EBX) bytes
past {\bf x}.  The name ``indexed'' comes from the fact that EBX here is
playing the role of an array index.

For example, consider the C code

\begin{Verbatim}[fontsize=\relsize{-2}]
char x[100];
...
x[12] = 'g';
\end{Verbatim}

Since this is an array of type {\bf char}, i.e. with each array element
being stored in one byte, {\bf x[12]} will be at the memory location 12
bytes past {\bf x}.  So, the C compiler might translate this to

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $12, %ebx
movb $'g', x(%ebx)
\end{Verbatim}

Again, EBX is playing the role of the array index here, hence the name
``indexed addressing mode.'' We say that EBX is the {\bf index register}
here.  On Intel machines, almost any register can serve as an index
register. 

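Using the array index directly as the byte offset won't work for {\bf
int} arrays, though.  Each {\bf int} occupies 4 bytes, so {\bf x[i]} is
$4 \times i$ bytes past {\bf x}, and our machine code would need an
extra instruction to multiply the index by 4.  The C code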

\begin{Verbatim}[fontsize=\relsize{-2}]
int x[100],i;  // suppose these are global
...
x[i] = 8888;
\end{Verbatim}

would be translated to something like

\begin{Verbatim}[fontsize=\relsize{-2}]
movl i, %ebx
imull $4, %ebx
movl $8888, x(%ebx)
\end{Verbatim}

(Here {\bf imull} is a multiplication instruction, to be discussed in
detail in Chapter \ref{chap:arithlog}.)

So, Intel has another more general mode, which I have called
``scale-factor mode'' above.  Here is how it works:

In the scale-factor mode, the syntax is

\begin{Verbatim}[fontsize=\relsize{-2}]
(register1,register2,scale_factor)
\end{Verbatim}

and the operand address is register1+scale\_factor*register2.  The
scale factor must be 1, 2, 4 or 8.  The syntax

\begin{Verbatim}[fontsize=\relsize{-2}]
w(register1,register2,scale_factor)
\end{Verbatim}

means w + register1+scale\_factor*register2, where w is a constant.

We can now avoid that extra {\bf imull} instruction by using
scale-factor mode:

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $x, %eax
movl i, %ebx
movl $8888, (%eax, %ebx, 4)
\end{Verbatim}

Of course, that still entailed an extra step, to set EAX.  But if we
were doing a lot of accesses to the array {\bf x}, we would need to set
EAX only once, and thus come out ahead.\footnote{Actually, the product
$4 \times i$ must still be computed, but it is now done as part of that
third {\bf movl} instruction, rather than as an extra instruction.  This
can speed things up in various ways, and it makes our code cleaner---EBX
really does contain the index, not 4 times the index.}
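
When the array's address is a fixed label, as {\bf x} is here, the form
with a constant in front lets us skip setting EAX altogether, by leaving
register1 empty.  Here is a minimal sketch; the {\bf .data} declarations
are my own stand-ins for the C globals, and the value 12 for {\bf i} is
just an example.

\begin{Verbatim}[fontsize=\relsize{-2}]
.data
x:      .space 400             # room for int x[100]
i:      .long 12               # illustrative index value

.text
.globl _start
_start:
        movl i, %ebx               # EBX = i
        movl $8888, x(,%ebx,4)     # store at address x + 4*EBX, i.e. x[i]
done:   movl %eax, %eax            # no-op; handy place for a breakpoint
\end{Verbatim}

The form with EAX holding the base address is still the one to use when
the array's address is not known until run time, e.g. when it is passed
to a subroutine.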

\checkpoint

By the way, if the instruction to be compiled had been

\begin{Verbatim}[fontsize=\relsize{-2}]
x[12] = 8888;
\end{Verbatim}

then plain old direct mode would have sufficed:

\begin{Verbatim}[fontsize=\relsize{-2}]
movl $8888, x+48
\end{Verbatim}

The point is that here the destination would be a fixed address, 48
bytes past {\bf x} (remember that the address of {\bf x} is fixed).


What about based mode?  Indexed and based modes are actually identical,
even though we think of them somewhat differently because we tend to use
them in different contexts.  In both cases, the syntax is

\begin{Verbatim}[fontsize=\relsize{-2}]
constant(register)
\end{Verbatim}

The action is then that the operand's location is constant+register
contents.  If the constant is, say, 5000 and the contents of the
register is 240, then the operand will be at location 5240.

Note that in the case of {\bf x(\%ebx)}, the {\bf x} \underline{is} a
constant, because {\bf x} here means the address of {\bf x}, which is a
constant.  Indeed, the expression {\bf (x+200)(\%ebx)} is also valid,
meaning ``EBX bytes past x+200,'' or if you prefer thinking of it this
way, ``EBX+200 bytes past x.''

We tend to \underline{think} of based mode a bit differently from
indexed mode, though.  We think of {\bf x(\%ebx)} as meaning ``EBX bytes
past x,'' while we think of {\bf 8(\%ebx)} as meaning ``8 bytes past
[the place in memory pointed to by] EBX.''  The former is common in
array contexts, as we have seen, while the latter occurs with stacks.

You will see full detail about stacks in Chapter \ref{chap:sub} later
on.  But for now, let's recall that local variables are stored on the
stack.  A given local variable may be stored, say, 8 bytes from the
top of the stack.  You will also learn that the ESP register
points to the top of the stack.  So, the local variable is indeed
``8 bytes past ESP,'' explaining why based mode is so useful.
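
As a small preview, here is a sketch of based mode applied to the stack.
The three pushed values are arbitrary, chosen only so that you can
recognize them in the debugger.

\begin{Verbatim}[fontsize=\relsize{-2}]
.text
.globl _start
_start:
        pushl $111             # each push moves ESP down by 4
        pushl $222
        pushl $333             # ESP now points to the 333
        movl (%esp), %eax      # EAX = 333, the most recently pushed item
        movl 4(%esp), %ebx     # EBX = 222, 4 bytes past ESP
        movl 8(%esp), %ecx     # ECX = 111, 8 bytes past ESP
        addl $12, %esp         # discard the three items
done:   movl %eax, %eax        # no-op; handy place for a breakpoint
\end{Verbatim}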

\section{Inline Assembly Code for C++}

The C++ language includes an {\bf asm} construct, which allows you to
embed assembly language source code right there in the midst of your C++
code.\footnote{This is, as far as I know, available in most C++
compilers.  Both GCC and Microsoft's C++ compiler allow it.}

This feature is useful, for instance, to get access to some of the fast
features of the hardware.  For example, say you wanted to make use of
Intel's fast MOVS string copy instruction.  You could write an assembly
language subroutine using MOVS and then link it to your C++ program, but
that would add the overhead of subroutine call/return.  (More on this in
Chapter \ref{chap:sub}.)  Instead, you could write the MOVS code there
in your C++ source file.

Here's a very short, overly simple
example:

\begin{Verbatim}[fontsize=\relsize{-2}]
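// file a.c

#include <stdio.h>

int x;

int main()

{  scanf("%d",&x);
   // for viewing with gcc -S only; the unmatched push would unbalance
   // the stack if the program were actually run
   __asm__("pushl x");
}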
\end{Verbatim}

After doing

\begin{Verbatim}[fontsize=\relsize{-2}]
gcc -S a.c
\end{Verbatim}

the file {\bf a.s} will be

\begin{Verbatim}[fontsize=\relsize{-2}]
        .file   "a.c"
        .section        .rodata
.LC0:
        .string "%d"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $x, 4(%esp)
        movl    $.LC0, (%esp)
        call    scanf
#APP
        pushl x
#NO_APP
        addl    $20, %esp
        popl    %ecx

\end{Verbatim}

Our assembly language is bracketed by APP and NO\_APP, and sure enough,
it is

\begin{Verbatim}[fontsize=\relsize{-2}]
pushl x
\end{Verbatim}

For an introduction to how to use this feature, see the tutorials on the
Web; just plug ``inline assembly tutorial'' into Google.  For instance,
there is one at \url{http://www.opennet.ru/base/dev/gccasm.txt.html}.

\section{Example:  Counting Lower-Case Letters}

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
.data 
x:  .string "c92jemc82ne<824j8vcm92jq3.,.u"
counts:  
    .rept 26 
    .byte 0 
    .endr
.text  
.globl _start
_start:
   # EAX will always point to the current character to be tallied
   movl $x, %eax
top:  
   # need to zero out all of EBX for later use (see subl)
   movl $0, %ebx  
   # get the character to be tallied
   movb (%eax), %bl
   # check for end of string
   cmpb $0, %bl
   jz done
   # check to see if in range 'a'-'z' (unsigned comparisons, so that
   # byte values of 128 or more are skipped too)
   cmpb $'a', %bl
   jb nextchar
   cmpb $'z'+1, %bl
   jae nextchar
   # find distance past counts where we will increment
   subl $'a',%ebx
   # add that distance to counts to get address of place to increment
   addl $counts, %ebx
   # now increment
   incb (%ebx)
   # OK, ready to go to the next character in the string
nextchar:
   addl $1, %eax
   jmp top
done: movl %edx, %edx
\end{Verbatim}

\section{``Linux Intel Assembly Language'':  Why ``Intel''?  Why
``Linux''?}

The title of this document is ``Linux Intel Assembly Language.''  Where
do those qualifiers ``Intel'' and ``Linux'' come in?

First of all, the qualifier ``Intel'' refers to the fact, discussed
earlier, that every CPU type has a different instruction set and
register set.  Machine code which runs on an Intel CPU will be rejected
on a PowerPC CPU, and vice versa.

The ``Linux'' qualifier is a little more subtle.  Suppose we have a C
source file, {\bf y.c}, and compile it twice on the same PC, once under
Linux and then under Windows, producing executable files {\bf y} and
{\bf y.exe}.  Both files will contain Intel instructions.  But for I/O
and other OS services, the calls in {\bf y} will be different from the
calls in {\bf y.exe}.

\checkpoint

\section{Viewing the Assembly Language Version of the Compiled Code}

We will often have occasion to look at the assembly language which the
compiler produces as its first step in translating C code.  In the case
of GCC we use the {\bf -S} option to do this.  For example, if
you type

\begin{Verbatim}[fontsize=\relsize{-2}]
gcc -S y.c
\end{Verbatim}

an assembly language file {\bf y.s} will be created, as it would
ordinarily, but the compiler will stop at that point, not creating
object or executable files.  You could even then apply \textbf{as} to
this file, producing {\bf y.o}, and then run \textbf{ld} on {\bf y.o} to
create the same executable file \textbf{a.out}, though you would also
have to link in the proper C library code file.\footnote{In order to do
the latter automatically, without having to know which file it is and
where it is, use GCC to do the linking:

gcc y.o

GCC will call LD for you, also supplying the proper C
library code file.}

\section{String Operations}
\label{stringops}

The STOS (STore String) family of instructions does an extremely fast
copy of a given string to a range of consecutive memory locations, much
faster than one could do with MOV instructions in a programmer-written
loop.

For example, the {\bf stosl} instruction copies the word in EAX to the
memory word pointed to by the EDI register, and increments EDI by 4.  If
we add the {\bf prefix} {\bf rep} (``repeat''), so that our line of
assembly language is now

\begin{Verbatim}[fontsize=\relsize{-2}]
rep stosl
\end{Verbatim}

then this is where the \underline{real} power comes in.  That single
instruction effectively becomes the equivalent of a loop:  The {\bf
stosl} instruction will be executed repeatedly, with ECX being
decremented by 1 each time, until ECX reaches 0.  This would mean that
c(EAX) would be copied to a series of consecutive words in memory.
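
Conceptually, {\bf rep stosl} behaves like the programmer-written loop
below.  This is only a sketch of the semantics, under two assumptions of
mine:  ECX is nonzero on entry, and the CPU's direction flag is clear
(the usual state under Linux), so that EDI moves upward.  The label {\bf
top} is hypothetical.

\begin{Verbatim}[fontsize=\relsize{-2}]
top:    movl %eax, (%edi)      # copy c(EAX) to the word EDI points to
        addl $4, %edi          # advance EDI to the next word
        decl %ecx              # one fewer repetition to go
        jnz top                # keep going until ECX reaches 0
\end{Verbatim}

The real instruction differs in that it does nothing at all if ECX is 0
on entry, and it leaves the condition flags alone, whereas {\bf decl}
changes them.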

Note that the {\bf rep} prefix is in effect an extra op code, prepended
before the instruction itself.  The instruction

\begin{Verbatim}[fontsize=\relsize{-2}]
stosl
\end{Verbatim}

translates to the machine code 0xab, but with {\bf rep} prepended, the
instruction is 0xabf3.\footnote{The lower-address byte will be f3, and
the higher-address byte will be ab.  Recall that the C notation ``0x''
describes a bit string by saying what the (base-16) number would be if
the string were representing an integer.  Since we are on a
little-endian machine, the 0x notation for this instruction would be
0xabf3.}

EDI, EAX and ECX are wired-in operands for this instruction.  The
programmer has no option here, and thus they do not appear in the
assembly code.  

The way to remember EDI is that the D stands for ``destination.''

There is also the MOVS family, which copies one possibly very long
string in memory to another place in memory.  EDI again plays the
destination role, i.e. points to the place to be copied to, and the
ESI register points to the source string.\footnote{Again, the S here
refers to ``source.''}

Here is an example of STOS:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
.data
x: 
    .space 20  # set up 5 words of space

.text

.globl _start

_start:
      movl $x,%edi
      movl $5,%ecx  # we will copy our string to 5 words
      movl $0x12345678,%eax  # the string to be copied is 0x12345678
      rep stosl
done:
\end{Verbatim}

Here is an example of MOVS, copying one string to another:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
.data
x: .string "abcde"  # 5 characters plus a null
y: .space 6

.text
.globl _start
_start:
    movl $x, %esi
    movl $y, %edi
    movl $6, %ecx
    rep movsb          
done:
\end{Verbatim}

Again, you must use ESI, EDI and ECX for the source address, destination
address and repeat count, respectively.  No other registers may be used.

Warning:  REP MOVS really is the equivalent (though much faster) of
writing a loop to copy the characters from the source to the
destination.  An implication of this is that if the destination overlaps
the source, you won't necessarily get a copy of the original source in
the destination.  If for instance the above code were


\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
.data
x: .string "abcde"  # 5 characters plus a null
y: .space 9

.text
.globl _start
_start:
    movl $x, %esi
    movl %esi, %edi
    addl $3, %edi
    movl $6, %ecx
    rep movsb
done:
\end{Verbatim}

then the resulting contents of the nine bytes starting at {\bf x} would be
('a','b','c','a','b','c','a','b','c').  By the time the fourth byte is
to be copied, it has already been overwritten by the first one.
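
If you do need to copy to an overlapping destination at a higher
address, one common remedy is to copy backward, starting from the last
byte.  The sketch below relies on the CPU's direction flag, which I have
not discussed:  {\bf std} makes the string instructions step ESI and EDI
downward, and {\bf cld} restores the usual upward direction.

\begin{Verbatim}[fontsize=\relsize{-2}]
.data
x: .string "abcde"  # 5 characters plus a null
y: .space 9

.text
.globl _start
_start:
    movl $x+5, %esi    # last byte of the source (the null)
    movl $x+8, %edi    # last byte of the destination, which starts at x+3
    movl $6, %ecx
    std                # direction flag set: ESI and EDI now step downward
    rep movsb
    cld                # restore the usual upward direction
done:
\end{Verbatim}

Here the nine bytes starting at {\bf x} end up as
('a','b','c','a','b','c','d','e',0), i.e. an intact copy of the original
string now begins at {\bf x}+3.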

\section{Useful Web Links}

\begin{itemize}

\item Linux assembly language Web page: \url{http://linuxassembly.org/}

\item full \textbf{as} manual: 
\url{http://www.gnu.org/manual/gas-2.9.1/html_mono/as.html} (contains
full list of directives, register names, etc.; op code names are same as
Intel syntax, except for suffixes, e.g. `l' in ``movl'')

\item Intel2gas, converter from Intel syntax to AT\&T and vice versa:
\url{http://www.niksula.cs.hut.fi/~mtiihone/intel2gas/}

\item There are many Web references for the Intel architecture.  Just
plug ``Intel instruction set'' into any Web search engine.  

One such site is \url{http://www.penguin.cz/~literakl/intel/intel.html}.  

For more detailed information, see the Intel manual, at
\url{ftp://download.intel.com/design/Pentium4/manuals/24547108.pdf}.
There is an index at the end.

Also, if you need to learn about a specific instruction, often you can
get some examples by plugging the instruction name and the word {\it
example} into Google or any other Web search engine.  Use the family
name for the instruction, e.g. MOV instead of {\bf movl}, {\bf movb}
etc.  This will ensure that your search will pick up both
Intel-syntax-oriented and AT\&T-syntax-oriented sites.

By the way, if you do get information on the Web which is
Intel-syntax-oriented, use the {\bf intel2gas} program, mentioned above,
to convert it to AT\&T.

\item NASM assembler home page: \url{http://nasm.2y.net/}.  I don't use
it myself, but it is useful in that it is usable on both Linux and
Windows.

\item the ALD debugger: \url{http://ellipse.mcs.drexel.edu/ald.html}

\item my tutorials on debugging, featuring my slide show, using \textbf{ddd}: \url{http://heather.cs.ucdavis.edu/~matloff/debug.html}

\item Unix tutorial: \url{http://heather.cs.ucdavis.edu/~matloff/unix.html}

\item Linux installation guide: \url{http://heather.cs.ucdavis.edu/~matloff/linux.html}

\end{itemize}

\section{Top-Down Programming}
\label{topdown}

Programming is really mind-boggling work.  When one starts a large,
complex program, it is really a daunting feeling.  One may feel, ``Gee,
there is so much to do...''  It is imperative that you deal with this by
using the {\bf top-down} approach to programming, also known as {\bf
stepwise refinement}.  Here is what it is.

You start by writing {\bf main()} (or in assembly language, {\bf
\_start()}), and---this is key---making sure that it consists of no more
than about a dozen lines of code.\footnote{I'm just using 12 lines as an
example.  You may prefer a smaller limit, especially when working at the
assembly language level.}  Since most programs are much bigger than just
12 lines, this of course means that some, maybe most, of these 12 lines
will be calls to functions (or in the assembly language case, calls to
subroutines).  It is important that you give the functions good names,
in order to remind yourself what tasks the various functions will
perform.

In a large, complex program, some of those functions will also have a
lot to do, so you should impose upon yourself a 12-line limit on them too!  In
other words, some functions will themselves consist of calls to even
more functions.

The point is that this is something easy and somewhat quick to do,
something that you can do without feeling overwhelmed.  Each 12-line
module which you write is simple enough so that you can really focus
your thoughts.


