Supplementary Notes

"Linux Assembly Language Programming," by Bob Neveln, Prentice-Hall, 2000

Notes by N. Matloff, University of California, Davis

p.52:

A register is a memory cell inside the CPU.

p.53:

The 8088 was a 16-bit processor, in that its internal operations were of that size. However, it was designed to attach to only an 8-bit data bus, and thus needed to make two trips to memory to access one 16-bit word.

The 8086 was identical to the 8088, but with a 16-bit bus interface.

IBM, in designing the first IBM PC, chose the 8088 because it was a few dollars cheaper than the 8086.

p.54:

The correct term for MOV etc. is instruction, not command.

p.59:

Note that not every register can be used with the MUL instruction. The product (in the 8-bit operand case) must go into AX. We say that the Intel instruction set is highly nonorthogonal, meaning that many operation/operand combinations are not allowed.

p.66:

Many CPUs have no IN or OUT instructions, and thus rely exclusively on memory-mapped I/O, using MOV. The designer of a full computer using such a CPU must be careful not to assign any duplicate addresses to memory and I/O devices.

p.70:

Note that when the CPU is running in 32-bit mode, as in our usage, the flags will reflect 32-bit operations. Thus one should not base conditional jumps on operations done on 16-bit quantities, e.g. SUB AX,BX.

p.83:

Recall that the CPU will contain a register known as the Program Counter (PC) or some similar name. The PC will point to the place in memory at which the CPU is supposed to fetch the instruction.

Just before executing JNS NXT, the PC will contain 0x13, the location of that instruction. The CPU will fetch the first byte, 0x0f, discover that this is one of JZ/JNZ/JS/JNS, and thus fetch the second byte to see which of these instructions we have in this case. The second byte, 0x89, shows that this is a JNS. That in turn tells the CPU that it needs to fetch four more bytes for the operand, which turn out to be 04 00 00 00, i.e. the value 0x00000004 stored low byte first.

The CPU will then increment the PC to 0x19. It does this in order to prepare to execute the next instruction, MOV EAX,ECX, when the currently-executing one is finished (which it may or may not do, depending on whether the jump is taken; it doesn't know which yet). For this reason, the offset operand in JNS was calculated (by the assembler) to be relative to 0x19. The jump target is NXT, at 0x1d, so the offset is calculated to be 0x001d-0x0019=0x0004.

p.84:

The Intel instruction set also includes jump instructions which use indirect addressing. For example,

JMP EAX

means to jump to the place in memory pointed to by EAX. And

JMP [EAX]

means double indirection, i.e. jump to the place in memory pointed to by the place in memory pointed to by EAX! (Think of a double pointer in C, say **p.)

By the way, the syntax here is odd. Why didn't Intel put brackets around EAX in the first case, and maybe double brackets in the second case? Maybe the answer lies in the fact that the assembler's parsing would be more difficult with double brackets.

p.98:

Note that the ADC instruction could be synthesized from other instructions. For instance, Program 6.2 would become

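     MOV EAX,[108H]       ; low half of the second operand
     ADD [100H],EAX       ; add the low halves; sets the Carry Flag
     MOV EAX,[10CH]       ; high half (MOV does not change the flags)
     JNC T                ; no carry out of the low halves
     ADD DWORD [104H],1   ; propagate the carry (size must be stated)
 T:  ADD [104H],EAX       ; add the high halves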

Here JNC is the "jump if the Carry Flag is not set" instruction, analogous to JNZ and JNS.

On an Intel machine, it will be better to use ADC, because it will result in code which is faster (e.g. due to needing fewer instruction fetches from memory) and which takes up less space in memory.

However, so-called RISC ("reduced instruction set") machines would probably not have an ADC instruction. Proponents of RISC argue that the circuitry will be faster if the instruction set is more compact (and more orthogonal).

p.99:

This code is much less confusing if you keep in mind that (a) they are multiplying by 3 by adding 3 times, (b) x is used just as "scratch paper," i.e. a place to store intermediate results, and (c) y is the place where we are storing our powers of 3.

p.100:

Neveln's point is that although we would ordinarily write ADD ESI,4, we can't do that because it may change the Carry Flag. True, but it would be better not to use ADC, replacing it with JC or JNC (see notes for p.98 above).

p.101:

Complex addressing modes like this used to be common in CISC machines (and even in RISC machines today). The PDP-11, for instance, had an "autoincrement" mode, such as in the instruction

ADD (R2)+,R4

The registers on that machine were named R0, R1, etc. The parentheses meant indirect addressing mode, and the `+' meant that R2 should be incremented after the source operand is fetched. In other words, the instruction adds the memory word pointed to by R2 to R4, and then increments R2 to point to the next word. This was useful in loops which would access all the words in an array.

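The PDP-11 was the machine on which C, and most early versions of UNIX, were developed. The C construct

x = x + y[i++];

maps naturally onto this addressing mode, though the ++ operator itself dates back to C's predecessor B, which was designed before the PDP-11 existed.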

p.105:

In MOV [ABC],AL, note that the brackets mean the operand uses a pointer, i.e. the instruction copies AL to the place in memory pointed to by ABC. Compare to MOV ABC,AL, which would copy AL to ABC.

The function execve() is a system call used to start the execution of another program. Suppose for example there is a game program, called g, which you wish to run. You of course type

% g

at the UNIX prompt. Remember, you are running some UNIX user shell, let's say tcsh. The latter reads what you have typed, "g", and then must start the execution of g, which it does by creating a child process with fork() and having the child call execve() on g.

p.107:

Both terms used by the author here, pseudo-op and directive, are in common usage. They are synonymous.

p.110:

The author's claim that the assembler stores a 2 in x is not quite correct. What happens is that the assembler makes a note in the .o file which says that when the program is loaded into memory for execution, the loader should put a 2 in the memory location assigned to x.

p.113:

Add to what the author has said, just for your own clarity: "POP BX pops two bytes off the stack and puts them in BX."

p.114:

Saving register values on the stack as shown here is sometimes done, but it should be pointed out that it is better to save them in other registers if possible. Remember, the stack is in memory, which is very slow to access, and thus the code at the top of this page consists of 4 slow operations.

p.115:

The author mentions that the CALL instruction will first push onto the stack the address of the instruction following the CALL, to enable resuming execution at that instruction once the subroutine finishes. Note that the address of that instruction is already in the PC anyway; thus the circuitry will simply push the PC onto the stack.

p.116:

This is an artificial, contrived example. The DIV instruction could be used instead, since it does provide the remainder ("mod" value).

p.118:

Here is a bit more on Sec. 7.3.1:

Suppose we have two assembly-language modules, x.s and y.s, whose corresponding .o files we will link together using ld into an executable file. Suppose in x.s we have an instruction labeled W, and that we refer to W from within y.s, say with an instruction JMP W. At the time y.s is assembled, the assembler will not know how far away W is from the JMP instruction (or, more precisely, the instruction following the JMP). Thus the assembler will leave a note to ld in y.o, saying that ld will need to fill in the jump-target distance in the code for the JMP instruction. But unless we take positive action, ld will have no idea as to where W is, since x.o will not contain that information. So, we use the globl directive in x.s to tell the assembler to make a note in x.o as to the location of W.

p.119:

It is important that you picture the stack before, during and after the execution of the subroutine.

For example, just before the PUSH ECX instruction executes, the stack will look like this:

pushed return address (address of instruction following the CALL)
pushed parameter
the stack contents existing before the parameter push and CALL

The first instruction in the subroutine, PUSH ECX, results in the stack looking like this:

pushed value of ECX
pushed return address (address of instruction following the CALL)
pushed parameter
the stack contents existing before the parameter push and CALL

You can see why we need the +8 in the instruction MOV ECX,[ESP+8]: ESP points to the pushed value of ECX; ESP+4 points to the pushed return address; and ESP+8 points to the pushed parameter, which is what we want to access.

By the way, the idea of interfacing an assembly-language subroutine to a C program is very important. The typical usage is in a program which is too large and complex to write comfortably in assembly language, but which does need assembly language in some sections, e.g. to access the hardware. A good example is Linux itself. It is mostly written in C, but the device drivers (the routines which access the keyboard, mouse, monitor, disk drives, etc.) are written at least partly in assembly language.

p.121:

Again, you should draw pictures of the stack at various stages here, in order to fully understand what is occurring.

p.122:

There is a huge difference between a "protected" platform like UNIX/Linux or Windows NT, and an "unprotected" one like Windows 98. Here are some examples:

On Windows 98 I could type something like DEL C:\WINDOWS\*.*, destroying the entire system, whereas on a system like UNIX this would not be possible except for the user named "root".
On Windows 98, I can write a program that writes to any place in memory, for instance destroying the OS by writing to the place in memory in which the OS is running. On UNIX I would get a segmentation fault if I tried this.
On Windows 98, I can write a program which directly accesses the I/O hardware. On UNIX, I could not do this (except as root), and must instead ask the OS to do it for me, via system calls.

The protections provided on systems like UNIX are obviously of enormous importance, and we are fortunate that the author devotes quite a bit of time to this topic. One thing you should always keep in mind is that these protections rely on various hardware mechanisms for their implementation; they would have been impossible, for example, on early Intel chips such as the 8088. On the software side, we need to write the OS in such a way that it will take advantage of such hardware; Linux does, as does Windows NT, but Windows 98 does not.

By the way, the author's phrasing might lead some readers to think that he meant that only Linux uses paging for "economizing on memory." Actually, that is a major use of any paging system.

p.124:

The term "hardwire routine" is vague here. Keep in mind that the author is referring to the hardware (though in some systems, the OS may do some of the register saving).

p.126:

The author implies that paging was invented after UNIX, which is not true. Paging pre-dated UNIX by many years. However, since the machines on which UNIX was first developed (PDP-11s from the Digital Equipment Corp.) did not have paging hardware, the first versions of UNIX did not do paging.

p.127:

The kernel is basically the OS, or at the very least the part of the OS which manages tasks, deciding for example which task to run next.

p.128:

The sentence "It is also the reason..." is strange. In the previous few sentences, the author was correctly pointing out that Linux uses LESS memory than do other OSs, and yet this sentence is at odds with that notion.

p.129:

The use of the word "must" in the sentence "Since the translation process..." is too strong; other approaches are used in other machines, and the "must" part of the sentence does not even relate to the first part of the sentence. Here is what is really going on:

(Before going any further, you should read my brief introduction to operating systems. If you have already read it, you should review it.)

In most machines, each process has just one page table. Fig. 8-7 then just has two blocks, "Index into Page Table" and "Index into Page." Keep in mind that the page table is in memory. It may occupy a very large amount of memory, leaving little or no space for the running process which it is serving! Thus we "page the page table": We store just part of the page table in memory, the rest on disk, and whenever we need a part which is on disk, we bring it into memory.

There are various ways of doing this, most of which are too complex to describe here. In the Intel case, though, what we do is break up the big page table into many small ones, and then have a page table table (Intel calls it the page directory) which stores information about those small page tables. For each page table, there will be an entry in the page table table which states whether the given table is in memory, and if so, where (and if not, then where on disk).