

\documentstyle{article}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.8in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.12in}

\begin{document}

\newcommand{\bfs}[1]{{\bf #1}}

\begin{center}
{\LARGE\bf 

Binary Representation and Storage of Information

}
\end{center}

\bigskip

As you may know already, all information---program variables,
disk files, etc.---is stored in binary form, i.e. as 0s and 1s.
This document will give you a brief summary of this material.

\section{Bit-String Terminology}

A \bfs{bit} is a digit which is either 0 or 1.
A \bfs{byte} is a string of 8 bits.

A more compact way for us humans to write down long bit strings 
is to use \bfs{hex} form (hex is just \underline{notation};
the bit string still consists of 0s and 1s inside the machine).  
The bit string is partitioned into groups of 4 bits each.  Each 
group is then treated as a 4-bit, base-2 number (even if the bit 
string itself is not representing a number).  For example, the
bit string 1101 would be treated as the number 13, since
$1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 13$.
Then each group is given a one-character name, one of the
characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, meaning
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, respectively.

For example, the bit string 1001110010101110 would be partitioned 
as 1001 1100 1010 1110, and thus written in compact form in the C 
language as 0x9cae (0x denotes hex numbers in C).

Note that one byte contains two hex digits.

You can view the contents of a file in hex form by using the
-x option in the Unix \bfs{od} command.  It will show the
hex contents of each byte in the file.

For example, let us look at the first two lines of an executable
file a.out:\footnote{Notice the use of a Unix \bfs{pipe} here,
as indicated by the $|$ symbol.  Ordinarily, the output of the
\bfs{od} command would go to our screen, but instead we are
piping it to the standard input of the \bfs{head} command,
which in this case is a Unix ``filter'' which shows us the
first 2 lines of the input.}

\begin{verbatim}
heather% od -x a.out | head -2
0000000  8103 010b 0000 2000 0000 2000 0000 0000
0000020  0000 012c 0000 2020 0000 0000 0000 0000
\end{verbatim}

Of course, all of this is not very meaningful unless we know
something about the machine language of this machine; all we
can say is that the first byte in a.out is 10000001, the second
byte is 00000011, etc.

\section{Storing Program Variables in Bit Form}

\subsection{Variables of Type int}

For variables of \bfs{int} type, the base-2 form of the number
is stored.  The number of bits used depends both on the machine
and the compiler.  For the Unix compiler on Sun SPARCstations,
32-bit strings are used; note that this is 8 hex digits.

For example, if a C program source file contains the lines

\begin{verbatim}
int G;

G = 25;
\end{verbatim}

and is compiled by the Unix compiler on a Sun SPARCstation, then
the internal representation of G in memory will be 

\begin{verbatim}
00000000000000000000000000011001
\end{verbatim}

 or in hex notation 0x00000019.

For negative numbers, the \bfs{two's complement} convention is
used on most modern machines\footnote{Again, IBM mainframes are
the main exception.}.  Under this system, the bit representation
for the negative number -n is obtained by ``winding backwards'' n
times from the bit representation for 0.  

On a 32-bit machine, for example, to get the bit representation
for -1, we wind backwards once from 

\begin{verbatim}
00000000000000000000000000000000
\end{verbatim}

yielding

\begin{verbatim}
11111111111111111111111111111111
\end{verbatim}

To represent -2 we wind backwards twice from 0, yielding

\begin{verbatim}
11111111111111111111111111111110
\end{verbatim}

Similarly, the representation for -3, -4 and -5 would be

\begin{verbatim}
11111111111111111111111111111101
\end{verbatim}

\begin{verbatim}
11111111111111111111111111111100
\end{verbatim}

\begin{verbatim}
11111111111111111111111111111011
\end{verbatim}

and so on.

On a 32-bit machine, there are $2^{32}$ (about 4 billion) possible
bit patterns.  In the manner shown above, the convention is to use
half of these for positive numbers and half for negative numbers;
in this way, the range of an \bfs{int} variable will be roughly from 
-2 billion to +2 billion.  Note that all the positive numbers will
have their leftmost bits being 0, while all the negative numbers
will have a 1 in that leftmost position.

\subsection{Variables of Type char}

Variables of type \bfs{char} are handled in the following manner.  
Each possible character is given a number.  The number 65 (in hex,
0x41), for example, means the character `A'.  This is under 
the convention called  ASCII, used by the vast majority of modern 
computers\footnote{Some hardware (keyboards) and software (compilers, 
etc.) made for use on old machines such as IBM mainframes use 
another convention, called  EBCDIC.}.  You can get a table of all ASCII
codes, in hex form, by typing \bfs{man ascii}; a summary of it is
given below.

Here is an illustration:

\begin{verbatim}
heather% cat gy
abc
heather% od -x gy
0000000  6162 630a
\end{verbatim}

0x61 is the ASCII code for `a', 0x62 for `b', and 0x63 for `c';
0x0a is the ASCII code for the end-of-line character.

Note that ASCII codes range from 0x00 to 0x7f (i.e. 0 to 127).  Any
byte in the range 0x80 to 0xff---i.e. any byte whose leading bit is
a 1---may have unpredictable effects when used with type \bfs{char}, 
depending on the compiler.  Some values may actually get changed.  

Note, for example, the \bfs{mail} program may alter bytes in this range, 
thus transmitting incorrectly, necessitating the use of \bfs{uuencode} 
and \bfs{uudecode}.  As an illustration of this, I compiled the simple 
program

\begin{verbatim} 
int i;

main()

{ i = 1; }
\end{verbatim}

producing the file a.out.  I then mailed the file to myself, saved the
mailed copy as b.out, then removed the lines of mail header information
(``From:'', ``Date:'', etc.), and then compared the first few bytes of
each file as follows:

\begin{verbatim}
heather% od -x a.out | head -2
0000000  0162 0005 33a0 2b1c 2000 0000 0060 0000
0000020  0038 0007 010b 020a 1000 0000 1000 0000
heather% ^a^b
od -x b.out | head -2
0000000  0162 2005 1c33 202b 3860 0b07 0a01 1002
0000020  2010 4001 1040 1010 7f7e 3307 0920 2e10
\end{verbatim}

As you can see, the version I received in the mail was garbled.  I got
an execution error when I tried to run it.

In \bfs{printf}, \bfs{scanf}, etc., we can get hex format by using
the \%x format, just as we use \%c for character format and \%d
for integer format.

\section{Memory Locations}

Each memory location in the computer has an identification number,
called an \bfs{address}.  In most current machines, 
this extends down to the byte level; each byte has its own address.
The first byte in memory is Byte 0, then Byte 1, then Byte 2, and
so on.

When the compiler compiles your program, it will decide at what
addresses to put each of your program's variables, and how many
bytes each variable occupies.  If you have to know this information
(and in C applications we often do need to know), we can use the
\& operator to find the address of a variable, and the \bfs{sizeof}
operator to find how many bytes the variable (and all variables of
that same type) takes up.

Suppose, for example, that we have an \bfs{int} variable Y.  The
statement

\begin{verbatim}
printf("%x %d\n",&Y,sizeof(int));
\end{verbatim}

will print out for us the address of Y, and the number of bytes
taken up by any \bfs{int} variable (for this compiler-machine
combination).  Say the output is 

\begin{verbatim}
40b0 4
\end{verbatim}

Then we know that Y occupies Bytes 40b0, 40b1, 40b2 and 04b3.

Components of a complex data type are guaranteed in C to be stored
contiguously.  Thus for example, on a Sun, if we have the declaration

\begin{verbatim}
int Z[5];
\end{verbatim}

and if Z starts at address 5000, then Z[0] will be at 5000, Z[1]
at 5004, Z[2] at 5008, Z[3] at 500c, and Z[4] at 5010.

Similarly, elements within a \bfs{struct} are stored contiguously,
and in the order declared.

\appendix{ASCII Codes} 
\begin{verbatim}
     Hexadecimal - Character

     | 00 NUL| 01 SOH| 02 STX| 03 ETX| 04 EOT| 05 ENQ| 06 ACK| 07 BEL|
     | 08 BS | 09 HT | 0A NL | 0B VT | 0C NP | 0D CR | 0E SO | 0F SI |
     | 10 DLE| 11 DC1| 12 DC2| 13 DC3| 14 DC4| 15 NAK| 16 SYN| 17 ETB|
     | 18 CAN| 19 EM | 1A SUB| 1B ESC| 1C FS | 1D GS | 1E RS | 1F US |
     | 20 SP | 21  ! | 22  " | 23  # | 24  $ | 25  % | 26  & | 27  ' |
     | 28  ( | 29  ) | 2A  * | 2B  + | 2C  , | 2D  - | 2E  . | 2F  / |
     | 30  0 | 31  1 | 32  2 | 33  3 | 34  4 | 35  5 | 36  6 | 37  7 |
     | 38  8 | 39  9 | 3A  : | 3B  ; | 3C  < | 3D  = | 3E  > | 3F  ? |
     | 40  @ | 41  A | 42  B | 43  C | 44  D | 45  E | 46  F | 47  G |
     | 48  H | 49  I | 4A  J | 4B  K | 4C  L | 4D  M | 4E  N | 4F  O |
     | 50  P | 51  Q | 52  R | 53  S | 54  T | 55  U | 56  V | 57  W |
     | 58  X | 59  Y | 5A  Z | 5B  [ | 5C  \ | 5D  ] | 5E  ^ | 5F  _ |
     | 60  ` | 61  a | 62  b | 63  c | 64  d | 65  e | 66  f | 67  g |
     | 68  h | 69  i | 6A  j | 6B  k | 6C  l | 6D  m | 6E  n | 6F  o |
     | 70  p | 71  q | 72  r | 73  s | 74  t | 75  u | 76  v | 77  w |
     | 78  x | 79  y | 7A  z | 7B  { | 7C  | | 7D  } | 7E  ~ | 7F DEL|
\end{verbatim}

\end{document}

