# Technical Overview

One of the reasons why C has become so popular is that it allows the programmer to do many types of machine-level applications without resorting to using machine language as the vehicle for programming. One example of that is C's ability to work with memory addresses. Another example, or class of examples, consists of C constructs for bit-level operations, which will be described here.

## AND and OR Operators

C includes the bitwise-AND operator & and the bitwise-OR operator |. To AND two bits together, one merely multiplies them: 0 AND 0 = 0, 1 AND 0 = 0, 0 AND 1 = 0, 1 AND 1 = 1. To OR two bits together, we just add them, except that 1 OR 1 is equal to 1, not 2: 0 OR 0 = 0, 1 OR 0 = 1, 0 or 1 = 1, 1 or 1 = 1.

Note that

• AND-ing by a 1 simply copies the bit value

• OR-ing by a 0 simply copies the bit value

• AND-ing by a 0 results in a 0, not matter what the original bit value was

• OR-ing by a 1 results in a 1, not matter what the original bit value was

The AND operator is useful for

(a)
putting 0s in certain bit positions of a variable, while retaining whatever original values were in the other bit positions

(b)
testing whether certain bit positions within a variable contain 1s or 0s

In the next few examples, suppose the variable X is of type int.

As an example of (a), suppose we wish to put 0s in Bit Positions 3 and 2 (the rightmost bit position being Bit Position 0), and suppose we have a machine/compiler combination which implements int variables as 4-byte quantities (again, this is typical for most workstations today). We would accomplish our goal with the statement

`X &= 0xfffffff3;`

Here is why: First, note that

`X &= 0xfffffff3`

is just a short form of

`X = X & 0xfffffff3`

(like X += 5 vs. X = X + 5). So we are AND-ing 0xfffffff3 with X and putting the result back in X. The constant 0xfffffff3 is, in bit form,

`11111111111111111111111111110011`

Most of the bits here are 1s, and since, as mentioned earlier, AND-ing by a 1 produces no change, most bits in X will not change. Only the bits at Bit Positions 3 and 2 will potentially change: They will change to 0s if they were 1s (or stay at 0s if they were 0s).

For example, after executing the statements

```X = 29;
X &= 0xfffffff3;```

X will have the value 17 (try this yourself on the machine, and make sure you understand).

By the way, the value to be AND-ed with X, in this case 0xfffffff3, is called a mask. To see why, think of what happens when you wish to paint the walls of a room. You would like to cover up the electrical sockets so that they don't get painted; you can do this with masking tape. Well, in the example above, we wished to ``cover up'' all the bits except those at Bit Positions 3 and 2, so we AND-ed them with 1s, so that they would not change.

As an example of (b), suppose we wish to set Z = 12 if there is a 1 in Bit Position 5 of X. We could do this with the statement

`if (X & 0x00000020) Z = 12;`

Here is why this statement accomplishes this goal: The mask 0x00000020 has 0s everywhere except in Bit Position 5, where there is a 1. Recall from above that AND-ing with a 0 produces a 0, no matter what value it is AND-ed with, while AND-ing with a 1 results in copying the bit value. So the quantity X & 0x00000020 will have 0s everywhere except at Bit Position 5, where there will be a copy of X's original Bit 5. If the latter is a 1, then X & 0x00000020 will be nonzero (0x00000010, to be specific), and since any nonzero value is condidered `true', Z = 12 will be executed.

OR-ing is used to put 1s at specific bit positions. You should verify, and make sure you understand, that after the statements

```X = 29;
X |= 0x00000020;```

are executed, X will be equal to 61.

## Shift Operators

The operators << and >> shift the bits in a variable toward the left or toward the right, respectively. Suppose for example that X and Y are of type int, and we execute

```X = 5;
Y = X << 3;```

Then (say on the CAE machines) X will be, in bit form,

`00000000000000000000000000000101`

All these bits will be shifted left by 3 positions, and the result placed in Y. The latter will now be

`00000000000000000000000000101000`

which has the value 40.

footnote: Note that since we are tacking three 0s on the right end, and we are working in base-2, we are in effect multiplying by . (In base-10, tacking on a 0 on the right does a multiplication by 10, e.g. 52 to 520, so in base-2, appending a 0 on the right does a multiplication by 2.) Thus since X was 5, Y will be 40.

Note that the bits on the left end disappear. This may create problems. For example, shifting may turn a positive number into a negative one, or vice versa, since the leftmost bit of a number tells its sign. This presents no problem, though, for variables of type unsigned, so shift operations are typically done on variables of that type, not of type int.

## Bit Fields

These arise in a special type of struct, which allows one to assign variable names to groups of bits within a variable. We will not go into this here.

# Example

Below we have a database example as an application of bit operators.

Many databases are extremely large. Just imagine, for example, how much computer space--either in memory, during program execution, or on disk, for permanent storage, or a combination of both--is required to store the records for all of the Bank of America's credit-card holders. One way to solve this problem is to ``pack'' the data; the Pascal language actually includes a provision for this, the packed array construct.

footnote: However, many Pascal compilers simply ignore this construct, and implement the array without packing.

C does not have this feature, but its bit-level operators can easily be used to implement the equivalent of it. The program below does this.

```   1  Script started on Tue May 19 18:42:41 1992
2  heather% cc -g Packed.c -lm
3  heather% a.out
5  n
6  enter MaxN
7  3
8  enter command
9  e
10  enter ItemNum
11  7
12  enter Datum
13  3
14  enter command
15  e
16  enter ItemNum
17  5
18  enter Datum
19  2
20  enter command
21  e
22  enter ItemNum
23  8
24  enter Datum
25  2
26  enter command
27  v
28  enter output file name
29  gy
30  enter command
31  q
32  heather% !!
33  a.out
35  y
36  enter file name
37  gy
38  enter command
39  d
40  enter ItemNum
41  5
42  2
43  enter command
44  d
45  enter ItemNum
46  8
47  2
48  enter command
49  q
50  heather% cat Packed.c
51
52
53  /* example of how C's bit-level operations can be used to implement
54     packing in the Pascal sense
55
56     data consist of integers in the range 0..MaxN; each datum is
57     stored in a FieldSize-bit subset of an int variable (actually,
58     a variable of type `unsigned int', though we will just say `int'
59     in the rest of this discussion); there are IntSize/FieldSize such
60     fields per int variable, where IntSize = sizeof(int); FieldSize
61     is equal to log(MaxN+1), where the log is base-2; it is assumed
62     that FieldSize evenly divides IntSize
63
64     for example, on Sun's and most other current workstations,
65     int variables are stored in 32-bit words; suppose we wish to
66     store integers in the range 0..3; these can be stored in 2-bit
67     fields (0 = 00, 1 = 01, 2 = 10, 3 = 11); thus we can get 32/2
68     = 16 such fields into one int variable, and thus compress
69     storage requirements (either in memory or in a disk file) by
70     a factor of 16, a big savings; for example, instead of needing
71     an array of 80,000 elements,
72
73        int X;
74
75     we could fit all 80,000 data items into an array of 5,000 int
76     variables, i.e.
77
78        int DataArray;
79
80     DataArray would contain what would have been (if we had not
81     packed) X,X,...,X; DataArray would contain what
82     would have been X,X,...,X31]; etc.; note the bit positions:
83     X would be in Bits 31 and 30 of DataArray, X would be in
84     Bits 29 and 28, etc. (calling the leftmost bit Bit 31 and the
85     rightmost bit Bit 0) */
86
87
88  #define MaxArrayLength 100
89  #define MaxIntSize 100
90
91
92  #include <fcntl.h>
93  #include <math.h>
94
95
96  unsigned DataArray[MaxArrayLength],  /* array of packed data */
97           ItemNum,  /* index the data item would have in X if
98                        packing were not used */
99           DataArrayIndex,  /* index of the item within DataArray */
100           DataArrayStartBit,  /* bit position of the item within the
101                                  DataArray element */
102           Datum,
103           ZeroOutA[MaxIntSize],
104           ZeroOutB[MaxIntSize],
105           All1s,
106           MaxN,  /* upper bound for the value of a datum */
107           FieldSize,  /* number of bits needed to store a datum */
108           IntSize,  /* sizeof(int) */
109           NFields;  /* number of fields per DataArray element */
110
111
112  char Command;  /* d = display data item
113                    e = enter new data item
114                    q = exit program
115                    v = save to file */
116
117
118  InitZeroOutValues()
119
120  {  unsigned Tmp = MaxN;
121     int BitPos,I;
122
123     BitPos = FieldSize - 1;
124     for (I = 0; I < NFields; I++)  {
125        ZeroOutB[BitPos] = Tmp;
126        ZeroOutA[BitPos] = All1s - ZeroOutB[BitPos];
127        BitPos += FieldSize;
128        Tmp <<= FieldSize;
129     }
130  }
131
132
134
135  {  int FD;  char InFileName;
136
137     printf("enter file name\n");
138     scanf("%s",InFileName);
139     FD = open(InFileName,O_RDONLY);
142  }
143
144
145  Init()
146
147  {  IntSize = sizeof(int);
148     All1s = -1;
150     scanf("%c",&Command);
151     if (Command == 'y') ReadFile();
152     else  {
153        printf("enter MaxN\n");
154        scanf("%u",&MaxN);
155     }
156     FieldSize = (int) log2((double) (MaxN+1));
157     NFields = IntSize*8 / FieldSize;
158     InitZeroOutValues();
159  }
160
161
162  Insert()  /* inserts Datum at the position in DataArray appropriate
163               for the given value of ItemNum */
164
165  {  DataArrayIndex = ItemNum / NFields;
166     DataArrayStartBit = (NFields-ItemNum % NFields) * FieldSize - 1;
167
168     /* in inserting Datum into its bit field, we must take care that
169        none of the other bit fields in this element of DataArray are
170        affected; the plan is as follows:
171
172           (a) use the bitwise-AND operator & to zero-out the field in
173               which Datum will be put;
174
175           (b) use the shift-left operator << to shift Datum the proper
176               number of bit positions to make it aligned with the zeroed-
177               out field;
178
179           (c) use the bitwise-OR operator | to insert the item without
180               affecting the other fields
181
182     */
183
184     /* (a) */
185     DataArray[DataArrayIndex] &= ZeroOutA[DataArrayStartBit];
186
187     /* (b) */
188     Datum <<= DataArrayStartBit - (FieldSize - 1);
189
190     /* (c) */
191     DataArray[DataArrayIndex] |= Datum;
192  }
193
194
195  GetCommand()
196
197  {  printf("enter command\n");
198     scanf("%c",&Command);
199     /* might be a leftover end-of-line character from the
200        previous action; if so, then read the next character
201        in the input stream */
202     if (Command == '\n') scanf("%c",&Command);
203  }
204
205
206  InputItemNum()
207
208  {  printf("enter ItemNum\n");
209     scanf("%u",&ItemNum);
210  }
211
212
213  InputDatum()
214
215  {  printf("enter Datum\n");
216     scanf("%u",&Datum);
217  }
218
219
220  unsigned GetItem()  /* displays to the screen the element which would
221                         be at position ItemNum of X if we had not
222                         packed */
223
224  {  unsigned Tmp;
225
226     DataArrayIndex = ItemNum / NFields;
227     DataArrayStartBit = (NFields-ItemNum % NFields) * FieldSize - 1;
228
229     /* the plan:
230
231        (a) copy the DataArray element to Tmp
232
233        (b) zero out the fields in Tmp other the one containing the
234            item of interest
235
236        (c) shift Tmp to the right to produce the item of interest
237
238     */
239
240     /* (a) */
241     Tmp = DataArray[DataArrayIndex];
242
243     /* (b) */
244     Tmp &= ZeroOutB[DataArrayStartBit];
245
246     /* (c) */
247     return Tmp >> (DataArrayStartBit - (FieldSize - 1));
248  }
249
250
251  SaveToFile()
252
253  {  int FD,I;  char OutFileName;
254
255     printf("enter output file name\n");
256     scanf("%s",OutFileName);
257     FD = open(OutFileName,O_WRONLY|O_CREAT,0700);
258     write(FD,&MaxN,IntSize);
259     write(FD,DataArray,MaxArrayLength);
260  }
261
262
263  main()
264
265  {  Init();
266     while (1)  {
267        GetCommand();
268        switch (Command)  {
269           case 'd':  InputItemNum(); printf("%u\n",GetItem()); break;
270           case 'e':  InputItemNum(); InputDatum(); Insert(); break;
271           case 'q':  exit();
272           case 'v':  SaveToFile();
273        }
274     }
275  }
276
277  heather% e
278  heather%
279  script done on Tue May 19 18:44:16 1992```

Analysis:

First read the comments, Lines 53-85. They describe the general goal of the program. The program inputs values and stores them in a packed array. The user can ask to have them displayed, or have them written--again in packed form--to a file. The program is demonstrated on Lines 3-49. For example, on Lines 11 and 13, the user is asking to store the value 3 as the seventh item in the database, while on Lines 23 and 25, he/she is asking to store 2 into the eighth item. On Lines 27 and 29, the user is asking that the database be saved into the file `gy'.

By the way, note the `-lm' in the command line for the compile (Line 2). This tells the compiler (actually, the linker, ld, which is called by the compiler) to look in C's math library (/usr/lib/libm.a) to find any functions it can't find elsewhere. In this case, we are calling the base-2-logarithm function, log2 on Line 156, which makes the `-lm' necessary:

footnote: Some C compilers may not have this function available.
The compiler will get the machine-language code for that function from /usr/lib/libm.a, and link it in with the machine-language code which the compiler produces from our C source file here.

Note also Line 93 in this regard. We are telling the compiler to include the C source code file /usr/include/math.h (the indicates that the file is in the directory /usr/include; otherwise, we would have used quotation marks on the file name). That file contains definitions related to the math library; in our case here, it contains line stating that log2 is of type double (double-precision version of float):

`extern double log2();`

Without including this file, or writing this line ourselves, the compiler would mistakenly think that log2 was of the default type, int, and would likely produce erroneous machine code.

Note that unsigned type is used instead of int, for the reasons given above. Accordingly, %u format is used instead of %d, e.g. in Line 269.

Let's take a look at how the Insert() function works.

footnote: The other major function, GetItem(), is similar and will not be discussed here.
To illustrate things, suppose on Line 49 we had typed `e', to enter a new datum into the array: Say we wish to enter the value 1 into the slot for ItemNum 5, i.e. we want to put 1 into the fifth item in the database.
footnote: For example, this could be the slot for, say, Customer Number 5 in some database application.

First, on Lines 165 and 166, we find out which element of DataArray, and at which bit position within that element, this slot is. The answer is DataArray and Bit Position 21. This makes sense: Since MaxN is 3, i.e. we are storing numbers in the range 0..3, we need only two bits per slot. That means that DataArray contains database items 0, 1, 2, ..., 13, 14 and 15, starting at Bit Positions 31, 29, 27, ..., 5, 3 and 1, respectively; in particular, Item 5 will be in this element of DataArray, at Bit Positions 21 and 20.

We must first zero-out this slot (Line 185). This will erase the old value in the slot (technically, it will change the value to 0), in preparation for using the OR operator to add in our new datum there (Line 191). This zeroing will be done by AND-ing DataArray with the mask

11111111110011111111111111111111

which is stored in ZeroOut, having been computed earlier in the function InitZeroOutValues() (Lines 118-130). Note again that it is crucial that the other slots in DataArray be left intact, which will be the case, due to all those 1s in the other bit positions in ZeroOut. As a result, DataArray, which had previously contained

00000000000000110000000000000000

(the underlined 11 is the 3 in the slot for ItemNum = 7), will now contain

00000000000100110000000000000000

Next, we shift the new datum, 1, to the left so as to line it up with its slot (Line 188). In other words, Datum had been

00000000000000000000000000000001

and now we will shift it to the left to make it equal to

00000000000100000000000000000000

Now we are ready to OR it with DataArray (Line 187), which has the effect of putting the new value, 1, into the requested slot.

The function InitZeroOutValues() uses techniques like those of Insert() and GetItem(). Note that the variable All1s (Line 122) is well-named: It was set to -1 (Line 144), which does indeed consist of a long string of only 1-bits.

Norm Matloff
Wed Nov 8 17:28:31 PST 1995