Bit-Level Operations in C

Technical Overview

One of the reasons why C has become so popular is that it allows the programmer to do many types of machine-level applications without resorting to using machine language as the vehicle for programming. One example of that is C's ability to work with memory addresses. Another example, or class of examples, consists of C constructs for bit-level operations, which will be described here.

AND and OR Operators

C includes the bitwise-AND operator & and the bitwise-OR operator |. To AND two bits together, one merely multiplies them: 0 AND 0 = 0, 1 AND 0 = 0, 0 AND 1 = 0, 1 AND 1 = 1. To OR two bits together, we just add them, except that 1 OR 1 is equal to 1, not 2: 0 OR 0 = 0, 1 OR 0 = 1, 0 or 1 = 1, 1 or 1 = 1.

Note that

The AND operator is useful for

(a)
putting 0s in certain bit positions of a variable, while retaining whatever original values were in the other bit positions

(b)
testing whether certain bit positions within a variable contain 1s or 0s

In the next few examples, suppose the variable X is of type int.

As an example of (a), suppose we wish to put 0s in Bit Positions 3 and 2 (the rightmost bit position being Bit Position 0), and suppose we have a machine/compiler combination which implements int variables as 4-byte quantities (again, this is typical for most workstations today). We would accomplish our goal with the statement

X &= 0xfffffff3;

Here is why: First, note that

X &= 0xfffffff3

is just a short form of

X = X & 0xfffffff3

(like X += 5 vs. X = X + 5). So we are AND-ing 0xfffffff3 with X and putting the result back in X. The constant 0xfffffff3 is, in bit form,

11111111111111111111111111110011

Most of the bits here are 1s, and since, as mentioned earlier, AND-ing by a 1 produces no change, most bits in X will not change. Only the bits at Bit Positions 3 and 2 will potentially change: They will change to 0s if they were 1s (or stay at 0s if they were 0s).

For example, after executing the statements

X = 29;
X &= 0xfffffff3;

X will have the value 17 (try this yourself on the machine, and make sure you understand).

By the way, the value to be AND-ed with X, in this case 0xfffffff3, is called a mask. To see why, think of what happens when you wish to paint the walls of a room. You would like to cover up the electrical sockets so that they don't get painted; you can do this with masking tape. Well, in the example above, we wished to ``cover up'' all the bits except those at Bit Positions 3 and 2, so we AND-ed them with 1s, so that they would not change.

As an example of (b), suppose we wish to set Z = 12 if there is a 1 in Bit Position 5 of X. We could do this with the statement

if (X & 0x00000020) Z = 12;

Here is why this statement accomplishes this goal: The mask 0x00000020 has 0s everywhere except in Bit Position 5, where there is a 1. Recall from above that AND-ing with a 0 produces a 0, no matter what value it is AND-ed with, while AND-ing with a 1 results in copying the bit value. So the quantity X & 0x00000020 will have 0s everywhere except at Bit Position 5, where there will be a copy of X's original Bit 5. If the latter is a 1, then X & 0x00000020 will be nonzero (0x00000010, to be specific), and since any nonzero value is condidered `true', Z = 12 will be executed.

OR-ing is used to put 1s at specific bit positions. You should verify, and make sure you understand, that after the statements

X = 29;
X |= 0x00000020;

are executed, X will be equal to 61.

Shift Operators

The operators << and >> shift the bits in a variable toward the left or toward the right, respectively. Suppose for example that X and Y are of type int, and we execute

X = 5;
Y = X << 3;

Then (say on the CAE machines) X will be, in bit form,

00000000000000000000000000000101

All these bits will be shifted left by 3 positions, and the result placed in Y. The latter will now be

00000000000000000000000000101000

which has the value 40.

footnote: Note that since we are tacking three 0s on the right end, and we are working in base-2, we are in effect multiplying by . (In base-10, tacking on a 0 on the right does a multiplication by 10, e.g. 52 to 520, so in base-2, appending a 0 on the right does a multiplication by 2.) Thus since X was 5, Y will be 40.

Note that the bits on the left end disappear. This may create problems. For example, shifting may turn a positive number into a negative one, or vice versa, since the leftmost bit of a number tells its sign. This presents no problem, though, for variables of type unsigned, so shift operations are typically done on variables of that type, not of type int.

Bit Fields

These arise in a special type of struct, which allows one to assign variable names to groups of bits within a variable. We will not go into this here.

Example

Below we have a database example as an application of bit operators.

Many databases are extremely large. Just imagine, for example, how much computer space--either in memory, during program execution, or on disk, for permanent storage, or a combination of both--is required to store the records for all of the Bank of America's credit-card holders. One way to solve this problem is to ``pack'' the data; the Pascal language actually includes a provision for this, the packed array construct.

footnote: However, many Pascal compilers simply ignore this construct, and implement the array without packing.

C does not have this feature, but its bit-level operators can easily be used to implement the equivalent of it. The program below does this.

   1  Script started on Tue May 19 18:42:41 1992
   2  heather% cc -g Packed.c -lm
   3  heather% a.out
   4  read in a file?
   5  n
   6  enter MaxN
   7  3
   8  enter command
   9  e
  10  enter ItemNum
  11  7
  12  enter Datum
  13  3
  14  enter command
  15  e
  16  enter ItemNum
  17  5
  18  enter Datum
  19  2
  20  enter command
  21  e
  22  enter ItemNum
  23  8
  24  enter Datum
  25  2
  26  enter command
  27  v
  28  enter output file name
  29  gy
  30  enter command
  31  q
  32  heather% !!
  33  a.out
  34  read in a file?
  35  y
  36  enter file name
  37  gy
  38  enter command
  39  d
  40  enter ItemNum
  41  5
  42  2
  43  enter command
  44  d
  45  enter ItemNum
  46  8
  47  2
  48  enter command
  49  q
  50  heather% cat Packed.c
  51  
  52  
  53  /* example of how C's bit-level operations can be used to implement
  54     packing in the Pascal sense 
  55  
  56     data consist of integers in the range 0..MaxN; each datum is 
  57     stored in a FieldSize-bit subset of an int variable (actually,
  58     a variable of type `unsigned int', though we will just say `int'
  59     in the rest of this discussion); there are IntSize/FieldSize such 
  60     fields per int variable, where IntSize = sizeof(int); FieldSize 
  61     is equal to log(MaxN+1), where the log is base-2; it is assumed 
  62     that FieldSize evenly divides IntSize
  63  
  64     for example, on Sun's and most other current workstations,
  65     int variables are stored in 32-bit words; suppose we wish to
  66     store integers in the range 0..3; these can be stored in 2-bit
  67     fields (0 = 00, 1 = 01, 2 = 10, 3 = 11); thus we can get 32/2
  68     = 16 such fields into one int variable, and thus compress
  69     storage requirements (either in memory or in a disk file) by
  70     a factor of 16, a big savings; for example, instead of needing
  71     an array of 80,000 elements, 
  72     
  73        int X[80000];
  74  
  75     we could fit all 80,000 data items into an array of 5,000 int 
  76     variables, i.e.
  77  
  78        int DataArray[5000];
  79  
  80     DataArray[0] would contain what would have been (if we had not
  81     packed) X[0],X[1],...,X[15]; DataArray[1] would contain what
  82     would have been X[16],X[17],...,X31]; etc.; note the bit positions:
  83     X[0] would be in Bits 31 and 30 of DataArray[0], X[1] would be in
  84     Bits 29 and 28, etc. (calling the leftmost bit Bit 31 and the
  85     rightmost bit Bit 0) */
  86  
  87  
  88  #define MaxArrayLength 100
  89  #define MaxIntSize 100
  90  
  91  
  92  #include <fcntl.h>
  93  #include <math.h>
  94  
  95  
  96  unsigned DataArray[MaxArrayLength],  /* array of packed data */
  97           ItemNum,  /* index the data item would have in X if 
  98                        packing were not used */
  99           DataArrayIndex,  /* index of the item within DataArray */
 100           DataArrayStartBit,  /* bit position of the item within the
 101                                  DataArray element */
 102           Datum,  
 103           ZeroOutA[MaxIntSize],
 104           ZeroOutB[MaxIntSize],
 105           All1s,
 106           MaxN,  /* upper bound for the value of a datum */
 107           FieldSize,  /* number of bits needed to store a datum */
 108           IntSize,  /* sizeof(int) */
 109           NFields;  /* number of fields per DataArray element */
 110  
 111  
 112  char Command;  /* d = display data item
 113                    e = enter new data item
 114                    q = exit program
 115                    v = save to file */
 116  
 117  
 118  InitZeroOutValues()
 119  
 120  {  unsigned Tmp = MaxN;
 121     int BitPos,I;
 122  
 123     BitPos = FieldSize - 1;
 124     for (I = 0; I < NFields; I++)  {
 125        ZeroOutB[BitPos] = Tmp;
 126        ZeroOutA[BitPos] = All1s - ZeroOutB[BitPos];
 127        BitPos += FieldSize;
 128        Tmp <<= FieldSize;
 129     }
 130  }
 131  
 132  
 133  ReadFile()
 134  
 135  {  int FD;  char InFileName[20];
 136  
 137     printf("enter file name\n");
 138     scanf("%s",InFileName);
 139     FD = open(InFileName,O_RDONLY);
 140     read(FD,&MaxN,IntSize);
 141     read(FD,DataArray,MaxArrayLength*IntSize);
 142  }
 143  
 144  
 145  Init()
 146  
 147  {  IntSize = sizeof(int);
 148     All1s = -1;
 149     printf("read in a file?\n");
 150     scanf("%c",&Command);
 151     if (Command == 'y') ReadFile();
 152     else  {
 153        printf("enter MaxN\n");  
 154        scanf("%u",&MaxN);
 155     }
 156     FieldSize = (int) log2((double) (MaxN+1));
 157     NFields = IntSize*8 / FieldSize;
 158     InitZeroOutValues();
 159  }
 160  
 161  
 162  Insert()  /* inserts Datum at the position in DataArray appropriate
 163               for the given value of ItemNum */
 164  
 165  {  DataArrayIndex = ItemNum / NFields;
 166     DataArrayStartBit = (NFields-ItemNum % NFields) * FieldSize - 1;
 167  
 168     /* in inserting Datum into its bit field, we must take care that
 169        none of the other bit fields in this element of DataArray are
 170        affected; the plan is as follows:  
 171        
 172           (a) use the bitwise-AND operator & to zero-out the field in 
 173               which Datum will be put;
 174        
 175           (b) use the shift-left operator << to shift Datum the proper 
 176               number of bit positions to make it aligned with the zeroed-
 177               out field; 
 178               
 179           (c) use the bitwise-OR operator | to insert the item without 
 180               affecting the other fields 
 181               
 182     */
 183  
 184     /* (a) */
 185     DataArray[DataArrayIndex] &= ZeroOutA[DataArrayStartBit];
 186  
 187     /* (b) */
 188     Datum <<= DataArrayStartBit - (FieldSize - 1);
 189  
 190     /* (c) */
 191     DataArray[DataArrayIndex] |= Datum;
 192  }
 193  
 194  
 195  GetCommand()
 196  
 197  {  printf("enter command\n");
 198     scanf("%c",&Command);
 199     /* might be a leftover end-of-line character from the
 200        previous action; if so, then read the next character
 201        in the input stream */
 202     if (Command == '\n') scanf("%c",&Command);
 203  }
 204  
 205  
 206  InputItemNum()
 207  
 208  {  printf("enter ItemNum\n");
 209     scanf("%u",&ItemNum);
 210  }
 211  
 212  
 213  InputDatum()
 214  
 215  {  printf("enter Datum\n");
 216     scanf("%u",&Datum);
 217  }
 218  
 219  
 220  unsigned GetItem()  /* displays to the screen the element which would 
 221                         be at position ItemNum of X if we had not 
 222                         packed */
 223  
 224  {  unsigned Tmp;
 225  
 226     DataArrayIndex = ItemNum / NFields;
 227     DataArrayStartBit = (NFields-ItemNum % NFields) * FieldSize - 1;
 228  
 229     /* the plan:
 230  
 231        (a) copy the DataArray element to Tmp
 232  
 233        (b) zero out the fields in Tmp other the one containing the
 234            item of interest
 235  
 236        (c) shift Tmp to the right to produce the item of interest
 237  
 238     */
 239  
 240     /* (a) */
 241     Tmp = DataArray[DataArrayIndex];
 242  
 243     /* (b) */
 244     Tmp &= ZeroOutB[DataArrayStartBit];
 245  
 246     /* (c) */
 247     return Tmp >> (DataArrayStartBit - (FieldSize - 1));
 248  }
 249  
 250  
 251  SaveToFile()
 252  
 253  {  int FD,I;  char OutFileName[20];
 254  
 255     printf("enter output file name\n");
 256     scanf("%s",OutFileName);
 257     FD = open(OutFileName,O_WRONLY|O_CREAT,0700);
 258     write(FD,&MaxN,IntSize);
 259     write(FD,DataArray,MaxArrayLength);
 260  }
 261  
 262  
 263  main()
 264  
 265  {  Init();
 266     while (1)  {
 267        GetCommand();
 268        switch (Command)  {
 269           case 'd':  InputItemNum(); printf("%u\n",GetItem()); break;
 270           case 'e':  InputItemNum(); InputDatum(); Insert(); break;
 271           case 'q':  exit();
 272           case 'v':  SaveToFile();
 273        }
 274     }
 275  }
 276  
 277  heather% e
 278  heather% 
 279  script done on Tue May 19 18:44:16 1992

Analysis:

First read the comments, Lines 53-85. They describe the general goal of the program. The program inputs values and stores them in a packed array. The user can ask to have them displayed, or have them written--again in packed form--to a file. The program is demonstrated on Lines 3-49. For example, on Lines 11 and 13, the user is asking to store the value 3 as the seventh item in the database, while on Lines 23 and 25, he/she is asking to store 2 into the eighth item. On Lines 27 and 29, the user is asking that the database be saved into the file `gy'.

By the way, note the `-lm' in the command line for the compile (Line 2). This tells the compiler (actually, the linker, ld, which is called by the compiler) to look in C's math library (/usr/lib/libm.a) to find any functions it can't find elsewhere. In this case, we are calling the base-2-logarithm function, log2 on Line 156, which makes the `-lm' necessary:

footnote: Some C compilers may not have this function available.
The compiler will get the machine-language code for that function from /usr/lib/libm.a, and link it in with the machine-language code which the compiler produces from our C source file here.

Note also Line 93 in this regard. We are telling the compiler to include the C source code file /usr/include/math.h (the indicates that the file is in the directory /usr/include; otherwise, we would have used quotation marks on the file name). That file contains definitions related to the math library; in our case here, it contains line stating that log2 is of type double (double-precision version of float):

extern double log2();

Without including this file, or writing this line ourselves, the compiler would mistakenly think that log2 was of the default type, int, and would likely produce erroneous machine code.

Note that unsigned type is used instead of int, for the reasons given above. Accordingly, %u format is used instead of %d, e.g. in Line 269.

Let's take a look at how the Insert() function works.

footnote: The other major function, GetItem(), is similar and will not be discussed here.
To illustrate things, suppose on Line 49 we had typed `e', to enter a new datum into the array: Say we wish to enter the value 1 into the slot for ItemNum 5, i.e. we want to put 1 into the fifth item in the database.
footnote: For example, this could be the slot for, say, Customer Number 5 in some database application.

First, on Lines 165 and 166, we find out which element of DataArray, and at which bit position within that element, this slot is. The answer is DataArray[0] and Bit Position 21. This makes sense: Since MaxN is 3, i.e. we are storing numbers in the range 0..3, we need only two bits per slot. That means that DataArray[0] contains database items 0, 1, 2, ..., 13, 14 and 15, starting at Bit Positions 31, 29, 27, ..., 5, 3 and 1, respectively; in particular, Item 5 will be in this element of DataArray, at Bit Positions 21 and 20.

We must first zero-out this slot (Line 185). This will erase the old value in the slot (technically, it will change the value to 0), in preparation for using the OR operator to add in our new datum there (Line 191). This zeroing will be done by AND-ing DataArray[0] with the mask

11111111110011111111111111111111

which is stored in ZeroOut[21], having been computed earlier in the function InitZeroOutValues() (Lines 118-130). Note again that it is crucial that the other slots in DataArray[0] be left intact, which will be the case, due to all those 1s in the other bit positions in ZeroOut[1]. As a result, DataArray[0], which had previously contained

00000000000000110000000000000000

(the underlined 11 is the 3 in the slot for ItemNum = 7), will now contain

00000000000100110000000000000000

Next, we shift the new datum, 1, to the left so as to line it up with its slot (Line 188). In other words, Datum had been

00000000000000000000000000000001

and now we will shift it to the left to make it equal to

00000000000100000000000000000000

Now we are ready to OR it with DataArray[0] (Line 187), which has the effect of putting the new value, 1, into the requested slot.

The function InitZeroOutValues() uses techniques like those of Insert() and GetItem(). Note that the variable All1s (Line 122) is well-named: It was set to -1 (Line 144), which does indeed consist of a long string of only 1-bits.



Norm Matloff
Wed Nov 8 17:28:31 PST 1995