A C struct type may be best introduced by comparison to array types. C array variables, as is the case with arrays in other languages, consist of a collection of variables of the same type. For example, in the declaration
int Z[5];we are declaring a collection of five variables of type int. By contrast,
struct A { int L,M; char C; } T,U;would declare T and U to each consist of a collection of three variables; two of the three are integers, and the third is of type character.
It is important, though to describe this more precisely. What we are really doing above is two things:
We also can do (i) and (ii) separately, e.g. as
1 struct A { 2 int L,M; 3 char C; 4 }; 5 6 struct A T,U;(I have added line numbers for easy reference; of course, these would not be part of the actual .c file.) Here Lines 1-4 define this new type A,1 and Line 6 declares two variables, T and U, of this new type A.2 Note carefully that Lines 1-4 do not result in the compiler arranging for run-time space to be allocated in memory; only Line 6 does that. Put another way, the variables L, M and C don't exist until Line 6 (and then there will be two of each, one for T and one for U).
We can access the individual variables (called fields) within a struct by using the `.' operator. For example, T.M would be the second integer variable within T, and U.C would be the character variable within U.
A pointer variable consists of a memory address.3
Consider the declaration
int X,*Y,*Q;Here X is declared to be of type int, but Y and Q are declared to be of type pointer-to-int, i.e. Y and Q will contain the memory addresses of int variables, maybe X.4 In fact, the latter would be the case for Y if we executed the statement
Y = &X;The `*' operator is used to de-reference a pointer, i.e. to tell what is in the memory location pointed to by the pointer. In the above example, for instance, the statement
printf("%d\n",*Y);would have exactly the same result as would
printf("%d\n",X);If a pointer has the value 0, it is considered in C to point to nothing (even though most machines do have a memory location 0). If you attempt to de-reference that pointer, e.g. you have an expression `*Y' above at a time when Y is 0, you will generate an error message, which on Unix is usually ``segmentation fault.''
We can add and subtract values to/from pointers, but the arithmetic works in a different way. If P is a pointer-to-type-K, then the expression P+R is not necessarily R larger than P; it is R*sizeof(K) larger. In the example with Y above, suppose &X is 0x5000. Then (on 32-bit machines) Y will be 0x5000, and Y+3 will be 0x500c, not 0x5003.
One of the major uses for pointers is to reference structs. For instance, let us extend the above struct example to
struct A { int L,M; char C; } T,U,*G;and suppose we execute the statement
G = &T;Then we could reference the fields within T indirectly via G, by using the - > operator. For example,
printf("%d\n",G->L);would have exactly the same result as
printf("%d\n",T.L);
Below, in Lines 5-248, is an example program. It is quite involved, using structs and pointers in complex--but typical--manners. A script record of a couple of runs of the program follows the program listing, followed in turn by an analysis.
1 Script started on Mon Apr 20 23:12:16 1992 2 heather% cat W* 3 4 5 /* introductory program on structs and pointers; warehouse setting, 6 the user inputs information from a pile of papers which document 7 purchase orders and cancellation orders 8 9 each paper has a time stamp on it, showing the time the paper 10 was written (in the case of a cancellation, the time stamp is 11 the time at which the original order was place) 12 13 the information must be maintained in time stamp sequence, in 14 a linked list, which in this case would be preferable to an 15 array since we don't have to "move everything over" when a 16 cancellation occurs 17 18 initial input can be from a previously-created file if desired; 19 file is in binary format, to save disk space */ 20 21 22 #include <fcntl.h> 23 24 25 #define MaxFileSize 1000 26 27 28 char Command, /* `q' (quit) 29 `p' (purchase) 30 `c' (cancel) 31 `d' (display list) 32 `v' (save to file) */ 33 InFileArray[MaxFileSize]; /* buffer for input file, if any */ 34 35 36 int CancelTmStmp; 37 38 39 /* one struct for each transaction */ 40 struct ListNode { 41 int ItemNum, /* item number */ 42 TimeStamp, /* time at which the paper was written */ 43 CustomerID; /* customer identification number */ 44 struct ListNode *Link; /* link to next node */ 45 }; 46 47 48 typedef struct ListNode *NodePtrType; 49 50 51 NodePtrType ListHead, /* head of linked list */ 52 CurrPtr; /* pointer to the node containing the paper 53 now being processed */ 54 55 56 /* NodeInfoSize is the number of bytes in the information part of 57 the node, i.e. excluding the link */ 58 int NodeInfoSize = sizeof(struct ListNode) - sizeof(NodePtrType); 59 60 61 Display() 62 63 { NodePtrType TmpPtr = ListHead; 64 65 while (TmpPtr != 0) { 66 printf("%x %d %d %d %x\n",TmpPtr,TmpPtr->ItemNum, 67 TmpPtr->TimeStamp,TmpPtr->CustomerID,TmpPtr->Link); 68 TmpPtr = TmpPtr->Link; 69 } 70 } 71 72 73 BuildListFromFile() /* read input file and build linked list */ 74 75 { int FD, /* file descriptor */ 76 InFileSize, /* number of bytes in the input file */ 77 NNodes, /* number of nodes stored in the file */ 78 StartByte, /* index of the starting byte of the current 79 stored node in InFileArray */ 80 I,J; 81 char InFileName[20],*BytePtr; 82 NodePtrType TmpPtr,ListTail; 83 84 printf("enter file name\n"); 85 scanf("%s",InFileName); 86 FD = open(InFileName,O_RDONLY); 87 InFileSize = read(FD,InFileArray,MaxFileSize); 88 NNodes = InFileSize / NodeInfoSize; 89 90 /* now build the linked list, by copying from InFileArray, 91 one node at a time */ 92 for (I = 1; I <= NNodes; I++) { 93 94 /* create node in memory */ 95 TmpPtr = (NodePtrType) calloc(1,sizeof(struct ListNode)); 96 97 /* copy the I-th stored-node information to this new node */ 98 StartByte = (I-1)*NodeInfoSize; 99 for (J = 0; J < NodeInfoSize; J++) { 100 BytePtr = ((char *) TmpPtr) + J; 101 *BytePtr = InFileArray[StartByte+J]; 102 } 103 TmpPtr->Link = 0; 104 105 /* now append this new node to the linked list */ 106 if (I == 1) ListHead = ListTail = TmpPtr; 107 else { 108 /* attach to the list */ 109 ListTail->Link = TmpPtr; 110 /* at least for now, this will be the new tail */ 111 ListTail = TmpPtr; 112 } 113 } 114 } 115 116 117 GetCommand() 118 119 { printf("enter command\n"); 120 scanf("%c",&Command); 121 /* might be a leftover end-of-line character from the 122 previous action; if so, then read the next character 123 in the input stream */ 124 if (Command == '\n') scanf("%c",&Command); 125 } 126 127 128 GetData() /* prompt user for the information, and put it in 129 the struct pointed to by CurrPtr */ 130 131 { int ItmNm,TmStmp,CstmrID; 132 133 printf("enter item number, time stamp and customer i.d.\n"); 134 scanf("%d%d%d",&ItmNm,&TmStmp,&CstmrID); 135 136 /* create new node */ 137 CurrPtr = (NodePtrType) calloc(1,sizeof(struct ListNode)); 138 139 /* now fill it with the data */ 140 CurrPtr->ItemNum = ItmNm; 141 CurrPtr->TimeStamp = TmStmp; 142 CurrPtr->CustomerID = CstmrID; 143 144 /* this node is not connected to anything yet */ 145 CurrPtr->Link = 0; 146 } 147 148 149 Insert() /* inserts the struct pointed to by CurrPtr into the 150 list */ 151 152 { NodePtrType LeadingPtr,TrailingPtr; 153 154 /* list currently empty? */ 155 if (ListHead == 0) { 156 ListHead = CurrPtr; 157 return; 158 } 159 160 /* current time stamp earlier than list head? */ 161 if (CurrPtr->TimeStamp < ListHead->TimeStamp) { 162 CurrPtr->Link = ListHead; 163 ListHead = CurrPtr; 164 return; 165 } 166 167 LeadingPtr = ListHead->Link; TrailingPtr = ListHead; 168 while (1) { 169 170 /* reached end of list? */ 171 if (LeadingPtr == 0) { 172 TrailingPtr->Link = CurrPtr; 173 break; 174 } 175 176 /* arrived at insertion point? */ 177 if (CurrPtr->TimeStamp < LeadingPtr->TimeStamp) { 178 TrailingPtr->Link = CurrPtr; 179 CurrPtr->Link = LeadingPtr; 180 return; 181 } 182 /* otherwise move to next node */ 183 else { 184 TrailingPtr = LeadingPtr; 185 LeadingPtr = LeadingPtr->Link; 186 } 187 } 188 } 189 190 191 Cancel() /* delete the node whose time stamp matches CancelTmStmp; 192 to keep the program simple so as to ease the teaching 193 about pointers, it is assumed that there will be one, 194 and only one, node which matches the given time stamp */ 195 196 { NodePtrType LeadingPtr,TrailingPtr; 197 198 if (ListHead->TimeStamp == CancelTmStmp) { 199 ListHead = ListHead->Link; 200 return; 201 } 202 203 LeadingPtr = ListHead->Link; TrailingPtr = ListHead; 204 while (LeadingPtr->TimeStamp != CancelTmStmp) { 205 TrailingPtr = LeadingPtr; 206 LeadingPtr = LeadingPtr->Link; 207 } 208 TrailingPtr->Link = LeadingPtr->Link; 209 } 210 211 212 SaveToFile() 213 214 { int FD; /* file descriptor */ 215 NodePtrType TmpPtr; 216 char OutFileName[20]; 217 218 printf("enter output file name\n"); 219 scanf("%s",OutFileName); 220 FD = open(OutFileName,O_WRONLY|O_CREAT,0700); 221 222 TmpPtr = ListHead; 223 while (TmpPtr != 0) { 224 write(FD,TmpPtr,NodeInfoSize); 225 TmpPtr = TmpPtr->Link; 226 } 227 } 228 229 230 main() 231 232 { printf("read in a file?\n"); 233 scanf("%c",&Command); 234 if (Command == 'y') BuildListFromFile(); 235 else ListHead = 0; 236 while(1) { 237 GetCommand(); 238 switch (Command) { 239 case 'q': exit(); 240 case 'p': GetData(); Insert(); break; 241 case 'c': printf("enter time stamp\n"); 242 scanf("%d",&CancelTmStmp); 243 Cancel(); break; 244 case 'd': Display(); break; 245 case 'v': SaveToFile(); 246 } 247 } 248 } 249 250 heather% a.out 251 read in a file? 252 n 253 enter command 254 p 255 enter item number, time stamp and customer i.d. 256 12 150 222 257 enter command 258 p 259 enter item number, time stamp and customer i.d. 260 1 88 250 261 enter command 262 p 263 enter item number, time stamp and customer i.d. 264 86 152 1 265 enter command 266 d 267 66f8 1 88 250 66e0 268 66e0 12 150 222 6710 269 6710 86 152 1 0 270 enter command 271 c 272 enter time stamp 273 88 274 enter command 275 d 276 66e0 12 150 222 6710 277 6710 86 152 1 0 278 enter command 279 p 280 enter item number, time stamp and customer i.d. 281 100 4 16 282 enter command 283 d 284 6728 100 4 16 66e0 285 66e0 12 150 222 6710 286 6710 86 152 1 0 287 enter command 288 v 289 enter output file name 290 gy 291 enter command 292 q 293 heather% !! 294 a.out 295 read in a file? 296 y 297 enter file name 298 gy 299 enter command 300 d 301 66e0 100 4 16 66f8 302 66f8 12 150 222 6710 303 6710 86 152 1 0 304 enter command 305 c 306 enter time stamp 307 150 308 enter command 309 d 310 66e0 100 4 16 6710 311 6710 86 152 1 0 312 enter command 313 q 314 heather% e 315 heather% 316 script done on Mon Apr 20 23:14:48 1992
The comments (Lines 5-19) explain the goals of this program. Please read these carefully before continuing.
By the way, the comments mention that we hope to save disk space by storing the data in binary format. This will not necessarily be the case, depending on the size of the numbers involved. If a number is at least five digits long in decimal form, then binary form will save space here: An int variable on a SPARCstation takes up 32 bits, thus four bytes; in character form, i.e. when stored by a statement of the form
fprintf(file pointer,"%d",variable name);we would be using five bytes. On the other hand, if we knew that all our integer values would be in the range 0-255, we could use the type unsigned char instead of int in Lines 41-43, in which we would still save space by using binary form.
Overview: We can get a good idea of what the program is doing by looking at main(), Lines 230-248.
First, the user is asked whether to input existing information from a file, or to start from scratch. If the user does want to input information from a file, the function BuildListFromFile() will read all the records stored in the file, create structs for each one (with the struct type defined on Lines 39-45), and link the structs together, with the head of the chain being pointed to by ListHead. On the other hand, if we start with nothing in the list, ListHead will be initialized to 0.
Next, we get to the major portion of main(), the while loop. It continues to take commands from the user, either adding new records to the list, or deleting them when a cancellation request is made.
The user can also make other requests, e.g. that the current list be written to a file. In the latter case, the function SaveToFile() will carry out this request, traversing the chain, writing the data from each struct to the designated file. This data will consist of the three fields on Lines 41-43, but not that in Line 44, the Link value. The Link values are irrelevant, since when the file is read in again later on and the list recreated, the Link values will probably be different anyway.
Line 22: The file /usr/include/fcntl.h contains some #define's that will be needed for our I/O, e.g. a definition of the value O_RDONLY on Line 86.
Lines 39-45: Here is our definition of the struct type which we will use. Note carefully that ListNode is the name of the type, not the name of a variable of that type. This type consists of three integers, as well as a pointer variable. The latter is of type pointer-to-ListNode. In other words, a field within one ListNode struct will point to another ListNode struct--this is how we will link together many such structs.
Line 48: C's typedef construct does what its name implies--it defines a new type. In this case, we are defining a new type whose name is NodePtrType, and it is defined to be pointer-to-ListNode. Note that we had to put the word `struct' in here, to remind the compiler that ListNode was a struct type (Line 40); we could not have written simply
typedef ListNode *NodePtrType;omitting the `struct'. This is also true on Lines 58, 95, and so on.
Line 51: Now that we have defined this new type, we declare ListHead and CurrPtr to be of that type.
Line 58: Here is an example of the use of the sizeof operator. On a SPARCstation, integers and pointers are four bytes each, so NodeInfoSize will be equal to 12 here, but by using sizeof, this program will compile and run correctly on any machine/compiler combination.
Display() Function, Lines 61-70: Look at TmpPtr in Lines 63, 65 and 68: TmpPtr initially points to the head of the list (Line 63); at the end of each iteration of the loop TmpPtr is advanced to the next struct in the list (Line 68); and the loop continues until we get to the end of the list (Line 65). In this manner, we hit all the structs in the list, printing out each one (Lines 66-67).
As an aid to debugging, and also a teaching aid, I have also printed out for each struct its address and the address of the struct which it links to.
Input from file, Lines 86-87: We are using the Unix functions open() and read(). These, along with write() (Line 224), are the basic I/O functions for Unix, not for the C language. In fact, the C functions like fopen(), fscanf(), and so on, internally make calls to the corresponding Unix functions; this way a C program developed under a non-Unix environment can be moved to a Unix environment without change. These Unix I/O functions do straight binary data transfer--they just copy bytes from/to memory to/from a file, i.e. without any ``interpretation'' such as that done by %d.5
The open() function, Line 86, has as its first argument the file name. Its second argument states whether we are opening the file to read only, write only or whatever (you can get a complete list of these by typing man open). Instead of returning a file pointer, as fopen() does, open returns an integer-valued file descriptor, which we are storing in our int variable FD.
On Line 87 we are calling the read() function. Note that we are reading in the whole file at once! Here we have told read() to read the file whose file descriptor is FD (i.e. the file we just opened on Line 86), put whatever we read into the array (buffer) InFileArray, and to try to read MaxFileSize bytes. In most cases, the word ``try'' here is key, because there won't be MaxFileSize bytes in the file; then read will only read until the end of the file. In order to let us know just how many bytes it did read, the function will return this number, which we are storing in our variable InFileSize.
Lines 92-113: OK, now that we have read the file data into our array InFileArray, we can proceed to recreate our linked list from that array. That is what is happening within this loop, which has three major sections to it: In Line 95, we create a new struct in memory; in Lines 97-103, we fill this new struct with the data from the array; and in Lines 105-113, we attach this new struct to our list. More details on each of these below.
Line 95 Here we call the function calloc, which allocates memory for us. Its first argument states the number of objects we need room for, and the second parameter states how many bytes each object will need; here our ``object'' is a ListNode struct. Also, we wouldn't be able to use this newly allocated memory unless we knew where it was, so calloc tells us, by returning a pointer to it; we store that pointer value in TmpPtr.
The `(NodePtrType)' is called a cast, which essentially does a type change. The type of the return value of calloc is pointer-to-char, but we are saying to the compiler, ``Yes, Compiler, we know that calloc returns a value of type pointer-to-character, but we will use it as type NodePtrType, i.e. type pointer-to-ListNode, so don't worry; we know what we are doing.'' No actual change will be made--an address is an address, no matter what type of data is stored at that address--but this way we avoid the compiler warning us about type-mixing.
On Line 100, we see an example of pointer arithmetic. The program would fail if we were to replace this line with
BytePtr = (char *) (TmpPtr + J);Lines 149ff: See the next handout, ``A Closer Look at the Warehouse Program.''
1 Don't forget to terminate the definition with a semicolon! You will get very odd compiler error messages otherwise.
2 It would seem to make more sense to write Line 6 as
A T,U;
However, the C language requires that we include the ßtruct".
3 This is true in any language which has pointer variables, e.g. Pascal, but in C it is more explicit than in Pascal.
4 Note very carefully that we are NOT declaring variables named *Y and *Q. We are declaring variables named Y and Q. The `*' symbols are only there to let us know that Y and Q are pointers; the `*' is not part of the names of these variables.
5 As mentioned above, the C functions call the Unix functions, so some ``translation'' may have to be done along the way. For example, suppose we have an int variable M whose current value is 18, and we have the call fprintf(FPtr,"%d",M). Then the characters `1' and `8', i.e. we want the bit string 0011000000111000 to be written to the file. But M itself consists of the bit string 00000000000000000000000000010010, so the fprintf() function will need to convert from this latter string to the former one before calling write.