Introduction to the C Struct and Pointer Variables

## 1.  Technical Summary

A C struct type may be best introduced by comparison to array types. C array variables, as is the case with arrays in other languages, consist of a collection of variables of the same type. For example, in the declaration

```int Z[5];
```
we are declaring a collection of five variables of type int. By contrast,

```struct A {
int L,M;
char C;
} T,U;
```
would declare T and U to each consist of a collection of three variables; two of the three are integers, and the third is of type character.

It is important, though to describe this more precisely. What we are really doing above is two things:

• [(i)] We are inventing a new type, whose name will be A.

• [(ii)] We are declaring two variables, T and U, of this new type A.

We also can do (i) and (ii) separately, e.g. as

```1  struct A {
2     int L,M;
3     char C;
4  };
5
6  struct A T,U;
```
(I have added line numbers for easy reference; of course, these would not be part of the actual .c file.) Here Lines 1-4 define this new type A,1 and Line 6 declares two variables, T and U, of this new type A.2 Note carefully that Lines 1-4 do not result in the compiler arranging for run-time space to be allocated in memory; only Line 6 does that. Put another way, the variables L, M and C don't exist until Line 6 (and then there will be two of each, one for T and one for U).

We can access the individual variables (called fields) within a struct by using the `.' operator. For example, T.M would be the second integer variable within T, and U.C would be the character variable within U.

A pointer variable consists of a memory address.3

Consider the declaration

```int X,*Y,*Q;
```
Here X is declared to be of type int, but Y and Q are declared to be of type pointer-to-int, i.e. Y and Q will contain the memory addresses of int variables, maybe X.4 In fact, the latter would be the case for Y if we executed the statement

```Y = &X;
```
The `*' operator is used to de-reference a pointer, i.e. to tell what is in the memory location pointed to by the pointer. In the above example, for instance, the statement

```printf("%d\n",*Y);
```
would have exactly the same result as would

```printf("%d\n",X);
```
If a pointer has the value 0, it is considered in C to point to nothing (even though most machines do have a memory location 0). If you attempt to de-reference that pointer, e.g. you have an expression `*Y' above at a time when Y is 0, you will generate an error message, which on Unix is usually ``segmentation fault.''

We can add and subtract values to/from pointers, but the arithmetic works in a different way. If P is a pointer-to-type-K, then the expression P+R is not necessarily R larger than P; it is R*sizeof(K) larger. In the example with Y above, suppose &X is 0x5000. Then (on 32-bit machines) Y will be 0x5000, and Y+3 will be 0x500c, not 0x5003.

One of the major uses for pointers is to reference structs. For instance, let us extend the above struct example to

```struct A {
int L,M;
char C;
} T,U,*G;
```
and suppose we execute the statement

```G = &T;
```
Then we could reference the fields within T indirectly via G, by using the - > operator. For example,

```printf("%d\n",G->L);
```
would have exactly the same result as

```printf("%d\n",T.L);
```

## 2.  Example

Below, in Lines 5-248, is an example program. It is quite involved, using structs and pointers in complex--but typical--manners. A script record of a couple of runs of the program follows the program listing, followed in turn by an analysis.

### 2.1  Script File Listing

```     1  Script started on Mon Apr 20 23:12:16 1992
2  heather% cat W*
3
4
5  /* introductory program on structs and pointers; warehouse setting,
6     the user inputs information from a pile of papers which document
7     purchase orders and cancellation orders
8
9     each paper has a time stamp on it, showing the time the paper
10     was written (in the case of a cancellation, the time stamp is
11     the time at which the original order was place)
12
13     the information must be maintained in time stamp sequence, in
14     a linked list, which in this case would be preferable to an
15     array since we don't have to "move everything over" when a
16     cancellation occurs
17
18     initial input can be from a previously-created file if desired;
19     file is in binary format, to save disk space */
20
21
22  #include <fcntl.h>
23
24
25  #define MaxFileSize 1000
26
27
28  char Command,  /* `q' (quit)
29                    `p' (purchase)
30                    `c' (cancel)
31                    `d' (display list)
32                    `v' (save to file) */
33       InFileArray[MaxFileSize];  /* buffer for input file, if any */
34
35
36  int CancelTmStmp;
37
38
39  /* one struct for each transaction */
40  struct ListNode  {
41     int ItemNum,  /* item number */
42         TimeStamp,  /* time at which the paper was written */
43         CustomerID;  /* customer identification number */
45  };
46
47
48  typedef struct ListNode *NodePtrType;
49
50
52              CurrPtr;  /* pointer to the node containing the paper
53                          now being processed */
54
55
56  /* NodeInfoSize is the number of bytes in the information part of
57     the node, i.e. excluding the link */
58  int NodeInfoSize = sizeof(struct ListNode) - sizeof(NodePtrType);
59
60
61  Display()
62
63  {  NodePtrType TmpPtr = ListHead;
64
65     while (TmpPtr != 0)  {
66        printf("%x %d %d %d %x\n",TmpPtr,TmpPtr->ItemNum,
69     }
70  }
71
72
74
75  {  int FD,  /* file descriptor */
76         InFileSize,  /* number of bytes in the input file */
77         NNodes,  /* number of nodes stored in the file */
78         StartByte,  /* index of the starting byte of the current
79                        stored node in InFileArray */
80         I,J;
81     char InFileName[20],*BytePtr;
82     NodePtrType TmpPtr,ListTail;
83
84     printf("enter file name\n");
85     scanf("%s",InFileName);
86     FD = open(InFileName,O_RDONLY);
88     NNodes = InFileSize / NodeInfoSize;
89
90     /* now build the linked list, by copying from InFileArray,
91        one node at a time */
92     for (I = 1; I <= NNodes; I++)  {
93
94        /* create node in memory */
95        TmpPtr = (NodePtrType) calloc(1,sizeof(struct ListNode));
96
97        /* copy the I-th stored-node information to this new node */
98        StartByte = (I-1)*NodeInfoSize;
99        for (J = 0; J < NodeInfoSize; J++)  {
100           BytePtr = ((char *) TmpPtr) + J;
101           *BytePtr = InFileArray[StartByte+J];
102        }
104
105        /* now append this new node to the linked list */
106        if (I == 1) ListHead = ListTail = TmpPtr;
107        else  {
108           /* attach to the list */
110           /* at least for now, this will be the new tail */
111           ListTail = TmpPtr;
112        }
113     }
114  }
115
116
117  GetCommand()
118
119  {  printf("enter command\n");
120     scanf("%c",&Command);
121     /* might be a leftover end-of-line character from the
122        previous action; if so, then read the next character
123        in the input stream */
124     if (Command == '\n') scanf("%c",&Command);
125  }
126
127
128  GetData()  /* prompt user for the information, and put it in
129                the struct pointed to by CurrPtr */
130
131  {  int ItmNm,TmStmp,CstmrID;
132
133     printf("enter item number, time stamp and customer i.d.\n");
134     scanf("%d%d%d",&ItmNm,&TmStmp,&CstmrID);
135
136     /* create new node */
137     CurrPtr = (NodePtrType) calloc(1,sizeof(struct ListNode));
138
139     /* now fill it with the data */
140     CurrPtr->ItemNum = ItmNm;
141     CurrPtr->TimeStamp = TmStmp;
142     CurrPtr->CustomerID = CstmrID;
143
144     /* this node is not connected to anything yet */
146  }
147
148
149  Insert()  /* inserts the struct pointed to by CurrPtr into the
150               list */
151
153
154     /* list currently empty? */
155     if (ListHead == 0)  {
157        return;
158     }
159
160     /* current time stamp earlier than list head? */
161     if (CurrPtr->TimeStamp < ListHead->TimeStamp)  {
164        return;
165     }
166
168     while (1)  {
169
170        /* reached end of list? */
171        if (LeadingPtr == 0)  {
173           break;
174        }
175
176        /* arrived at insertion point? */
177        if (CurrPtr->TimeStamp < LeadingPtr->TimeStamp)  {
180           return;
181        }
182        /* otherwise move to next node */
183        else  {
186        }
187     }
188  }
189
190
191  Cancel()  /* delete the node whose time stamp matches CancelTmStmp;
192               to keep the program simple so as to ease the teaching
193               about pointers, it is assumed that there will be one,
194               and only one, node which matches the given time stamp */
195
197
198     if (ListHead->TimeStamp == CancelTmStmp)  {
200        return;
201     }
202
204     while (LeadingPtr->TimeStamp != CancelTmStmp)  {
207     }
209  }
210
211
212  SaveToFile()
213
214  {  int FD;  /* file descriptor */
215     NodePtrType TmpPtr;
216     char OutFileName[20];
217
218     printf("enter output file name\n");
219     scanf("%s",OutFileName);
220     FD = open(OutFileName,O_WRONLY|O_CREAT,0700);
221
223     while (TmpPtr != 0)  {
224        write(FD,TmpPtr,NodeInfoSize);
226     }
227  }
228
229
230  main()
231
232  {  printf("read in a file?\n");
233     scanf("%c",&Command);
234     if (Command == 'y') BuildListFromFile();
236     while(1)  {
237        GetCommand();
238        switch (Command)  {
239           case 'q':  exit();
240           case 'p':  GetData(); Insert(); break;
241           case 'c':  printf("enter time stamp\n");
242                      scanf("%d",&CancelTmStmp);
243                      Cancel(); break;
244           case 'd':  Display(); break;
245           case 'v':  SaveToFile();
246        }
247     }
248  }
249
250  heather% a.out
252  n
253  enter command
254  p
255  enter item number, time stamp and customer i.d.
256  12 150 222
257  enter command
258  p
259  enter item number, time stamp and customer i.d.
260  1 88 250
261  enter command
262  p
263  enter item number, time stamp and customer i.d.
264  86 152 1
265  enter command
266  d
267  66f8 1 88 250 66e0
268  66e0 12 150 222 6710
269  6710 86 152 1 0
270  enter command
271  c
272  enter time stamp
273  88
274  enter command
275  d
276  66e0 12 150 222 6710
277  6710 86 152 1 0
278  enter command
279  p
280  enter item number, time stamp and customer i.d.
281  100 4 16
282  enter command
283  d
284  6728 100 4 16 66e0
285  66e0 12 150 222 6710
286  6710 86 152 1 0
287  enter command
288  v
289  enter output file name
290  gy
291  enter command
292  q
293  heather% !!
294  a.out
296  y
297  enter file name
298  gy
299  enter command
300  d
301  66e0 100 4 16 66f8
302  66f8 12 150 222 6710
303  6710 86 152 1 0
304  enter command
305  c
306  enter time stamp
307  150
308  enter command
309  d
310  66e0 100 4 16 6710
311  6710 86 152 1 0
312  enter command
313  q
314  heather% e
315  heather%
316  script done on Mon Apr 20 23:14:48 1992
```

### 2.2  Analysis

By the way, the comments mention that we hope to save disk space by storing the data in binary format. This will not necessarily be the case, depending on the size of the numbers involved. If a number is at least five digits long in decimal form, then binary form will save space here: An int variable on a SPARCstation takes up 32 bits, thus four bytes; in character form, i.e. when stored by a statement of the form

```fprintf(file pointer,"%d",variable name);
```
we would be using five bytes. On the other hand, if we knew that all our integer values would be in the range 0-255, we could use the type unsigned char instead of int in Lines 41-43, in which we would still save space by using binary form.

Overview: We can get a good idea of what the program is doing by looking at main(), Lines 230-248.

First, the user is asked whether to input existing information from a file, or to start from scratch. If the user does want to input information from a file, the function BuildListFromFile() will read all the records stored in the file, create structs for each one (with the struct type defined on Lines 39-45), and link the structs together, with the head of the chain being pointed to by ListHead. On the other hand, if we start with nothing in the list, ListHead will be initialized to 0.

Next, we get to the major portion of main(), the while loop. It continues to take commands from the user, either adding new records to the list, or deleting them when a cancellation request is made.

The user can also make other requests, e.g. that the current list be written to a file. In the latter case, the function SaveToFile() will carry out this request, traversing the chain, writing the data from each struct to the designated file. This data will consist of the three fields on Lines 41-43, but not that in Line 44, the Link value. The Link values are irrelevant, since when the file is read in again later on and the list recreated, the Link values will probably be different anyway.

Line 22: The file /usr/include/fcntl.h contains some #define's that will be needed for our I/O, e.g. a definition of the value O_RDONLY on Line 86.

Lines 39-45: Here is our definition of the struct type which we will use. Note carefully that ListNode is the name of the type, not the name of a variable of that type. This type consists of three integers, as well as a pointer variable. The latter is of type pointer-to-ListNode. In other words, a field within one ListNode struct will point to another ListNode struct--this is how we will link together many such structs.

Line 48: C's typedef construct does what its name implies--it defines a new type. In this case, we are defining a new type whose name is NodePtrType, and it is defined to be pointer-to-ListNode. Note that we had to put the word `struct' in here, to remind the compiler that ListNode was a struct type (Line 40); we could not have written simply

```typedef ListNode *NodePtrType;
```
omitting the `struct'. This is also true on Lines 58, 95, and so on.

Line 51: Now that we have defined this new type, we declare ListHead and CurrPtr to be of that type.

Line 58: Here is an example of the use of the sizeof operator. On a SPARCstation, integers and pointers are four bytes each, so NodeInfoSize will be equal to 12 here, but by using sizeof, this program will compile and run correctly on any machine/compiler combination.

Display() Function, Lines 61-70: Look at TmpPtr in Lines 63, 65 and 68: TmpPtr initially points to the head of the list (Line 63); at the end of each iteration of the loop TmpPtr is advanced to the next struct in the list (Line 68); and the loop continues until we get to the end of the list (Line 65). In this manner, we hit all the structs in the list, printing out each one (Lines 66-67).

As an aid to debugging, and also a teaching aid, I have also printed out for each struct its address and the address of the struct which it links to.

Input from file, Lines 86-87: We are using the Unix functions open() and read(). These, along with write() (Line 224), are the basic I/O functions for Unix, not for the C language. In fact, the C functions like fopen(), fscanf(), and so on, internally make calls to the corresponding Unix functions; this way a C program developed under a non-Unix environment can be moved to a Unix environment without change. These Unix I/O functions do straight binary data transfer--they just copy bytes from/to memory to/from a file, i.e. without any ``interpretation'' such as that done by %d.5

The open() function, Line 86, has as its first argument the file name. Its second argument states whether we are opening the file to read only, write only or whatever (you can get a complete list of these by typing man open). Instead of returning a file pointer, as fopen() does, open returns an integer-valued file descriptor, which we are storing in our int variable FD.

On Line 87 we are calling the read() function. Note that we are reading in the whole file at once! Here we have told read() to read the file whose file descriptor is FD (i.e. the file we just opened on Line 86), put whatever we read into the array (buffer) InFileArray, and to try to read MaxFileSize bytes. In most cases, the word ``try'' here is key, because there won't be MaxFileSize bytes in the file; then read will only read until the end of the file. In order to let us know just how many bytes it did read, the function will return this number, which we are storing in our variable InFileSize.

Lines 92-113: OK, now that we have read the file data into our array InFileArray, we can proceed to recreate our linked list from that array. That is what is happening within this loop, which has three major sections to it: In Line 95, we create a new struct in memory; in Lines 97-103, we fill this new struct with the data from the array; and in Lines 105-113, we attach this new struct to our list. More details on each of these below.

Line 95 Here we call the function calloc, which allocates memory for us. Its first argument states the number of objects we need room for, and the second parameter states how many bytes each object will need; here our ``object'' is a ListNode struct. Also, we wouldn't be able to use this newly allocated memory unless we knew where it was, so calloc tells us, by returning a pointer to it; we store that pointer value in TmpPtr.

The `(NodePtrType)' is called a cast, which essentially does a type change. The type of the return value of calloc is pointer-to-char, but we are saying to the compiler, ``Yes, Compiler, we know that calloc returns a value of type pointer-to-character, but we will use it as type NodePtrType, i.e. type pointer-to-ListNode, so don't worry; we know what we are doing.'' No actual change will be made--an address is an address, no matter what type of data is stored at that address--but this way we avoid the compiler warning us about type-mixing.

On Line 100, we see an example of pointer arithmetic. The program would fail if we were to replace this line with

```BytePtr = (char *) (TmpPtr + J);
```
Lines 149ff: See the next handout, ``A Closer Look at the Warehouse Program.''

### Footnotes:

1 Don't forget to terminate the definition with a semicolon! You will get very odd compiler error messages otherwise.

2 It would seem to make more sense to write Line 6 as

A T,U;

However, the C language requires that we include the ßtruct".

3 This is true in any language which has pointer variables, e.g. Pascal, but in C it is more explicit than in Pascal.

4 Note very carefully that we are NOT declaring variables named *Y and *Q. We are declaring variables named Y and Q. The `*' symbols are only there to let us know that Y and Q are pointers; the `*' is not part of the names of these variables.

5 As mentioned above, the C functions call the Unix functions, so some ``translation'' may have to be done along the way. For example, suppose we have an int variable M whose current value is 18, and we have the call fprintf(FPtr,"%d",M). Then the characters `1' and `8', i.e. we want the bit string 0011000000111000 to be written to the file. But M itself consists of the bit string 00000000000000000000000000010010, so the fprintf() function will need to convert from this latter string to the former one before calling write.

File translated from TEX by TTH, version 1.1.