Introduction to the C Struct and Pointer Variables



1.  Technical Summary

A C struct type may be best introduced by comparison to array types. C array variables, as is the case with arrays in other languages, consist of a collection of variables of the same type. For example, in the declaration

int Z[5];
we are declaring a collection of five variables of type int. By contrast,

struct A {
   int L,M;
   char C;
} T,U;
would declare T and U to each consist of a collection of three variables; two of the three are integers, and the third is of type character.

It is important, though to describe this more precisely. What we are really doing above is two things:

We also can do (i) and (ii) separately, e.g. as

1  struct A {
2     int L,M;
3     char C;
4  };
5
6  struct A T,U;
(I have added line numbers for easy reference; of course, these would not be part of the actual .c file.) Here Lines 1-4 define this new type A,1 and Line 6 declares two variables, T and U, of this new type A.2 Note carefully that Lines 1-4 do not result in the compiler arranging for run-time space to be allocated in memory; only Line 6 does that. Put another way, the variables L, M and C don't exist until Line 6 (and then there will be two of each, one for T and one for U).

We can access the individual variables (called fields) within a struct by using the `.' operator. For example, T.M would be the second integer variable within T, and U.C would be the character variable within U.

A pointer variable consists of a memory address.3

Consider the declaration

int X,*Y,*Q;
Here X is declared to be of type int, but Y and Q are declared to be of type pointer-to-int, i.e. Y and Q will contain the memory addresses of int variables, maybe X.4 In fact, the latter would be the case for Y if we executed the statement

Y = &X;
The `*' operator is used to de-reference a pointer, i.e. to tell what is in the memory location pointed to by the pointer. In the above example, for instance, the statement

printf("%d\n",*Y);
would have exactly the same result as would

printf("%d\n",X);
If a pointer has the value 0, it is considered in C to point to nothing (even though most machines do have a memory location 0). If you attempt to de-reference that pointer, e.g. you have an expression `*Y' above at a time when Y is 0, you will generate an error message, which on Unix is usually ``segmentation fault.''

We can add and subtract values to/from pointers, but the arithmetic works in a different way. If P is a pointer-to-type-K, then the expression P+R is not necessarily R larger than P; it is R*sizeof(K) larger. In the example with Y above, suppose &X is 0x5000. Then (on 32-bit machines) Y will be 0x5000, and Y+3 will be 0x500c, not 0x5003.

One of the major uses for pointers is to reference structs. For instance, let us extend the above struct example to

struct A {
   int L,M;
   char C;
} T,U,*G;
and suppose we execute the statement

G = &T;
Then we could reference the fields within T indirectly via G, by using the - > operator. For example,

printf("%d\n",G->L);
would have exactly the same result as

printf("%d\n",T.L);

2.  Example

Below, in Lines 5-248, is an example program. It is quite involved, using structs and pointers in complex--but typical--manners. A script record of a couple of runs of the program follows the program listing, followed in turn by an analysis.

2.1  Script File Listing

     1  Script started on Mon Apr 20 23:12:16 1992
     2  heather% cat W*
     3  
     4  
     5  /* introductory program on structs and pointers; warehouse setting,
     6     the user inputs information from a pile of papers which document
     7     purchase orders and cancellation orders 
     8  
     9     each paper has a time stamp on it, showing the time the paper
    10     was written (in the case of a cancellation, the time stamp is
    11     the time at which the original order was place)
    12     
    13     the information must be maintained in time stamp sequence, in
    14     a linked list, which in this case would be preferable to an
    15     array since we don't have to "move everything over" when a
    16     cancellation occurs
    17     
    18     initial input can be from a previously-created file if desired; 
    19     file is in binary format, to save disk space */
    20  
    21  
    22  #include <fcntl.h>
    23  
    24  
    25  #define MaxFileSize 1000 
    26  
    27  
    28  char Command,  /* `q' (quit)
    29                    `p' (purchase)
    30                    `c' (cancel) 
    31                    `d' (display list)
    32                    `v' (save to file) */
    33       InFileArray[MaxFileSize];  /* buffer for input file, if any */
    34  
    35  
    36  int CancelTmStmp;
    37  
    38  
    39  /* one struct for each transaction */
    40  struct ListNode  {
    41     int ItemNum,  /* item number */
    42         TimeStamp,  /* time at which the paper was written */
    43         CustomerID;  /* customer identification number */
    44     struct ListNode *Link;  /* link to next node */
    45  };
    46  
    47  
    48  typedef struct ListNode *NodePtrType;
    49  
    50  
    51  NodePtrType ListHead,  /* head of linked list */
    52              CurrPtr;  /* pointer to the node containing the paper
    53                          now being processed */
    54  
    55  
    56  /* NodeInfoSize is the number of bytes in the information part of
    57     the node, i.e. excluding the link */
    58  int NodeInfoSize = sizeof(struct ListNode) - sizeof(NodePtrType);
    59  
    60  
    61  Display()  
    62  
    63  {  NodePtrType TmpPtr = ListHead;
    64  
    65     while (TmpPtr != 0)  {
    66        printf("%x %d %d %d %x\n",TmpPtr,TmpPtr->ItemNum,
    67           TmpPtr->TimeStamp,TmpPtr->CustomerID,TmpPtr->Link);
    68        TmpPtr = TmpPtr->Link;
    69     }
    70  }
    71  
    72  
    73  BuildListFromFile()  /* read input file and build linked list */
    74  
    75  {  int FD,  /* file descriptor */
    76         InFileSize,  /* number of bytes in the input file */
    77         NNodes,  /* number of nodes stored in the file */
    78         StartByte,  /* index of the starting byte of the current
    79                        stored node in InFileArray */
    80         I,J;
    81     char InFileName[20],*BytePtr;
    82     NodePtrType TmpPtr,ListTail;
    83  
    84     printf("enter file name\n");
    85     scanf("%s",InFileName);
    86     FD = open(InFileName,O_RDONLY);
    87     InFileSize = read(FD,InFileArray,MaxFileSize);
    88     NNodes = InFileSize / NodeInfoSize;
    89  
    90     /* now build the linked list, by copying from InFileArray,
    91        one node at a time */
    92     for (I = 1; I <= NNodes; I++)  {
    93  
    94        /* create node in memory */
    95        TmpPtr = (NodePtrType) calloc(1,sizeof(struct ListNode));  
    96  
    97        /* copy the I-th stored-node information to this new node */
    98        StartByte = (I-1)*NodeInfoSize;
    99        for (J = 0; J < NodeInfoSize; J++)  {
   100           BytePtr = ((char *) TmpPtr) + J;
   101           *BytePtr = InFileArray[StartByte+J];
   102        }
   103        TmpPtr->Link = 0; 
   104  
   105        /* now append this new node to the linked list */
   106        if (I == 1) ListHead = ListTail = TmpPtr;
   107        else  {
   108           /* attach to the list */
   109           ListTail->Link = TmpPtr;
   110           /* at least for now, this will be the new tail */
   111           ListTail = TmpPtr;
   112        }
   113     }
   114  }
   115  
   116  
   117  GetCommand()
   118  
   119  {  printf("enter command\n");  
   120     scanf("%c",&Command);
   121     /* might be a leftover end-of-line character from the 
   122        previous action; if so, then read the next character
   123        in the input stream */
   124     if (Command == '\n') scanf("%c",&Command);  
   125  }
   126  
   127  
   128  GetData()  /* prompt user for the information, and put it in
   129                the struct pointed to by CurrPtr */
   130  
   131  {  int ItmNm,TmStmp,CstmrID;  
   132  
   133     printf("enter item number, time stamp and customer i.d.\n");
   134     scanf("%d%d%d",&ItmNm,&TmStmp,&CstmrID);
   135  
   136     /* create new node */
   137     CurrPtr = (NodePtrType) calloc(1,sizeof(struct ListNode));  
   138  
   139     /* now fill it with the data */
   140     CurrPtr->ItemNum = ItmNm;
   141     CurrPtr->TimeStamp = TmStmp;
   142     CurrPtr->CustomerID = CstmrID;
   143  
   144     /* this node is not connected to anything yet */
   145     CurrPtr->Link = 0;  
   146  }
   147  
   148  
   149  Insert()  /* inserts the struct pointed to by CurrPtr into the
   150               list */
   151  
   152  {  NodePtrType LeadingPtr,TrailingPtr;
   153  
   154     /* list currently empty? */
   155     if (ListHead == 0)  {
   156        ListHead = CurrPtr;
   157        return;
   158     }
   159     
   160     /* current time stamp earlier than list head? */
   161     if (CurrPtr->TimeStamp < ListHead->TimeStamp)  {
   162        CurrPtr->Link = ListHead;
   163        ListHead = CurrPtr;
   164        return;
   165     }
   166     
   167     LeadingPtr = ListHead->Link;  TrailingPtr = ListHead;
   168     while (1)  {
   169  
   170        /* reached end of list? */
   171        if (LeadingPtr == 0)  {
   172           TrailingPtr->Link = CurrPtr;
   173           break;
   174        }
   175  
   176        /* arrived at insertion point? */
   177        if (CurrPtr->TimeStamp < LeadingPtr->TimeStamp)  {
   178           TrailingPtr->Link = CurrPtr;
   179           CurrPtr->Link = LeadingPtr;
   180           return;
   181        }
   182        /* otherwise move to next node */
   183        else  {
   184           TrailingPtr = LeadingPtr;
   185           LeadingPtr = LeadingPtr->Link;
   186        }
   187     }
   188  }
   189  
   190  
   191  Cancel()  /* delete the node whose time stamp matches CancelTmStmp;
   192               to keep the program simple so as to ease the teaching 
   193               about pointers, it is assumed that there will be one, 
   194               and only one, node which matches the given time stamp */
   195  
   196  {  NodePtrType LeadingPtr,TrailingPtr;
   197  
   198     if (ListHead->TimeStamp == CancelTmStmp)  {
   199        ListHead = ListHead->Link;
   200        return;
   201     }
   202  
   203     LeadingPtr = ListHead->Link;  TrailingPtr = ListHead;
   204     while (LeadingPtr->TimeStamp != CancelTmStmp)  {
   205        TrailingPtr = LeadingPtr;
   206        LeadingPtr = LeadingPtr->Link;
   207     }
   208     TrailingPtr->Link = LeadingPtr->Link;
   209  }
   210  
   211  
   212  SaveToFile()
   213  
   214  {  int FD;  /* file descriptor */
   215     NodePtrType TmpPtr;
   216     char OutFileName[20];
   217  
   218     printf("enter output file name\n");
   219     scanf("%s",OutFileName);
   220     FD = open(OutFileName,O_WRONLY|O_CREAT,0700);
   221  
   222     TmpPtr = ListHead;
   223     while (TmpPtr != 0)  {
   224        write(FD,TmpPtr,NodeInfoSize);
   225        TmpPtr = TmpPtr->Link;
   226     }
   227  }
   228  
   229  
   230  main()
   231  
   232  {  printf("read in a file?\n");
   233     scanf("%c",&Command);
   234     if (Command == 'y') BuildListFromFile();
   235     else ListHead = 0;
   236     while(1)  {
   237        GetCommand();
   238        switch (Command)  {
   239           case 'q':  exit();
   240           case 'p':  GetData(); Insert(); break;
   241           case 'c':  printf("enter time stamp\n");
   242                      scanf("%d",&CancelTmStmp);  
   243                      Cancel(); break;
   244           case 'd':  Display(); break;
   245           case 'v':  SaveToFile();
   246        }
   247     }
   248  }
   249  
   250  heather% a.out
   251  read in a file?
   252  n
   253  enter command
   254  p
   255  enter item number, time stamp and customer i.d.
   256  12 150 222
   257  enter command
   258  p
   259  enter item number, time stamp and customer i.d.
   260  1 88 250
   261  enter command
   262  p
   263  enter item number, time stamp and customer i.d.
   264  86 152 1
   265  enter command
   266  d
   267  66f8 1 88 250 66e0
   268  66e0 12 150 222 6710
   269  6710 86 152 1 0
   270  enter command
   271  c
   272  enter time stamp
   273  88
   274  enter command
   275  d
   276  66e0 12 150 222 6710
   277  6710 86 152 1 0
   278  enter command
   279  p
   280  enter item number, time stamp and customer i.d.
   281  100 4 16
   282  enter command
   283  d
   284  6728 100 4 16 66e0
   285  66e0 12 150 222 6710
   286  6710 86 152 1 0
   287  enter command
   288  v
   289  enter output file name
   290  gy
   291  enter command
   292  q
   293  heather% !!
   294  a.out
   295  read in a file?
   296  y
   297  enter file name
   298  gy
   299  enter command
   300  d
   301  66e0 100 4 16 66f8
   302  66f8 12 150 222 6710
   303  6710 86 152 1 0
   304  enter command
   305  c
   306  enter time stamp
   307  150
   308  enter command
   309  d
   310  66e0 100 4 16 6710
   311  6710 86 152 1 0
   312  enter command
   313  q
   314  heather% e
   315  heather% 
   316  script done on Mon Apr 20 23:14:48 1992

2.2  Analysis

The comments (Lines 5-19) explain the goals of this program. Please read these carefully before continuing.

By the way, the comments mention that we hope to save disk space by storing the data in binary format. This will not necessarily be the case, depending on the size of the numbers involved. If a number is at least five digits long in decimal form, then binary form will save space here: An int variable on a SPARCstation takes up 32 bits, thus four bytes; in character form, i.e. when stored by a statement of the form

fprintf(file pointer,"%d",variable name);
we would be using five bytes. On the other hand, if we knew that all our integer values would be in the range 0-255, we could use the type unsigned char instead of int in Lines 41-43, in which we would still save space by using binary form.

Overview: We can get a good idea of what the program is doing by looking at main(), Lines 230-248.

First, the user is asked whether to input existing information from a file, or to start from scratch. If the user does want to input information from a file, the function BuildListFromFile() will read all the records stored in the file, create structs for each one (with the struct type defined on Lines 39-45), and link the structs together, with the head of the chain being pointed to by ListHead. On the other hand, if we start with nothing in the list, ListHead will be initialized to 0.

Next, we get to the major portion of main(), the while loop. It continues to take commands from the user, either adding new records to the list, or deleting them when a cancellation request is made.

The user can also make other requests, e.g. that the current list be written to a file. In the latter case, the function SaveToFile() will carry out this request, traversing the chain, writing the data from each struct to the designated file. This data will consist of the three fields on Lines 41-43, but not that in Line 44, the Link value. The Link values are irrelevant, since when the file is read in again later on and the list recreated, the Link values will probably be different anyway.

Line 22: The file /usr/include/fcntl.h contains some #define's that will be needed for our I/O, e.g. a definition of the value O_RDONLY on Line 86.

Lines 39-45: Here is our definition of the struct type which we will use. Note carefully that ListNode is the name of the type, not the name of a variable of that type. This type consists of three integers, as well as a pointer variable. The latter is of type pointer-to-ListNode. In other words, a field within one ListNode struct will point to another ListNode struct--this is how we will link together many such structs.

Line 48: C's typedef construct does what its name implies--it defines a new type. In this case, we are defining a new type whose name is NodePtrType, and it is defined to be pointer-to-ListNode. Note that we had to put the word `struct' in here, to remind the compiler that ListNode was a struct type (Line 40); we could not have written simply

typedef ListNode *NodePtrType;
omitting the `struct'. This is also true on Lines 58, 95, and so on.

Line 51: Now that we have defined this new type, we declare ListHead and CurrPtr to be of that type.

Line 58: Here is an example of the use of the sizeof operator. On a SPARCstation, integers and pointers are four bytes each, so NodeInfoSize will be equal to 12 here, but by using sizeof, this program will compile and run correctly on any machine/compiler combination.

Display() Function, Lines 61-70: Look at TmpPtr in Lines 63, 65 and 68: TmpPtr initially points to the head of the list (Line 63); at the end of each iteration of the loop TmpPtr is advanced to the next struct in the list (Line 68); and the loop continues until we get to the end of the list (Line 65). In this manner, we hit all the structs in the list, printing out each one (Lines 66-67).

As an aid to debugging, and also a teaching aid, I have also printed out for each struct its address and the address of the struct which it links to.

Input from file, Lines 86-87: We are using the Unix functions open() and read(). These, along with write() (Line 224), are the basic I/O functions for Unix, not for the C language. In fact, the C functions like fopen(), fscanf(), and so on, internally make calls to the corresponding Unix functions; this way a C program developed under a non-Unix environment can be moved to a Unix environment without change. These Unix I/O functions do straight binary data transfer--they just copy bytes from/to memory to/from a file, i.e. without any ``interpretation'' such as that done by %d.5

The open() function, Line 86, has as its first argument the file name. Its second argument states whether we are opening the file to read only, write only or whatever (you can get a complete list of these by typing man open). Instead of returning a file pointer, as fopen() does, open returns an integer-valued file descriptor, which we are storing in our int variable FD.

On Line 87 we are calling the read() function. Note that we are reading in the whole file at once! Here we have told read() to read the file whose file descriptor is FD (i.e. the file we just opened on Line 86), put whatever we read into the array (buffer) InFileArray, and to try to read MaxFileSize bytes. In most cases, the word ``try'' here is key, because there won't be MaxFileSize bytes in the file; then read will only read until the end of the file. In order to let us know just how many bytes it did read, the function will return this number, which we are storing in our variable InFileSize.

Lines 92-113: OK, now that we have read the file data into our array InFileArray, we can proceed to recreate our linked list from that array. That is what is happening within this loop, which has three major sections to it: In Line 95, we create a new struct in memory; in Lines 97-103, we fill this new struct with the data from the array; and in Lines 105-113, we attach this new struct to our list. More details on each of these below.

Line 95 Here we call the function calloc, which allocates memory for us. Its first argument states the number of objects we need room for, and the second parameter states how many bytes each object will need; here our ``object'' is a ListNode struct. Also, we wouldn't be able to use this newly allocated memory unless we knew where it was, so calloc tells us, by returning a pointer to it; we store that pointer value in TmpPtr.

The `(NodePtrType)' is called a cast, which essentially does a type change. The type of the return value of calloc is pointer-to-char, but we are saying to the compiler, ``Yes, Compiler, we know that calloc returns a value of type pointer-to-character, but we will use it as type NodePtrType, i.e. type pointer-to-ListNode, so don't worry; we know what we are doing.'' No actual change will be made--an address is an address, no matter what type of data is stored at that address--but this way we avoid the compiler warning us about type-mixing.

On Line 100, we see an example of pointer arithmetic. The program would fail if we were to replace this line with

BytePtr = (char *) (TmpPtr + J);
Lines 149ff: See the next handout, ``A Closer Look at the Warehouse Program.''


Footnotes:

1 Don't forget to terminate the definition with a semicolon! You will get very odd compiler error messages otherwise.

2 It would seem to make more sense to write Line 6 as

A T,U;

However, the C language requires that we include the ßtruct".

3 This is true in any language which has pointer variables, e.g. Pascal, but in C it is more explicit than in Pascal.

4 Note very carefully that we are NOT declaring variables named *Y and *Q. We are declaring variables named Y and Q. The `*' symbols are only there to let us know that Y and Q are pointers; the `*' is not part of the names of these variables.

5 As mentioned above, the C functions call the Unix functions, so some ``translation'' may have to be done along the way. For example, suppose we have an int variable M whose current value is 18, and we have the call fprintf(FPtr,"%d",M). Then the characters `1' and `8', i.e. we want the bit string 0011000000111000 to be written to the file. But M itself consists of the bit string 00000000000000000000000000010010, so the fprintf() function will need to convert from this latter string to the former one before calling write.


File translated from TEX by TTH, version 1.1.