Homework 1

Due Monday, Oct. 8

Goals of this Assignment:

develop independent programming skills
get serious about using a debugging tool
make sure, once and for all, that you REALLY know pointers ("Bo knows pointers")
solidify your understanding of row-major vs. column-major array storage
review/acquire basic Linux skills

Here you will develop some routines and data structures to store and manipulate sparse matrices, meaning those whose elements include lots of 0s. If one is working with very large sparse matrices--matrices in the real world today can have rows and columns numbering in the thousands or even millions--we may wish to store only the nonzero elements, to save space.

Before going into the details, note the following:

In our textbook, it emphasizes that two-dimensional arrays are actually stored in one dimension; the two-dimensional view is just in our mind (with the compiler catering to that illusion). Note too what the book says about the duality between pointers and arrays. I recommend reading Sections 1.8.2.2 and 1.8.3 especially carefully.

It is not assumed here that you have already taken a linear algebra course. If not, there are short reviews available in the appendices of my open source textbooks in statistics and parallel programming. This information will NOT be used much in our course, but it will be needed at a few points. Thus THIS INFORMATION WILL BE ASSUMED, BOTH IN THIS ASSIGNMENT AND IN THE REST OF THE COURSE, INCLUDING TESTS.

Any compression scheme for sparse matrices consists of specifying information on "which" and "what":

which: Which elements of the matrix are nonzero?

what: What are the values of those elements?

The "which" part represents overhead, as ordinary matrix storage doesn't have it, but if the matrix is "sparse enough," we will in the end use less space.

Here we will restrict to matrices whose elements are only 0s and 1s, motivated by graph applications (the existence of an edge from vertex i to vertex j is coded as a 1 in row i, column j). Thus the "what" part above is not an issue.

The "which" part will be specified by stating where clumps of 1s are located. Roughly speaking, the storage structure will say something like, "There are two 1s starting at row 2, column 5, and six 1s starting and row 4, column 1, and ..."

It is assumed that the uncompressed form of a matrix is stored as a standard C/C++ two-dimensional array, i.e. row and column numbers corresponding to row-major ordering. This is true even if code has declared the array one-dimensional. Note that a clump of 1s may straddle two or more rows.

Again motivated by the graph context, we will assume that the number of rows and columns are equal, and that any matrix has at least one 1.

A compressed matrix will be stored as an object of a C++ class, with these specifications:

class smatrix {

   public:

      int nrc;  // number of rows and columns in the matrix
      int nclumps;  // number of clumps 
      // cmatrix contains the compressed form of the matrix; 
      // cmatrix[2*j] = r and cmatrix[2*j+1] = s means there is 
      // a clump of s 1s beginning at (1-dimen.) position r of the 
      // uncompressed matrix
      int *cmatrix;  // compressed form of the matrix
      int nclumps;  // number of clumps 

   // constructor inputs rc x rc uncompressed fullmatrix
   smatrix(int rc, char *fullmatrix);  
   smatrix operator+ (smatrix& summand2);  // boolean sum
   printm(void);  // prints uncompressed form of "this" to screen, row by row
};

The addition operator is for merging two graphs that have the same set of vertices. There will be a link from i to j in the merged graph if there is one in either or both of the summands. It is assumed that the two operands are of the same size (you do not have to check for this).

You may add other member variables and methods to the class if you wish, say to help your coding, but do not change anything above. For instance, do NOT add any arguments to the methods.

The purpose of the code is to save memory space. Thus, do not uncompress a matrix, even temporarily, during the addition or print operations.

Example:

int main()
{
   char m1[4] = {0,1,1,0};
   smatrix sm1(2,m1);
   char m2[9] = {0,1,1,0,0,1,1,1,0};
   smatrix sm2(3,m2);
   char m3[9] = {0,1,0,1,0,1,1,1,1};
   smatrix sm3(3,m3);
   sm2.printm();
   smatrix sm4 = sm2 + sm3;
   ...
}

After executing the above, you should have

sm1.cmatrix:
   nclumps = 1
   cmatrix = 1 2       
sm2.cmatrix:
   nclumps = 2
   cmatrix = 1 2 5 3       
sm3.cmatrix:
   nclumps = 3
   cmatrix = 1 1 3 1 5 4       
sm4.cmatrix:
   nclumps = 2
   cmatrix = 1 3 5 4

The call to printm() for sm2 should display

0 1 1
0 0 1
1 1 0

HIGHLY IMPORTANT NOTE:

The ability to use a debugging tool is crucial for software development. You should be using one for every program you write--for your OWN benefit. You can download Chapter 1 of my book on debugging for free here, to learn not only the mechanics of the tools but also the philosophy of effective debugging. (There is no need for you to buy the book.)

During the interactive grading session, the TA will ask you to demonstrate the operation of a debugging tool, e.g. setting a breakpoint or checking the value of a variable, on your code for this assignment. The tool is of your choice, but it must be one of the official ones on CSIF.