MulSim Multiprocessor Simulator

Norman Matloff and Kevin Rich
Department of Computer Science
University of California at Davis
correspondence author:
Norman Matloff
matloff@cs.ucdavis.edu

Updated February 27, 2000

1 Introduction
    1.1 Overview
    1.2 Introductory Example
2 Installation
    2.1 Building the Simulator and Assembler
        2.1.1 Building
        2.1.2 Testing
        2.1.3 Done With Simulator and Assembler
    2.2 Building the Compiler
        2.2.1 Building
        2.2.2 Testing
    2.3 Cleanup
    2.4 Internal Operation
3 Tutorial and Quick Reference
    3.1 Description of Our Sample Program
    3.2 Preparation of Our Sample Program
    3.3 Compiling Our Sample Program
    3.4 Running Our Sample Program
4 MulSim Processor Architecture
    4.1 Register File
    4.2 Instruction Set
5 Mas Assembly Language and Assembler (Optional)
    5.1 File Structures
    5.2 Examples
    5.3 Invoking the Assembler
6 Operation and Usage of the MulSim Simulator
    6.1 Operation
    6.2 Usage
7 MulSim C Compiler
    7.1 Usage
    7.2 MulSim-Specific Additions
        7.2.1 Interprocessor Synchronization
        7.2.2 System Values
        7.2.3 Command-Line Arguments
        7.2.4 Printing to stdout
    7.3 Restrictions and Bugs
        7.3.1 Data Types
        7.3.2 Initializing Variables
        7.3.3 Stack Size and Local Arrays
        7.3.4 No Separate Compilation
        7.3.5 No Library Functions or System Calls
        7.3.6 Compile Script Compiles Anyway After Announcing Error
        7.3.7 "Assertion" Error Message
    7.4 Examples
    7.5 Packages
8 Built-In and User-Defined Memory-Access Models
    8.1 Built-In Models
    8.2 User-Defined Interconnects
9 Writing the User Hooks (Optional)
10 Debugging
    10.1 Using the Built-In Debugging Tool
        10.1.1 Correlating C Source Code with the Assembly Level
        10.1.2 Debugging-Tool Commands
        10.1.3 What About a Segmentation Fault?
    10.2 Using print_int() and print_str()
Appendix A MulSim Instruction Set

1. Introduction

1.1 Overview

MulSim is a simulator for a shared-memory multiprocessor, written by Norman Matloff with a compiler by Kevin Rich. Its main advantages are its simplicity and platform independence, compared to more sophisticated simulators such as rsim, PROTEUS, Tango and limes, which are platform-restricted and much more complicated to use. Another advantage of MulSim is that it includes a built-in debugging tool (though not a fully symbolic one).

MulSim's processor architecture, which is defined by the Mas assembly language described in this document, is similar to that of the Sun Microsystems SPARC processors: instructions execute in a single cycle; it is a load/store machine, with all arithmetic operations done in register-to-register mode only; and it uses a system of register windows to store the runtime stack.

The user may choose from several types of processor/memory interconnects, or may develop his/her own type of interconnect; MulSim has been designed in a modular fashion which facilitates the user taking the latter approach. There are also optional features which allow the user to do statistics-gathering, etc.

The original version of MulSim was developed in 1993 by the first author. It consisted of an assembler and a simulator/debugger. The debugging facility can also be used for close examination of program behavior, say to help identify how memory contention problems arise. In 1996, the second author added an ANSI C compiler, based on the lcc retargetable compiler package developed by Chris Fraser and David Hanson at Princeton University. Bug fixes to the compiler have been handled by the first author since that time.

1.2 Introductory Example

As an example of the kinds of analyses which can be performed by MulSim, consider first the program in the file Examples/KWQSort.mas. In this program there are `na' independent arrays to be sorted. Whenever a processor becomes idle, it is assigned a batch of k new arrays to sort. The parameter k is a design factor. If k is large, much processor time will be wasted at the end of a run: one processor may have several arrays left to sort in its latest batch of k while the other processors are idle, unable to help by working on the arrays it has not gotten to yet. On the other hand, if k is small, the parallelism will be too fine-grained, and contention for the batch-allocation variables will hamper performance. (See Kruskal and Weiss, IEEE Transactions on Software Engineering, 1985, and Hummel et al., Communications of the Association for Computing Machinery, 1992.)

After writing this program, the user assembled it with the Mas assembler and then ran it with the command line

KWQSort 16 n b 100 8 1

Here the user has specified 16 processors, a noninteractive session, and a bus as the processor/memory interconnect. These first three arguments are general MulSim command-line parameters, while the last three are specific to the application program KWQSort.mas; they specify that na = 100, that each array has 8 elements, and that k = 1.

The output of this run was

all CPUs halted
5962 cycles were executed

Then the user tried k = 2 and k = 3:

KWQSort 16 n b 100 8 2
all CPUs halted
6197 cycles were executed

KWQSort 16 n b 100 8 3
all CPUs halted
6455 cycles were executed

Next the user wrote an alternate version of the program, in which the row-assignment variables were accessed with Mas' tas instruction, which performs a test-and-set operation. The original version used Mas' other process-coordination instruction, ainc, which performs an atomic-increment (``fetch-and-add'') operation on a specified memory location. As the results showed, the tas version has greater overhead; for k = 1, for example:

KWTas 16 n b 100 8 1
all CPUs halted
6972 cycles were executed

The 6972 cycles needed here are substantially more than the 5962 cycles of the ainc version (about 17% more). So it does indeed appear that including ainc in the processor architecture is worthwhile.

But what is possibly even more interesting is the effect of varying k:

KWTas 16 n b 100 8 2
all CPUs halted
6760 cycles were executed

KWTas 16 n b 100 8 3
all CPUs halted
6841 cycles were executed

KWTas 16 n b 100 8 4
all CPUs halted
6390 cycles were executed

KWTas 16 n b 100 8 5
all CPUs halted
7047 cycles were executed

Whereas for the ainc version of the program the best batch size among those tried was k = 1, here it was k = 4. The overhead of tas was large enough to make it worthwhile to do work in batches, even though the price is paid in processor idle time at the end of the run.

2. Installation

We assume that your host environment has the following characteristics:

(a) The operating system is some version of Unix.
(b) Under your C compiler, sizeof(int) and sizeof(float) are each 4.
(c) You are running the C shell csh (or tcsh).

Item (a) is virtually mandatory. If your system does not satisfy (b) or (c), the necessary modifications are not difficult if you have a moderately good knowledge of Unix system programming.

IMPORTANT NOTE: When you unpack the MulSim distribution, be sure to use the `p' option of tar (which preserves file permissions), e.g.

gunzip -c MSDistrib.tar.gz | tar xpf -

We will refer to the top-level directory of the MulSim package (the directory containing the README file and having subdirectories such as Compiler, Bin and so on) as the MulSim root directory. Go to this directory; if you obtained the package as a file named MSDistrib.tar.gz and have just unzipped and untarred it, the command for this will be

cd Distrib

Then type

set mulsimroot = `pwd`

Also, add a similar command to your .cshrc file, say by typing

echo "set mulsimroot = $mulsimroot" >> ~/.cshrc

2.1 Building the Simulator and Assembler

2.1.1 Building

Enter the MulSimSrc directory, and type

source MulSimBuild.csh

(If you have a problem, try using gcc instead of cc.) Add the Bin directory to your search path, both with set for the current session and in your .cshrc file, with a command like

set path = ( $path $mulsimroot/Bin )

2.1.2 Testing

To test your installation, enter the Examples directory and type

make

Then run the programs by typing

Sum 16 n b
Sum 16 n su
Sum 16 n si16 4
Sum 128 n b
KWQSort 16 i b 50 6 4
KWQSort 16 i su 50 6 4
KWQSort 16 i si16 50 6 4 8

Then compare the outputs with those in the file Tests. (You will need to enter commands for the last three tests, which run in interactive mode; see the Tests file.)

2.1.3 Done With Simulator and Assembler

If you wish to work only at the assembly level, your work is now done, and you may skip the next section and Section 7.

2.2 Building the Compiler

2.2.1 Building

The MulSim compiler, mcc, is based on the lcc retargetable compiler package developed by Chris Fraser and David Hanson at Princeton University.

footnote: See A Retargetable C Compiler: Design and Implementation, by C. Fraser and D. Hanson, Addison-Wesley, 1995. The lcc system has a World Wide Web page at
http://www.cs.princeton.edu/software/lcc/
(Note, though, that our copy of the distribution here differs, in that we have added some files, and have changed src/lcc/src/bind.c by removing its ``#line'' directive at the top of the file.)

Actually, mcc is a driver for lcc, which in turn is a driver for the actual compiler, rcc.

To get started, type

setenv MCC_PATH $mulsimroot/Compiler

Add this same setenv command to your .cshrc file. Then type

cd $MCC_PATH
perl install.prl `which perl`
cd src/lcc/etc

At this point, look for a file with prefix ``Mcc.'' for a machine/operating system similar to yours, such as Mcc.Linux.c for a Linux system. If you cannot find a file corresponding to your system, try Mcc.Generic.c, which will probably work as long as you make sure that it correctly specifies the location of cpp, the C-language preprocessor.

This is typically /lib/cpp. If it is not there, use the cpp associated with gcc. First type

which gcc

to determine where gcc is, typically in /usr/local/gnu/bin; let's call this directory GNU/bin. Then cpp will probably be in GNU/lib, or one of its subdirectories (you may have to descend deep into that tree).

Then copy the Mcc.* file you chose above to tmp.c, and type

source Mcc.MulSim.c.csh

which will form MulSim.c.

Then type

source Mcc.Make

to build and install lcc. (If you get an error message about conflicting types concerning "basename" in lcc.c, comment out the prototype (line 39) and the macro (lines 191-205) and then try again.)

Now, to build and install rcc, type

cd ../mulsim/no_op_sys
source Mcc.Rcc
cd $MCC_PATH
chmod go+rx bin examples include lib
chmod go+rx ../Compiler

Make sure there are no error messages! (Warnings are OK.) If you get a message that the file stab.h is missing, obtain stab.h and stab.def, change the line

#include <stab.h>

to

#include "stab.h"

in $MCC_PATH/src/lcc/src/sparc.c, and then try again by typing:

source Mcc.Rcc

from the directory containing that Mcc.Rcc file, $MCC_PATH/src/lcc/mulsim/no_op_sys.

Finally, add the $MCC_PATH/bin directory to your search path, both with set for the current session and in your .cshrc file.

2.2.2 Testing

There are several example programs in the directory $MCC_PATH/examples. To test the matrix-multiply example, type

mcc mat_mult.c
mat_mult 4 n b

The output array Z should be 20x20, with each element equal to 12.

(You may have to run rehash once in order for the shell to notice that mcc is there. Also, if MulSim is installed under someone else's account and you are compiling one of their examples, such as mat_mult.c, copy it to your own directory first and then compile the copy.)

Another example is the prime-number counting program.

2.3 Cleanup

When everything is built and tested, you may remove many of the files. Type

rm -r -f $mulsimroot/MulSimSrc
rm -r -f $MCC_PATH/src

2.4 Internal Operation

(Skip this section if you are not interested in modifying the compiler.)

The internal operation of the lcc compiler itself is described in the Web page and book cited in Section 2.2.1. In this section we focus on the operation of the MulSim/lcc interface.

The directory $mulsimroot/Compiler/src/lcc/src contains .md (``machine description'') files, and related .c files, for various lcc target architectures. MulSim's files, mulsim.md and mulsim.c, are modified versions of the corresponding SPARC files.

The directory $mulsimroot/Compiler/include contains files which enable access from C source files to MulSim-specific constructs such as the test-and-set operation and the user hooks.

The directory $mulsimroot/Compiler/lib contains perl scripts which insert their own user-hook functions, use lcc to generate files which are (almost) Mas assembly language, and then do some rearranging to make them true Mas files.

3. Tutorial and Quick Reference

3.1 Description of Our Sample Program

Consider the program Prime.c, which determines the count of all the prime numbers from 2 to N, where N is a command-line argument. Each processor checks Chunk numbers at a time.

3.2 Preparation of Our Sample Program

Every MulSim C source file must contain the line

#include <mulsim_lib.h>

We have also used the barrier code from our MulSim packages, as well as some calls from the MulSim-specific additions, such as LOCK() and UNLOCK().

3.3 Compiling Our Sample Program

We simply type

mcc Prime.c

This produces the executable file Prime, as well as the file Prime.lst which is useful for debugging.

3.4 Running Our Sample Program

Suppose we type

Prime 2 n b 100 10

This means we will run Prime on 2 processors, noninteractively (`n'), on a bus-based interconnect (`b'). Up to this point, all arguments are for the simulator. Then the last two arguments, 100 and 10, will be supplied as command-line arguments to Prime itself, in the int variables UserArgv[0] and UserArgv[1] (where Prime will use them for N and Chunk).

4. MulSim Processor Architecture

The MulSim CPU architecture is similar to that of the SPARC chips designed by Sun Microsystems. Word and address widths are 32 bits. Memory is word-addressable.

4.1 Register File

The register file is similar to those used in the SPARC (and the Berkeley RISC I and II), using the concept of overlapping windows to store the runtime stack on-chip. A register then has both an absolute number, indicating its position within the entire register file, and a relative number, indicating its position within the current register window.

Each window has 32 registers, with relative numbers 0-31. The registers with relative numbers 0-9 are global registers; they also have absolute numbers 0-9, shared by all windows. Relative (and absolute) register number 0 has the value 0 hard-wired into it.

footnote: If it is used as a destination register, its value will not change.

Consecutive windows overlap: The registers with relative numbers 26-31 in a window are physically identical to the registers with relative numbers 10-15 in the previous window.

So, the correspondence of relative register numbers to absolute register numbers in, say, the first three windows is:

window   r0-r9   r10-r15   r16-r25   r26-r31
0        a0-a9   a26-a31   a16-a25   a10-a15
1        a0-a9   a42-a47   a32-a41   a26-a31
2        a0-a9   a58-a63   a48-a57   a42-a47

(The letters `r' and `a' denote relative and absolute register numbers.) The other windows continue in this fashion.

Each time a function call is made, the Current Window Pointer (CWP) is incremented by 1; when a return is executed, the CWP is decremented by 1. The calling function is called the parent, and the called function the child.

Communication of function parameters from the calling function to the called function is done via the registers in the region of overlap between the two windows: relative numbers 10-15 of the parent window are relative numbers 26-31 of the child window. That is, relative register p of the parent (for p in the range 10-15) is relative register p+16 of the child. For example, if the parent places a parameter in its register 12, the child will pick it up in its register 28, which is physically the same register.

4.2 Instruction Set

The MulSim processors have a load/store architecture, meaning that the only instructions which access memory are loads and stores.

footnote: Plus the tas and ainc instructions.

The full instruction set is given in Appendix A.

Note in particular the atomic memory-access instructions tas and ainc, which are central to MulSim's multiprocessor operation. The tas instruction is a classical test-and-set operation, while ainc implements fetch-and-add.

footnote: See for example High-Performance Computer Architecture, by H. Stone, Addison-Wesley, third edition, 1993.

5. Mas Assembly Language and Assembler (Optional)

(You may skip this section if you intend only to work at the C-language level.)

5.1 File Structures

The assembly-language source file name must have the suffix .mas. From this file, the assembler produces several output files, with the same prefix. Say for example that the source file is x.mas; then the following files are produced by the assembler:

x.out: ``machine-language'' file (actually a file of InstrStruct records; see $mulsimroot/Include/Include.h)
x.lst: shows which addresses the source lines have been assigned to
x.sym: shows which addresses the variables declared in x.mas have been assigned to (also tells MulSim how much simulated memory is needed)

Note that of these three files, x.out and x.sym are mainly for MulSim's internal use, and are generally not viewed directly by the user. The x.lst file, though, is intended for the user; it is handy when setting breakpoints with the b command in MulSim.

Data declarations must come at the beginning of the .mas file (though they can be preceded by comment or blank lines), beginning with a line ``.data''. Each data declaration takes the form of the symbol and the number of words to be reserved for that symbol. For example, in Sum.mas, the line

y 25

declares a symbol y, and reserves 25 words for it.

After the data declarations come the instructions, preceded by a line ``.text''. Labels are required to be on a separate line, ending with `:'; they cannot begin with a digit, `r' or `&'. The main program must come first (MulSim begins execution at instruction 0). The last source line must be ``.end''. Comments must be on a separate line and have `;' as the first character in the line. Blank lines are acceptable anywhere.

5.2 Examples

Several example Mas programs (along with their .c user hooks files, explained in detail in Section 9) are given in the directory $mulsimroot/Examples. Besides helping you learn how to use MulSim, they contain utilities which can be used in your own programs.

5.3 Invoking the Assembler

Make sure that your .cshrc file sets $mulsimroot to the top MulSim directory (its immediate subdirectories are Bin, Compiler, Doc, etc.).

One must first prepare both the assembly-language source file, with a .mas suffix, and a C file containing the User*() functions (even if null); the requirements for the C file are described in Section 9. Suppose these are X.mas and X.c. Then the compilation and assembly should be done as

cc -o X -I$mulsimroot/Include X.c $mulsimroot/Lib/libms.a
Mas X

assuming that the files Include.h and libms.a are in $mulsimroot/Include and $mulsimroot/Lib, respectively.

6. Operation and Usage of the MulSim Simulator

6.1 Operation

As mentioned earlier, the simulator loads the program starting at memory location 0.

A few idealizations and convenience measures have been employed:

The processors are assumed to have a common clock, and thus operate in lockstep. This aids in debugging, and makes for a well-defined run time for a program, in the form of the number of cycles executed.
MulSim assumes an infinite instruction cache, so that the only memory accesses modeled are data accesses. For that reason, there are also no pipeline delay slots at branches.
The total number of windows is set by a line
```
#define NWindows 20
```
in Include.h of the MulSim source code package, from which it is hard-coded into libms.a. The simulator will not handle window overflow. If the runtime stack reaches a depth equal to the total number of windows, the simulator will detect the overflow and halt the program.

6.2 Usage

First prepare your MulSim executable file, either from assembly-language (Section 5) or C-language (Section 7).

In executing a MulSim program the form of the command line is:

<prefix> <# of CPUs> <'i'/'n'> <interconnect> <arg4> <arg5>...

where the various components are as follows:

<prefix>: the prefix part of the program name, e.g. X if the Mas source file name was X.mas or the C source was X.c
i: ``interactive,'' i.e. the user wishes to give step-by-step commands to MulSim; these commands (described below) are similar to those of many debuggers, and can be used both for debugging and for closely examining program behavior
n: ``noninteractive,'' i.e. the user wishes the program to run to completion without any pauses
<interconnect>: the processor/memory interconnect type; see Section 8
<arg4>, <arg5>, ...: command-line parameters (if any) for the user Mas program, read by UserInit(); <arg4> may be accessed from within UserInit() as Argv[StartUserArgs], <arg5> as Argv[StartUserArgs+1], and so on; for example, for matrix problems these parameters can be used to specify matrix dimensions

7. MulSim C Compiler

7.1 Usage

If you did not build MulSim yourself and are using someone else's installation, make sure your path and environment variables are set correctly, with

set mulsimroot = <MulSim root directory>
setenv MCC_PATH $mulsimroot/Compiler
set path = ( $mulsimroot/Bin $MCC_PATH/bin $path )

in your .cshrc file. (Note: There is another, unrelated mcc on the ACS machines at UCD, so it is important to make sure that the MulSim mcc comes ahead of the other one in your search path.)

Say your C source file is X.c. Make sure your file contains the line

#include <mulsim_lib.h>

at the top. To compile, simply type

mcc X.c

The mcc script will do all the necessary preprocessing, compiling and assembling, and produce a MulSim executable file X. You then execute as usual, for example typing

X 4 n b

to run X on a four-processor, bus-based system in a noninteractive manner.

IMPORTANT: When running mcc, make sure no errors are reported. The scripts will continue even after an error message (e.g. "warning: no input file") is emitted.

7.2 MulSim-Specific Additions

There are several MulSim-specific additions to the C language, some of which allow specification of interprocessor synchronization and others which are work-arounds to MulSim restrictions.

7.2.1 Interprocessor Synchronization

MulSim's tas (test-and-set) and ainc (atomic increment) instructions may be generated from C source files, using built-in functions (actually macros):

int TAS( int x );

int AINC( int x );

Consistent with the MulSim instructions that these functions generate, they return the old values of x.

Also, we have LOCK/UNLOCK (spin lock) functions (again macros):

LOCK( int x );
UNLOCK( int x );

LOCK() consists of a while loop which repeatedly executes TAS on the lock variable until TAS returns 0, i.e. until the lock was found unlocked.

7.2.2 System Values

The built-in macros CPU_NUM and SYS_SIZE evaluate to the processor number and number of processors in the system, respectively. Note that they have no arguments, and for that matter, no parentheses.

7.2.3 Command-Line Arguments

In MulSim C source files, one cannot use the usual argc and argv command-line variables. However, there is a substitute, as follows.

Each MulSim C source implicitly includes the following declaration (do not include it explicitly):

int UserArgc,UserArgv[10];

The variable UserArgc contains the number of application-specific command-line arguments. For instance, in Section 1.2 we had the example

KWQSort 16 n b 100 8 1

in which argc would be 7 but UserArgc is 3, since only the last three arguments are specific to this application. Then UserArgv[0] would be equal to 100 (not the character string "100"), and so on.

The application-specific command-line arguments are required to be integers (including signed integers). Their numeric values are in the array UserArgv.

7.2.4 Printing to stdout

One may print integers and strings with the following special functions:

void print_int( int x );

void print_str( const char * );

Note restrictions:

Observe the const restriction; only quoted strings may be used.
Each call must be on a separate line.
In calls to print_str(), the left parenthesis must immediately follow ``print_str'', with no intervening space.
For print_int(), the argument must be an actual integer variable (scalar or individual array element), not a call to a function which has an integer return value.
At most 25 calls to print_str() may appear in any C source file.

By the way, you may find that output from these functions will not actually occur until you send an end-of-line character, i.e.

print_str("\n");

7.3 Restrictions and Bugs

The compiler accepts ANSI C code, but is not fully developed (and may never be). Thus there are a number of capabilities you might otherwise take for granted which are missing here. Read this section carefully.

7.3.1 Data Types, malloc()

The type float is not currently supported. However, int is supported in all forms, including arrays of int, pointers to int, structs of int and so on.

At the present time, static storage for local variables is not supported.

Note that there is no malloc() function, but a malloc-substitute package is available (see Section 7.5).

7.3.2 Initializing Variables

Variables may not be initialized within their declaration lines. In other words, something like

int a = 2;

is not allowed.

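(By the way, this restriction may cause some ``timing'' problems if you use barriers: if barrier variables are set to 0 by assignment statements at run time, one processor may reset a variable after another has already begun using it. To avoid this, simply let the barrier variables keep the initial value of 0 set up by the compiler, which typically makes all bits in all variables 0.)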

7.3.3 Stack Size and Local Arrays

Even though the MulSim architecture uses register windows instead of a stack, the compiler still generates some stack references. For example, if a function has more local variables than can fit in a register window, the extra ones are stored on the stack, pointed to by the r30 frame pointer.

Each processor must have a separate stack space, which is arranged by the script sep_stack in the directory Compiler/lib. That script allows for a stack of 2,000 words per processor. The overall stack area for all processors combined is set to 200,000 by the script post_rcc_reorder_segments.prl in that same directory.

Thus one must either avoid declaring large local arrays or modify the appropriate compile script(s).

There is currently no checking for stack overflow.
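Assuming that the 200,000 figure is also measured in words (the text states units only for the per-processor figure), the two settings fit together as shown below; runs on more processors, or programs needing more than 2,000 words of stack per processor, would presumably require changes to these scripts.

```latex
\frac{200{,}000\ \text{words (total stack area)}}{2{,}000\ \text{words per processor}} = 100\ \text{per-processor stacks}
```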

7.3.4 No Separate Compilation

Separate compilation of program modules is not supported.

7.3.5 No Library Functions or System Calls

The C library functions, for example printf(), are not available. (Alternatives for printing are supplied, as explained in Section 7.2.4.)

Similarly, no system calls are available; after all, we have no operating system to call!

7.3.6 Compile Script Compiles Anyway After Announcing Error

If you forget to define a function, say F, the compile script will print an error message

error in lookup of symbol table; symbol _F not found

but due to a bug in the script, it will go ahead and produce an executable MulSim binary anyway. Upon execution you will get an error message similar to

illegal opcode in processor 1, PC ed:  
previous PC was ec
likely cause:  register numbers in call and ret instructions don't 
correspond to each other

If you see this, recompile and watch for the "symbol not found" error message, and then add the function before recompiling.

7.3.7 "Assertion" Error Message

There have been some reports of the compiler balking at code involving arrays and indexes which are formal arguments to functions, or indexes which are local variables to functions with array arguments. When the problem occurs, it is as an assertion failure at line 198 in

$mulsimroot/Compiler/src/lcc/src/gen.c

If, say, j is a parameter or a local variable in a function, you may have to copy j to another local variable, say jj, and use the latter. So, if we have

void x(int *a, int b)
{  int j;
   /* ... */
   a[j] = a[j+1];
   /* ... */
}

it should be modified to

void x(int *a, int b)
{  int j, jj;
   /* ... */
   jj = j;
   a[jj] = a[jj+1];
   /* ... */
}

In some cases, inserting a dummy line such as

i = i;

just before or after the offending line will fix things.

7.4 Examples

The directory $mulsimroot/Compiler/examples (that is, $MCC_PATH/examples) contains a few illustrations of the use of C in MulSim.

7.5 Packages

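Several supplementary packages for use with MulSim are available, among them the barrier code used in the sample program of Section 3 and the malloc-substitute package mentioned in Section 7.3.1.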

8. Built-In and User-Defined Memory-Access Models

8.1 Built-In Models

MulSim currently offers the following choices of built-in memory models:

PRAM:

This is an idealized, theoretical model in which all memory requests are processed in a single cycle, with no contention among them. This is good for debugging Mas programs.

bus:

All processors and memory modules are connected to a single bus. A memory access takes one cycle, but only one access can be done per cycle.

snoopy-update:

This is the same as `bus' above, but with a snoopy cache-coherent bus protocol. Each CPU has a data cache. Whenever a CPU does a write to a block having a copy in the CPU's cache, all other copies of that block in other caches are updated. Cache size is assumed infinite, with all blocks being in the state Shared upon startup.

snoopy-invalidate:

This is the same as the snoopy-update protocol, but using an invalidate policy. Whenever a CPU does a write to a block having a copy in the CPU's cache, all other copies of that block in other caches are invalidated. Cache size is assumed infinite, with all blocks being in the state Shared upon startup. When a cache miss occurs, BlockSize bus cycles are needed to bring in a valid copy of that block.

The command-line codings (Section 6.2) for these interconnects are:

PRAM: p
bus: b
snoopy-update: su
snoopy-invalidate: siBlockSize

For example, si8 specifies snoopy-invalidate with a block size of 8.

8.2 User-Defined Interconnects

Furthermore, MulSim offers the user the option of defining his/her own memory models. The rest of this section presents an outline of how to do this. The reader who will not be developing such models may skip this material.

MulSim has a separate Memory Access Function (MAF) for each of its memory models, i.e. Pram() for PRAM, Bus() for bus, etc. In the case of a user-defined memory model, the MAF will be UserMem(), written by the user. Each iteration of the main loop in the MulSim program simulates a single cycle of the system, and consists of the following main components:

    for (P = 0; P < NCPUs; P++)  {
       if processor P does not have a memory request pending
          execute P's current instruction (possibly generating
             a new memory reference)
    }
    call the MAF

The MAF must check the NewMemRequestsHead queue. The function must also keep track of when a memory request will be satisfied, and at that time it must copy the Value field of the CPU to or from memory (depending on whether it is ld, ste, ainc or tas). When the memory action is done, the appropriate function must be called, LdDone(), StDone(), etc.

MulSim has a function for each instruction type, e.g. Add() for the add instruction, Call() for the call instruction, and so on. These functions are called during the ``execute P's current instruction'' step mentioned above. Developers of specialized memory models should find the code for Ld() instructive:

Ld(PN)  /* initiate load; LdDone() will be called later */
   int PN;
  
{  CPU[PN].EffAddrs = CPU[PN].Reg[CPU[PN].AbsRS1] + ISPtr->Base;
   CPU[PN].MemAccPending = 1;
   CPU[PN].MemAccType = READ;
   if (InterconType == XBAR || InterconType == USER)  
      SetUpNewMemRequest(PN);
}

The MulSim function SetUpNewMemRequest() here will add an element of type MemAccStruct (see $mulsimroot/Distrib/MulSimSrc/Include.h) to the queue headed by NewMemRequestsHead.

When the ``call MAF'' step mentioned above is done, the appropriate MAF will be called. It will then do whatever current processing is needed on previously outstanding memory requests, and then process all the new ones, removing them from the queue (if that MAF uses the queue).
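The interaction between the request queue, the MAF and the completion step can be illustrated with a self-contained toy model. Everything in it is hypothetical: the Req struct stands in for MemAccStruct, NewRequest() plays the role of SetUpNewMemRequest(), the fixed LATENCY replaces any real contention model, and clearing MemAccPending marks the point where MulSim's LdDone()/StDone() would be called. It is meant only to show the order of operations described above (finish outstanding requests, then accept and dequeue new ones), not MulSim's actual data structures.

```c
#include <stdlib.h>

#define NCPUS   4
#define MEMSIZE 256
#define LATENCY 3   /* hypothetical: every access takes this many cycles */

typedef enum { READ, WRITE } AccType;

/* Hypothetical stand-in for a MemAccStruct queue element. */
typedef struct Req {
   int PN;              /* requesting processor */
   AccType Type;
   unsigned Addr;
   int DoneAt;          /* cycle at which the access completes */
   struct Req *Next;
} Req;

typedef struct {
   unsigned Value;      /* value to store, or value loaded */
   int MemAccPending;
} ToyCPU;

ToyCPU CPU[NCPUS];
unsigned Mem[MEMSIZE];
Req *NewMemRequestsHead = NULL;  /* new requests, in arrival order */
Req *InFlight = NULL;            /* accepted but not yet finished */
int SimTime = 0;
int Completed = 0;

/* Plays the role of SetUpNewMemRequest(): append to the new-request queue. */
void NewRequest(int PN, AccType Type, unsigned Addr)
{  Req *R = malloc(sizeof *R);
   Req **PP = &NewMemRequestsHead;

   if (R == NULL) abort();
   R->PN = PN;  R->Type = Type;  R->Addr = Addr;  R->DoneAt = 0;  R->Next = NULL;
   while (*PP != NULL) PP = &(*PP)->Next;
   *PP = R;
   CPU[PN].MemAccPending = 1;
}

/* Toy MAF: fixed latency, unlimited parallelism, no contention. */
void ToyUserMem(void)
{  Req **PP = &InFlight;

   /* First finish any outstanding requests whose time has come. */
   while (*PP != NULL) {
      Req *R = *PP;
      if (R->DoneAt <= SimTime) {
         if (R->Type == READ) CPU[R->PN].Value = Mem[R->Addr];
         else Mem[R->Addr] = CPU[R->PN].Value;
         CPU[R->PN].MemAccPending = 0;   /* where LdDone()/StDone() would be called */
         Completed++;
         *PP = R->Next;
         free(R);
      }
      else PP = &R->Next;
   }
   /* Then accept every new request, removing it from the queue. */
   while (NewMemRequestsHead != NULL) {
      Req *R = NewMemRequestsHead;
      NewMemRequestsHead = R->Next;
      R->DoneAt = SimTime + LATENCY;
      R->Next = InFlight;
      InFlight = R;
   }
}

/* Miniature main loop: each CPU issues one load at cycle 0; returns cycles used. */
int RunToyLoads(void)
{  int P, Busy;

   for (P = 0; P < NCPUS; P++) {
      Mem[P] = 100 + P;
      NewRequest(P, READ, (unsigned) P);
   }
   do {
      ToyUserMem();
      SimTime++;
      for (Busy = 0, P = 0; P < NCPUS; P++) Busy |= CPU[P].MemAccPending;
   } while (Busy);
   return SimTime;
}
```

Here each of the four loads is accepted in cycle 0 and finishes in cycle 3, so RunToyLoads() returns 4. A realistic user model would replace the fixed DoneAt computation with one that depends on the interconnect state, e.g. serializing requests that target the same memory module.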

As stated earlier, when the MAF finishes a memory action, it must call the corresponding ``xDone()'' function. For a load, for instance, the MAF must call LdDone() when the load is complete (i.e. when the value to be loaded reaches the CPU). LdDone(), StDone(), etc., are all functions internal to MulSim; here for example is the code for LdDone():

LdDone(PN)
   int PN;

{  unsigned Tmp; 

   Tmp = CPU[PN].MemValue;
   SetNZ(PN,Tmp);
   if (CPU[PN].AbsRD) CPU[PN].Reg[CPU[PN].AbsRD] = Tmp; 
   CPU[PN].MemAccPending = 0;
}

This function places the loaded value into the destination register (unless that register is number 0, which is hard-wired to 0), sets or clears the N and Z condition codes appropriately, and clears the MemAccPending flag, indicating that the processor may now proceed to the next instruction following the load.
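The effect of LdDone() on a processor can be mirrored in a small standalone model. This is a sketch, not MulSim code: ModelCPU, ModelSetNZ() and ModelLdDone() are hypothetical, and ModelSetNZ() assumes 32-bit words, treating a value as negative when its sign bit is set, which is our guess at what SetNZ() does.

```c
#include <stdbool.h>
#include <stdint.h>

#define NREGS 64   /* absolute registers a0-a63 */

/* Simplified, hypothetical per-CPU state; MulSim's real CPU struct has more fields. */
typedef struct {
   uint32_t Reg[NREGS];
   bool N, Z;           /* condition codes */
   uint32_t MemValue;   /* value delivered by the memory model */
   int AbsRD;           /* absolute destination register */
   int MemAccPending;
} ModelCPU;

/* Assumed behavior of SetNZ() for a 32-bit word. */
void ModelSetNZ(ModelCPU *C, uint32_t V)
{  C->N = (V & 0x80000000u) != 0;   /* sign bit set: negative */
   C->Z = (V == 0);
}

/* Mirrors LdDone() above. */
void ModelLdDone(ModelCPU *C)
{  uint32_t Tmp = C->MemValue;

   ModelSetNZ(C, Tmp);
   if (C->AbsRD != 0) C->Reg[C->AbsRD] = Tmp;   /* register 0 stays 0 */
   C->MemAccPending = 0;                        /* CPU may proceed */
}
```

Note that the flags are set from the loaded value even when the destination is register 0, so a load into r0 can serve purely as a test of a memory word.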

9. Writing the User Hooks (Optional)

(You may skip this section if you plan to work only at the C-language level. In fact, if you work at that level and want hooks of your own, some of the Perl files in the compilation process would need to be modified, since those files implement user hooks themselves.)

Suppose the user's Mas file is named x.mas. Then he/she is required to have a C-language file x.c, with the following contents:

A line
```
#include "<path>/Include.h"
```
at the beginning of the file, to access $mulsimroot/Distrib/Include/Include.h.

The functions UserInit(), UserHook(), UserStat() and UserMem(), at least in null form.

A line
```
#include "<path>/Main.c"
```
at the end of the file, to access $mulsimroot/Distrib/Include/Main.c.

The general roles of the functions may be summarized as follows:

UserInit():

This is called at the beginning of execution of MulSim.

Typical use is to initialize variables declared by the user in the user's Mas program source file.

UserHook(PN,HookConst):

This is called whenever MulSim executes a userhook instruction in the user's Mas program. A userhook instruction has one operand, a constant. MulSim passes this value to UserHook() via the parameter HookConst (along with the processor number, in the parameter PN). This allows the user to insert different hooks, i.e. perform different tasks, at various points in his/her Mas program, by specifying a different constant in each one; that constant can then be used as the key of a switch statement within UserHook().

Some typical uses:

Gathering application-specific statistics on program performance, such as the mean number of cycles needed per memory access.
As a debugging aid. For example, MulSim's es command causes execution to continue until the global MulSim variable Stop is set by UserHook(). In this way, UserHook() can be used to set up complex conditional breakpoints.
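As an illustration of the switch-on-HookConst pattern, here is a sketch of a UserHook() that times memory accesses and implements a conditional breakpoint for the es command. The hook constants, MAXCPUS and the statistics variables are hypothetical. SimTime and Stop are declared locally only so that the sketch compiles on its own; in a real x.c they would presumably come from MulSim's Include.h, with types we have not verified.

```c
#define MAXCPUS 64   /* hypothetical bound on processor numbers */

/* Hypothetical hook constants: the user picks these; MulSim only passes them through. */
enum { HOOK_ACCESS_START = 1, HOOK_ACCESS_END = 2, HOOK_CHECK_BREAK = 3 };

/* Local stand-ins for MulSim's SimTime and Stop (assumed to be ints). */
int SimTime = 0;
int Stop = 0;

long AccessStart[MAXCPUS];   /* per-CPU start time of the access being timed */
long TotalCycles = 0, NAccesses = 0;

void UserHook(int PN, int HookConst)
{  if (PN < 0 || PN >= MAXCPUS) return;
   switch (HookConst) {
      case HOOK_ACCESS_START:          /* placed just before a memory access */
         AccessStart[PN] = SimTime;
         break;
      case HOOK_ACCESS_END:            /* placed just after it */
         TotalCycles += SimTime - AccessStart[PN];
         NAccesses++;
         break;
      case HOOK_CHECK_BREAK:           /* conditional breakpoint for es */
         if (PN == 3 && NAccesses > 10) Stop = 1;
         break;
      default:                         /* unknown constants are ignored */
         break;
   }
}
```

A Mas program would bracket an access with userhook 1 and userhook 2, and UserStat() could then print TotalCycles divided by NAccesses as the mean number of cycles per access.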

UserStat():

This is called by MulSim when all processors have halted. Typical uses:

Print to the screen the Mas program's ``output,'' such as a matrix after a matrix-inversion operation has been applied to it.
Print out the statistics gathered by UserHook() during the course of execution of the program.

UserMem():

This defines the processor/memory interconnect if the user wishes to define his/her own memory model instead of using one of those built into MulSim (PRAM, bus, etc.). See Section 8.2.

In writing the above functions, you will typically make use of one or more of the MulSim functions below.

NamedGet(Sym,Index):

This returns the value in Mem[s+Index], where s is the address of the variable Sym in the user's Mas program.

NamedPut(Value,Sym,Index):

This stores the number Value in Mem[s+Index] (see NamedGet() above).

NamedFloatGet():

This is like NamedGet() but returns a float value. It is a byte-for-byte copy, not a numeric conversion.

NamedFloatPut():

This is the float counterpart of NamedPut(); like NamedFloatGet(), it copies the bits byte-for-byte rather than performing a numeric conversion.

RegValue(PN,R):

This returns the value in (relative) register R in processor PN.

FloatRegValue(PN,R):

This is the same as RegValue(), except that its return value is in float form.
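The addressing rule behind NamedGet() and its relatives can be modeled in plain C. In this sketch the symbol table, the memory array and the Model* function names are all hypothetical, since the real functions look symbols up in the loaded Mas program. The float versions use memcpy() to show what a byte-for-byte copy (as opposed to a numeric conversion) means, assuming a 32-bit IEEE float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEMWORDS 256

/* Hypothetical symbol table standing in for the data symbols of a Mas program. */
typedef struct { const char *Name; int Addr; } Symbol;
static const Symbol SymTab[] = { { "_Sum", 10 }, { "_X", 20 } };

static uint32_t Mem[MEMWORDS];

static int SymAddr(const char *Sym)
{  size_t I;
   for (I = 0; I < sizeof SymTab / sizeof SymTab[0]; I++)
      if (strcmp(SymTab[I].Name, Sym) == 0) return SymTab[I].Addr;
   return -1;   /* unknown symbol */
}

/* Returns Mem[s+Index], where s is the address of Sym. */
uint32_t ModelNamedGet(const char *Sym, int Index)
{  int A = SymAddr(Sym);
   assert(A >= 0 && A + Index >= 0 && A + Index < MEMWORDS);
   return Mem[A + Index];
}

/* Stores Value in Mem[s+Index]. */
void ModelNamedPut(uint32_t Value, const char *Sym, int Index)
{  int A = SymAddr(Sym);
   assert(A >= 0 && A + Index >= 0 && A + Index < MEMWORDS);
   Mem[A + Index] = Value;
}

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Reinterprets the stored bits as a float: a byte-for-byte copy, no conversion. */
float ModelNamedFloatGet(const char *Sym, int Index)
{  uint32_t Bits = ModelNamedGet(Sym, Index);
   float F;
   memcpy(&F, &Bits, sizeof F);
   return F;
}

void ModelNamedFloatPut(float Value, const char *Sym, int Index)
{  uint32_t Bits;
   memcpy(&Bits, &Value, sizeof Bits);
   ModelNamedPut(Bits, Sym, Index);
}
```

For instance, after ModelNamedFloatPut(1.5f, "_X", 0) the memory word holds the bit pattern 0x3FC00000, not the integer 1.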

10. Debugging

Debugging parallel programs is much more difficult than debugging nonparallel ones, but here are a few tips for making the process easier with MulSim.

10.1 Using the Built-In Debugging Tool

10.1.1 Correlating C Source Code with the Assembly Level

Make use of the MulSim simulator's debugging features, even if your source program is in C. The debugger operates at the assembly language level, but the assembly language is easy to learn, and you can easily correlate lines in the .mas file produced by the compilation process to lines in the C source file, simply by looking for (global) variable names from your .c file in the .lst file. Here is how:

Say you have a function in your C source code file g.c declared as follows.

int X(int A, int B)

{  int U,V,W;

   U = A + B;
   V = Z;
   if (U == V) W = U;
   else W = V;
   return W;
}

where Z is a global variable. The compilation process will produce a file g.lst which shows the assembly instructions and their memory locations. The portion corresponding to X will look like this (I've removed the comments):

_X:
    6   save r14 -21 r14
    7   add r26 r27 r29
    8   add r0  &_Z r24
    9   ld r24 0 r28
    a   sub r29 r28 r0
    b   jmp ne L2
    c   add r29 0 r25
    d   jmp none  L3
L2:
    e   add r28 0 r25
L3:
    f   add r25 0 r26
   10   ret r31
L1:
   11   ret r31

Note that names appear in the assembly language with underscores, as we see with X and Z here. (This is standard practice for C compilers.) You don't see the names of the local variables, because they are stored in registers.

Ignoring the "bookkeeping" operation in location 6, we see code in location 7,

    7   add r26 r27 r29

which corresponds to the line

U = A + B;

in the original C source code. In locations 8-9 the code

    8   add r0  &_Z r24
    9   ld r24 0 r28

corresponds to the line

   V = Z;

in the original C source code. The instruction in location 8 adds r0, which always contains 0, to the address of Z and puts the sum in r24. Location 9 then loads the contents of memory location r24+0 (i.e. r24) into r28. We can infer from this that r28 is the register the compiler has assigned to the local variable V. Similarly, location 7 shows that the arguments A and B are in r26 and r27 and that U is in r29.

During the debugging process, it is helpful to insert into the C source code for a function one assignment statement (involving a dummy global variable set up for this purpose) for each argument and local variable, in order to be able to tell at a glance which registers store which arguments and locals. Once a function is debugged, these statements can be removed.

So we might declare global variables Dummy1, Dummy2, Dummy3 and so on, and if a function has a local variable Y, say, we would insert into the beginning of the C code for the function a statement

Dummy1 = Y;

so that we can tell which register Y is stored in, or what register the address of Y will be stored in. (The compiler may choose to either store a local variable in a register or store the variable's address in a register. The latter case is necessary if the address of the variable, e.g. &Y in this case, appears elsewhere in the code for that function.)
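Applied to the function X above, the technique might look as follows. This is a sketch: the names Dummy1 through Dummy5 are arbitrary, and the assignments for the locals are placed after the locals are set (rather than at the very top) so that no uninitialized variable is read. In g.lst we would expect each assignment to appear as a st instruction referring to _DummyN, whose first operand is the register holding the variable of interest.

```c
/* Debugging-only globals; in g.lst they appear as _Dummy1 ... _Dummy5. */
int Dummy1, Dummy2, Dummy3, Dummy4, Dummy5;
int Z;   /* the global variable read by X() */

int X(int A, int B)
{  int U, V, W;

   Dummy1 = A;   /* arguments can be captured at the top */
   Dummy2 = B;
   U = A + B;
   V = Z;
   if (U == V) W = U;
   else W = V;
   Dummy3 = U;   /* locals are captured once they have values */
   Dummy4 = V;
   Dummy5 = W;
   return W;
}
```

Once X() is debugged, the five Dummy assignments (and the globals, if no longer used elsewhere) can be deleted without changing the function's result.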

With a little practice, we can easily follow the assembly language and correlate it to the C source code, and thus debug our program.

10.1.2 Debugging-Tool Commands

The assembly-language-level debugger is easy to learn, and includes online help.

To use the debugger, run your program in interactive mode. For our sample program Prime.c, which we ran in noninteractive mode in our earlier example, we would now type

Prime 2 i b 100 10

where `i' is for ``interactive.''

Following is the list of MulSim interactive commands:

adr :: add registers to be displayed at all simulator pauses for CPU p; setting p = `a' means this will apply to all CPUs; you will be prompted for the register numbers; display is in decimal form
afr :: same as adr but in float form
axr :: same as adr but in hex form
b < n > :: set a breakpoint at instruction n (hex) for CPU p; setting p = `a' means this will apply to all CPUs
c :: cancel all display-processor commands
dr :: delete registers to be displayed for CPU p; setting p = `a' means this will apply to all CPUs; you will be prompted for the register numbers
eb :: execute until a breakpoint is hit
eh :: execute until all CPUs have halted, i.e. run the program until completion
es :: execute until UserHook() sets Stop
et < t > :: execute until simulated time t
h :: display summary of interactive commands
lm :: print to the screen all limits, e.g. maximum number of CPUs
ls < n > :: list 5 source lines; if n is specified, the listing will begin at instruction n (hex); otherwise the listing will begin at the instruction following the last one listed in the previously-issued ls command
md < s > < e > :: display (once only) the contents of the memory words at offsets k from the symbol s, k = b..e, where s is a variable in the user .mas program (variables from the .c file have an underscore prepended, so that for example Sum in the .c file is called _Sum in the .mas file created by the mcc compiler); form is decimal
mf < s > < e > :: same as md, but in float form
mx < s > < e > :: same as md, but in hex form
p :: add processors to be displayed (PC, condition codes, next instruction) at every pause
s :: step through one instruction cycle for all CPUs

To repeat the previous command (useful for the `ls' and `s' commands), simply hit the carriage return.

10.1.3 What About a Segmentation Fault?

When running a C program on a real system, a segmentation fault usually arises from something like a pointer value (including an array index, which is equivalent to a pointer) that is far out of range. The best way to deal with this is to run the program under a debugging tool like gdb and, when the fault occurs, use the "bt" command (in the case of gdb) to determine where in your source code the fault occurred.

If you get a segmentation fault message while running a MulSim program, it is also likely that you have an errant pointer value. Keep in mind, though, that the error message came from the simulator program itself, not from your MulSim application, so gdb is not useful. How can you determine where in your code, and on which CPU, the fault occurred?

In this case, go to the MulSim debugging tool and first pinpoint the simulation time (the value of the variable SimTime) at which the fault occurs. Use a "binary search" approach. For instance, first try running the program until time, say, 1000, using the command

et 1000

Suppose the fault occurs. Then start again, and try, say, running until time 500. Suppose the fault does not occur. Now you know the fault occurs some time between 500 and 1000, so maybe try re-running until time 750, and so on. Remember, by cutting the interval approximately in half in each new guess, you will quickly determine the exact time of the fault, say at 639.

Then re-run again, and issue the commands

et 639
p
a

The `p' command (and its parameter `a') will display all processors, showing where in the (assembly) code each one is when the fault occurs.
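The search in the procedure above is ordinary bisection, sketched below in C. FaultBy() is a hypothetical predicate standing for one re-run of the program with et T (true if the fault has occurred by time T); in practice you carry out each evaluation by hand in the debugger. The demo predicate simply pretends the fault happens at time 639, as in the example.

```c
#include <stdbool.h>

/* Bisection on simulated time. Requires FaultBy(Lo) == false and
   FaultBy(Hi) == true; returns the smallest T in (Lo, Hi] for which
   FaultBy(T) is true. */
int FindFaultTime(bool (*FaultBy)(int), int Lo, int Hi)
{  while (Hi - Lo > 1) {
      int Mid = Lo + (Hi - Lo) / 2;
      if (FaultBy(Mid)) Hi = Mid;   /* fault already happened: look earlier */
      else Lo = Mid;                /* no fault yet: look later */
   }
   return Hi;
}

/* Demo predicate: pretends the fault happens at time 639; counts the re-runs. */
static int Runs = 0;
static bool DemoFaultBy(int T)
{  Runs++;
   return T >= 639;
}
```

Starting from the interval (0, 1000], the sketch probes 500, 750, 625 and so on, reaching 639 after 10 runs, in line with the roughly log2(1000) re-runs one would expect.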

10.2 Using print_int() and print_str()

Try to use the built-in debugger as much as possible; it will save you time! However, since the debugger is not fully symbolic, you should also make use of the print_int() and print_str() functions.

Be careful, though: if several processors are executing these at once, their output will be merged together, making it very hard to determine which output comes from which processor. To avoid this problem, focus on a particular processor and insert calls to these functions in such a way that they are executed only by that processor. Alternatively, surround your printing with calls to LOCK() and UNLOCK(), using a lock variable named, say, PrintLock; see the PrintPackage file.

Note that the mere insertion of print_int() and print_str() calls changes the timing of the program, and may even result in your bug disappearing. In that case, you should suspect some kind of lock problem (including failure to use locks when they are needed).

Appendix A. MulSim Instruction Set

Below is the instruction set definition and assembly-language syntax.


  fixed-point operations:

  add rs1,op2,rd    
  sub rs1,op2,rd
  mul rs1,op2,rd
  div rs1,op2,rd
  mod rs1,rs2,rd

  floating-point operations:

  fadd rs1,op2,rd  
  fsub rs1,op2,rd
  fmul rs1,op2,rd
  fdiv rs1,op2,rd

  boolean operations:

  bcomp rs1, rd       bitwise complement of rs1
  or rs1, rs2, rd     bitwise or
  xor rs1, rs2, rd    bitwise exclusive-or
  orn rs1, rs2, rd    bit complement of bitwise or
  xnor rs1, rs2, rd   bit complement of bitwise exclusive-or
  and rs1, rs2, rd    bitwise and
  andn rs1, rs2, rd   bitwise nand
  sll rs1, rs2, rd    shift-left logical
  srl rs1, rs2, rd    shift-right logical
  sra rs1, rs2, rd    shift-right arithmetic

  changes to PC:

  jmp cond,label  
  call rs1,label      rs1 is used to save the return address
  ret rs1             restore PC from rs1; the rs1 here should be 16 more
                      than the rs1 in the call
  halt                PC stops incrementing
 
  reading/writing memory:

  ld rs1,base,rd      mem[rs1+base] --> rd
  st rs1,base,rd      rs1 --> mem[rd+base]
  ainc rs1,base,rd    atomic implementation of mem[rs1+base]++;
                      returns old value of mem[rs1+base] to rd
  tas rs1,base,rd     test-and-set, i.e. atomic implementation of 
                        tmp = mem[rs1+base];
                        rd = tmp;
                        if (tmp == 0) tmp = 1;
                        mem[rs1+base] = tmp;
                      0 means unlocked, 1 means locked; note that rs1 
                      operand can be used to set up an array of locks
 
  miscellaneous:

  save rs1,const,rd   calculates new stack pointer
  nop                 no operation
  cpunum rd           CPU number is put in rd
  systsize rd         total number of CPUs is put in rd
  userhook const      the user-defined function UserHook() is called

Here are the meanings of the operand codes:

rs1: First source register.
rs2: Second source register.
const: A constant.
op2: Either rs2 or const.
rd: Destination register.
cond: One of lt, le, eq, ge, ne and none, the latter meaning an unconditional jump.
label: A label in the assembly-language source file.
base: Either label or const.
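The data semantics of the four memory instructions can be summarized in a short single-threaded C model. It is a sketch: Mem, MEMWORDS and the Do*/TryLock/Unlock names are hypothetical, and the model does not capture atomicity, which the simulator guarantees for ainc and tas. Note in particular that st takes its address from rd and its data from rs1, the reverse of ld.

```c
#include <assert.h>
#include <stdint.h>

#define MEMWORDS 1024   /* hypothetical memory size */

uint32_t Mem[MEMWORDS];

/* ld rs1,base,rd: the returned value goes to rd */
uint32_t DoLd(uint32_t Rs1, uint32_t Base)
{  assert(Rs1 + Base < MEMWORDS);
   return Mem[Rs1 + Base];
}

/* st rs1,base,rd: the address comes from rd, the data from rs1 */
void DoSt(uint32_t Rs1, uint32_t Base, uint32_t Rd)
{  assert(Rd + Base < MEMWORDS);
   Mem[Rd + Base] = Rs1;
}

/* ainc rs1,base,rd: increments mem[rs1+base]; the old value goes to rd */
uint32_t DoAinc(uint32_t Rs1, uint32_t Base)
{  uint32_t Old = DoLd(Rs1, Base);
   Mem[Rs1 + Base] = Old + 1;
   return Old;
}

/* tas rs1,base,rd: follows the pseudocode in the listing above */
uint32_t DoTas(uint32_t Rs1, uint32_t Base)
{  uint32_t Tmp = DoLd(Rs1, Base);
   uint32_t Rd = Tmp;          /* rd receives the old value */
   if (Tmp == 0) Tmp = 1;
   Mem[Rs1 + Base] = Tmp;
   return Rd;
}

/* Hypothetical helpers for element Index of a lock array starting at Locks. */
int TryLock(uint32_t Locks, uint32_t Index) { return DoTas(Index, Locks) == 0; }
void Unlock(uint32_t Locks, uint32_t Index) { DoSt(0, Locks, Index); }
```

Unlock() corresponds to a st whose rs1 is r0 (always 0), matching the convention that 0 means unlocked; the Index argument is the rs1 register that lets tas address an array of locks.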

Constants (described as `const' above) are taken to be decimal. If a constant is prefixed with `&', the rest of the token is taken to be a data symbol, and the constant generated is the address of that symbol.

The ``instructions'' cpunum, systsize and userhook are not ``real'' instructions; they are included for convenience. Use them sparingly, to minimize perturbation of cycle counts.

The processor includes condition codes N and Z, indicating negative or zero results of the last instruction. All register-to-register instructions, and all instructions in the load/store group, affect these flags.
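Since the processor has only N and Z, every cond in a jmp must be decided from those two flags. The sketch below shows one plausible mapping; it is our inference, not taken from the simulator source, and JumpTaken() and FlagsAfterSub() are hypothetical names. It also models the compare idiom from the X example in Section 10.1.1 (sub r29 r28 r0 followed by jmp ne L2), where the difference is discarded into r0 and only the flags survive. Because there is no overflow flag, lt and ge in this model give the wrong answer when the subtraction overflows.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: is "jmp Cond,label" taken, given the N and Z flags? */
bool JumpTaken(const char *Cond, bool N, bool Z)
{  if (strcmp(Cond, "lt") == 0)   return N;
   if (strcmp(Cond, "le") == 0)   return N || Z;
   if (strcmp(Cond, "eq") == 0)   return Z;
   if (strcmp(Cond, "ge") == 0)   return !N;
   if (strcmp(Cond, "ne") == 0)   return !Z;
   if (strcmp(Cond, "none") == 0) return true;   /* unconditional */
   return false;                                 /* unknown condition */
}

/* Flags after "sub A B r0": the difference is discarded, only N and Z remain. */
void FlagsAfterSub(int32_t A, int32_t B, bool *N, bool *Z)
{  uint32_t D = (uint32_t) A - (uint32_t) B;   /* wraps like 32-bit hardware */
   *N = (D & 0x80000000u) != 0;
   *Z = (D == 0);
}
```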

Footnotes:

¹ This software is distributed as is, with no guarantees of any kind. You are free to use the software for not-for-profit purposes, but commercial usage is forbidden.

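Register-window mapping (each entry gives the absolute registers that a
window's relative registers map to):

window   r0-r9   r10-r15   r16-r25   r26-r31
0        a0-a9   a26-a31   a16-a25   a10-a15
1        a0-a9   a42-a47   a32-a41   a26-a31
2        a0-a9   a58-a63   a48-a57   a42-a47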
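The pattern in this table can be written as a formula. The C sketch below (the name AbsReg is hypothetical) reproduces the three rows shown: r0-r9 are shared by every window, and each higher group of relative registers sits at a fixed offset that moves up by 16 per window. It deliberately rejects windows beyond 2, since the table does not say what happens there.

```c
#include <assert.h>

/* Absolute register number for relative register R in window W,
   derived from the three table rows; windows above 2 are not covered. */
int AbsReg(int W, int R)
{  assert(R >= 0 && R <= 31 && W >= 0 && W <= 2);
   if (R <= 9)  return R;                  /* r0-r9: shared by all windows */
   if (R <= 15) return R + 16 * (W + 1);   /* r10-r15 */
   if (R <= 25) return R + 16 * W;         /* r16-r25 */
   return R + 16 * (W - 1);                /* r26-r31 */
}
```

Because AbsReg(W, 10) equals AbsReg(W + 1, 26), a caller's r10-r15 are its callee's r26-r31, which is consistent with the rule that the register in ret is 16 more than the one in call.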