Simulation Methodology

1  Complementary Roles of Mathematical Analysis and Simulation

In probabilistic models used in computer science research, mathematical analysis and simulation play complementary roles:

• Mathematical analysis often allows us to gain insights which may not be obvious in simulation experiments. When one does simulation, it is often not clear whether one has really covered all the important parameter combinations, and often there are so many such combinations that simulating them all would be infeasible.
• Simulation allows one to investigate models which are highly intractable from a mathematical point of view.
• Mathematical analysis and simulation serve as checks on each other: Simulation allows us to check whether certain approximations made in our mathematical model (for the sake of mathematical tractability) are reasonable. The mathematical analysis serves as at least a partial check on the validity of our simulation program, which after all is subject to bugs.

An example of the value of mathematical analysis is the M/G/1 queue (exponential job interarrival times, general service times, one server). The mathematical analysis shows that the mean queue length depends not only on the mean of the service time, but also on its variance. This is not intuitively obvious,1 and if we did only simulation analysis, we might not think to investigate it, thus missing it.

2  Space-Averaging and Time-Averaging Simulations

Consider the following simple code to find the probability of getting three heads out of five tosses of a coin:

```
Count = 0;
for (I = 0; I < NReps; I++) {
   Heads = 0;
   for (Toss = 0; Toss < 5; Toss++)
      if (U() < 0.5) Heads++;
   if (Heads == 3) Count++;
}
printf("the approximate probability is %f\n", ((float) Count)/NReps);
```

(Here U() returns a random number from the uniform distribution on (0,1), and NReps is a large integer.)

This program is finding the space average, meaning that it is ``averaging'' (a probability can be considered an average of 0-1 values) over all the different points in the sample space.

By contrast, consider a program which finds various values of interest in a queuing model (mean queue length, etc.). Unlike the example above, in which we repeated the experiment many times, here we perform the experiment just once, but observe the queue for a long period of simulated time, i.e. calculate a time average. (Repeating the experiment would mean simulating the queue again, starting from time 0 again.)

In probabilistic modeling applications in computer science research, time-average programs are the more prevalent kind. They are also called discrete-event simulations, meaning that the state changes in abrupt, ``discrete'' jumps, such as the incrementing of a queue length by 1. By contrast, a model of the weather has continuously-changing values, e.g. temperature, and thus is not discrete-event.

Note that some discrete-event simulations are deterministic, not probabilistic.

3  World Views in Discrete-Event Simulation Programming

Simulation programming can often be difficult--difficult to write the code, and difficult to debug it. The reason for this is that it really is a form of parallel programming, with many different activities in progress simultaneously, and parallel programming can be challenging.

For this reason, many people have tried to develop simulation languages, or at least simulation paradigms (i.e. programming styles), which enable the programmer to achieve clarity in simulation code. Special simulation languages have been invented in the past, notably SIMULA, which was invented in the 1960s and is significant today as the language which introduced the concept of object-oriented programming, now so popular. However, the trend today is to simply develop simulation libraries which can be called from ordinary C or C++ programs. So, the central focus today is on the programming paradigms. In this section we will present an overview of the major discrete-event simulation paradigms.

3.1  The Event-Oriented Paradigm

The event-oriented paradigm was common in the earlier years of simulation, used in packages in which a general-purpose programming language called functions in a simulation library. The chief virtue of this paradigm was its ease of implementation. The process-oriented paradigm (described below) required writing highly platform-dependent assembly-language routines for stack manipulation, so the event-oriented approach was much easier.

The event-oriented paradigm gets its name from the fact that there is an explicit event list, managed by the application programmer. Each time one event occurs, the application code must determine which new events will be triggered by this one, and add them to the event list.

For example, suppose we were to simulate an M/G/1 queue. The code for an arrival event would look something like this:

```
if machine not busy
   generate a random service time
   add a service-completion event to the event list
else
   add this job to the machine queue
generate the time of the next arrival
add that arrival event to the event list
```

Here an arrival event always triggers another arrival event, and possibly also a service event, all of which the code must manage explicitly.

I have written an event-oriented simulation package called ESim, available at http://heather.cs.ucdavis.edu/~matloff/sim.html.

3.2  The Process-Oriented Paradigm

Here each simulation activity is modeled by a process. The idea of a process is similar to the notion of the same name in UNIX, and indeed one could write process-oriented simulations using UNIX processes. However, these would be inconvenient to write, difficult to debug, and above all they would be slow.

As noted earlier, the old process-oriented software such as SIMULA and later CSIM were highly platform-dependent, due to the need for stack manipulation in the ``private'' management of processes. (``Private'' in the sense that the simulation processes would not be managed by the operating system, but rather by the application program.) However, these days this problem no longer exists, due to the fact that modern systems include threads packages (e.g. pthreads in UNIX, Java threads, Windows NT threads and so on). Threads are sometimes called ``lightweight'' processes.

If we were to simulate an M/G/1 queue using this paradigm, our process for arrivals would be much simpler than its event-oriented counterpart above, looking something like:

```
while (simulated time <= limit)
   generate the time of the next arrival
   sleep for that amount of time
   add this job to the machine queue
```

In contrast to the event-oriented case, the arrival code here does not have to check for whether a job should now be started at the machine. That is handled by the machine code, which would look something like this:

```
while (simulated time <= limit)
   wait until the machine queue length is nonzero
   remove the job at the front of the machine queue
   generate a service time
   sleep for that amount of time
```

A widely used process-oriented package is C++SIM, and I have one called psim which is much simpler and easier to use (though less powerful) than C++SIM. See the links at http://heather.cs.ucdavis.edu/~matloff/sim.html.

3.3  Comparison

The process-oriented paradigm produces more modular code. This is probably easier to write and easier for others to read. It is considered more elegant.

However, process-oriented code may be more difficult to debug. Having a means to probe the current status of all threads (such as in psim) is vital.

Also, process-oriented code is generally slower than event-oriented code, since the former must deal with large numbers of context switches.

4  Random Number Generation

4.1  U(0,1) Distribution

There is a very rich literature on the generation of random integers, more properly called pseudorandom numbers because they are actually deterministic. These then can be divided by the upper bound of the integers to generate U(0,1) variates.

Here is an example:

```
#define C 25173
#define D 13849
#define M 32768

float U01()
{  static int Seed = 0;
   Seed = (C*Seed + D) % M;
   return Seed / ((float) M);
}
```

(Note that Seed is declared as static.)

Is this algorithm a ``good'' approximation to U(0,1)? What does ``good'' mean?

We at least would want the algorithm to have the property that Seed can take on all values in the set {0,1,2,...,M-1}. It can be shown that this condition will hold if all the following are true:

• D and M are relatively prime
• C-1 is divisible by every prime factor of M
• if M is divisible by 4, then so is C-1

You can check that the values of C, D and M in the code above do satisfy these conditions.

However, that is not enough. We would also hope that the values in the set {0,1,2,...,M-1} are hit approximately equally often, and that successive values returned by U01() roughly have the ``independence'' property. Again, there is a rich literature on this, but we will not pursue it here.

4.2  General Continuous Distributions

Suppose we wish to generate random numbers having density h. Let H denote the corresponding cumulative distribution function, and let G = H^{-1} (inverse in the same sense that the square and square-root operations are inverses of each other). Set X = G(U), where U ~ U(0,1). Then

$$F_X(t) = P(X \le t) = P(G(U) \le t) = P(U \le G^{-1}(t)) = P(U \le H(t)) = H(t)$$
In other words, X has density h, as desired.

For example, suppose we wish to generate exponential random variables with parameter λ. In this case, H(t) = 1 - e^{-λt}, so G(u) = -(1/λ) ln(1-u). So, if we generate U from the function U01() above, then -(1/λ) ln(1-U) will have the desired distribution. (And so would -(1/λ) ln(U), since U and 1-U have the same distribution.)

4.3  Tricks

It is often impossible to find a closed-form expression for G above, but other approaches can be found in some individual cases.

A gamma distribution with integer shape parameter n (an Erlang distribution), for instance, is by definition the sum of n independent exponential random variables (n and λ are the parameters of this distribution), so we can generate such a variate by calling our exponential random number generator n times and summing the results.

An example of a much less obvious trick is the Box-Muller method for generating normally-distributed random numbers. Here is how it works. Let V and W be independent U(0,1) random variables, and set

$$X = \cos(2\pi V)\,(-2\ln W)^{1/2}$$

and

$$Y = \sin(2\pi V)\,(-2\ln W)^{1/2}$$

Then X and Y will be independent N(0,1) random variables.

Since our goal is to generate just one such variable at a time, we need to generate two but just use one and save the other:

```
/* assumes U01() from above returns values in (0,1);
   compile with -lm for the math library */
#include <math.h>

float N01()
{  static int WhoseTurn = 0;
   float V,W,X,Return;
   static float Y;

   if (WhoseTurn == 0) {
      V = U01();  W = U01();
      X = cos(2*M_PI*V) * sqrt(-2*log(W));
      Y = sin(2*M_PI*V) * sqrt(-2*log(W));
      Return = X;
   }
   else Return = Y;
   WhoseTurn = 1 - WhoseTurn;
   return Return;
}
```

5  Statistical Inference of Simulation Output

One problem with simulation analysis is that it is only a statistical sampling of the processes being studied. Thus the results are only approximate. Of course, the longer we run the simulation, the better the approximation, but how long is long enough?

5.1  Space-Average Case

In this case, we have a standard problem in statistical methodology: how to assess the accuracy of sample means and proportions.

Suppose we are estimating a mean $\mu$, say the mean number of heads one gets in 12 tosses of a coin, via simulation. We run the simulation in r repetitions of the experiment, i.e. r sets of 12 simulated tosses of a coin. Let $\bar{X}$ denote the sample mean, i.e. 1/r times the sum of the r numbers of heads obtained in the simulation, $X_1,...,X_r$. Then $\bar{X}$ approximately has the distribution $N(\mu, \sigma^2/r)$, so for example

$$\bar{X} \pm 1.96 \frac{\hat{\sigma}}{\sqrt{r}}$$

is an approximate 95% confidence interval for $\mu$, where

$$\hat{\sigma}^2 = \frac{1}{r} \sum_{i=1}^{r} (X_i - \bar{X})^2$$

is the sample estimate of $\sigma^2$.

It should be kept in mind that typically $\sigma^2$ will not be uniform from one setting to the next in a simulation study. In an analysis of mean queuing times in computer networks, for instance, the variance will tend to be larger in the high-traffic (i.e. long queue-wait) cases, thus requiring larger values of r.

Estimating a proportion p is actually just a special case of estimating a mean. (The proportion is the mean of a 0-1 variable, where 1 means the event occurs and 0 means it doesn't.) The sample mean $\bar{X}$ now reduces to $\hat{p}$, the sample proportion, and since $\sigma^2 = p(1-p)$, we calculate $\hat{\sigma}^2 = \hat{p}(1-\hat{p})$ accordingly.

5.2  Time-Average Case

Here we do not ``repeat'' a simulation. Say we are modeling some queuing system. Instead of simulating from time 0 to, say, time 10000, and then simulating 0-10000 again, etc., with maybe 100 repetitions, we might simulate the system from time 0 to time 1000000. We are relying on the fact that the system converges (either as a Markov chain or as some similar kind of process having a long-run distribution), so that after that time it is similar to repeating the experiment.

The problem here is that the statistical analysis is not nearly so easy, due to the lack of independence. In the space-average setting we considered earlier, the Xi are independent of each other, which is crucial to the Central Limit Theorem, which was used for the formulas for confidence intervals. So, now we will have to work harder to (a) find independence, and (b) make use of it. The main point is to find regeneration points, which are points in the process at which ``time starts over.'' The resulting methodology is called regenerative analysis.

As our motivating example, consider a G/G/1 queue (general interarrival and service time distributions, one server). Let X(t) denote the number of jobs in the system (queued or in service) at time t.

Since we have no exponential distributions, there is no obvious source of ``memoryless'' behavior on which to base independence. However, after giving the matter further thought, one realizes that a set of regeneration points consists of the times at which jobs arrive to an empty queue.

Suppose the first four arrivals are at times 0, 12.5, 20.6 and 28.1, and the service times of the first three jobs are 13.0, 8.1 and 3.2. Then the third job finishes at time 0 + 13.0 + 8.1 + 3.2 = 24.3, so that fourth job encounters an empty queue when it arrives at time 28.1. Time then ``starts over'' at this point, and 28.1 is called the first regeneration point. (Time 0 was the ``0th'' regeneration point.)

Formally, what we require is that

• the regeneration points $T_1, T_2, ...$ form a renewal process,
• the random vector $(X(t_1),...,X(t_k))$ has the same distribution as $(X(T_m+t_1),...,X(T_m+t_k))$ for all k, m and $t_i$, and
• the random variables $\{X(t): t < T_m\}$ are independent of $\{X(T_m+t): t \ge 0\}$ for all m.

Under these conditions, it can be shown that X(t) has a limiting distribution, in a similar sense to that of Markov chains, with

$$\lim_{t \to \infty} P(X(t) \le x) = P(X \le x)$$

existing for all x. In the queuing example above, for instance, X would be the number in the system at steady state.

Suppose we are interested in $\mu = E[g(X)]$ for some function g. Again using the G/G/1 queue as our example, we might take g(t) = t, in which case $\mu$ would give us the mean number of jobs in the system at steady state. Or we might take

$$g(t) = \begin{cases} 1, & \text{if } t = 1 \\ 0, & \text{if } t \ne 1 \end{cases}$$

Then $\mu$ will be the long-run proportion of the time in which there is a job in service but no jobs queued.

Define $\tau_i = T_{i+1} - T_i$ to be the times between regeneration points, and define the ``accumulated g values'' as

$$Y_i = \int_{T_i}^{T_{i+1}} g[X(t)]\, dt$$

(Note that the $Y_i$ are random variables!)

Again consider the G/G/1 example, with g(t) = t. You should check that $Y_1$ will be equal to $12.5 \times 1 + 0.5 \times 2 + 7.6 \times 1 + 0.5 \times 2 + 3.2 \times 1 + 3.8 \times 0 = 25.3$.

The main theorems say that

• the $Y_i$ are i.i.d.,
• the $\tau_i$ are i.i.d. (from the renewal property), and
• $\mu = E(Y)/E(\tau)$

Here, then, is how we form a confidence interval for m from the simulation output. Say we run the simulation until the n-th regeneration point. Define the following:

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

$$\bar{\tau} = \frac{1}{n} \sum_{i=1}^{n} \tau_i$$

$$\hat{\mu} = \frac{\bar{Y}}{\bar{\tau}}$$

$$s_{11} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

$$s_{22} = \frac{1}{n} \sum_{i=1}^{n} (\tau_i - \bar{\tau})^2$$

$$s_{12} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(\tau_i - \bar{\tau})$$

$$s^2 = s_{11} - 2\hat{\mu} s_{12} + \hat{\mu}^2 s_{22}$$

Then an approximate 95% confidence interval for $\mu$ is

$$\hat{\mu} \pm 1.96 \frac{s}{\bar{\tau} \sqrt{n}}$$

Brief outline of the proof: By the Central Limit Theorem, $\bar{Y}$ and $\bar{\tau}$ are approximately normally distributed with the obvious means and variances. Then apply the delta method (see, e.g., C.R. Rao's classic book, Linear Statistical Inference, Sec. 6a.2), which says that a smooth function (in this case, h(u,v) = u/v) of approximately normally distributed random variables is again approximately normally distributed, with a given mean and variance.

Footnotes:

1. Though some analysts ``explained'' it after the fact, i.e. after the mathematical result was derived.

File translated from TEX by TTH, version 2.60.
On 1 Aug 2000, 17:05.