In probabilistic models used in computer science research, mathematical analysis and simulation play complementary roles:
An example of the value of mathematical analysis is the M/G/1 queue (exponential job interarrival times, general service times, one server). The mathematical analysis shows that the mean queue length depends not only on the mean of the service time, but also on its variance. This is not intuitively obvious,^{1} and if we did only a simulation analysis, it is something we might not think to investigate, thus missing it.
Consider the following simple code to find the probability of getting three heads out of five tosses of a coin:
Count = 0;
for (I = 0; I < NReps; I++)  {
   NHeads = 0;
   for (Toss = 0; Toss < 5; Toss++)
      if (U() < 0.5) NHeads++;
   if (NHeads == 3) Count++;
}
printf("the approximate probability is %f\n", ((float) Count)/NReps);
(Here U() returns a random number from the uniform distribution on (0,1), and NReps is a large integer.)
This program is finding the space average, meaning that it is ``averaging'' (a probability can be considered an average of 0-1 values) over all the different points in the sample space.
By contrast, consider a program which finds various values of interest in a queuing model (mean queue length, etc.). Unlike the example above, in which we repeated the experiment many times, here we perform the experiment just once, but observe the queue for a long period of simulated time, i.e. calculate a time average. (Repeating the experiment would mean simulating the queue again, starting from time 0 again.)
In probabilistic modeling applications in computer science research, time-average programs are the more prevalent kind. They are also called discrete-event simulations, meaning that events change in abrupt, ``discrete'' manners, such as an incrementing of a queue length by 1. By contrast, a model of the weather has continuously-changing values, e.g. temperature, and thus is not discrete-event.
Note that some discrete-event simulations are deterministic, not probabilistic.
Simulation programming can often be difficult: difficult to write the code, and difficult to debug it. The reason for this is that it really is a form of parallel programming, with many different activities in progress simultaneously, and parallel programming can be challenging.
For this reason, many people have tried to develop simulation languages, or at least simulation paradigms (i.e. programming styles) which enable the programmer to achieve clarity in simulation code. Special simulation languages have been invented in the past, notably SIMULA, which was invented in the 1960s and is significant today as the language which introduced the concept of object-oriented programming, now so popular. The trend today, however, is to simply develop simulation libraries which can be called from ordinary C or C++ programs. So, the central focus today is on the programming paradigms. In this section we will present an overview of the three major discrete-event simulation paradigms.
The event-oriented paradigm was common in the earlier years of simulation, used in packages in which a general-purpose programming language called functions in a simulation library. The chief virtue of this paradigm was its ease of implementation. The process-oriented paradigm required writing highly platform-dependent assembly-language routines for stack manipulation, so the event-oriented approach was much easier.
The event-oriented paradigm gets its name from the fact that there is an explicit event list, managed by the application programmer. Each time one event occurs, the application code must determine which new events will be triggered by this one, and add them to the event list.
For example, suppose we were to simulate an M/G/1 queue. The code for an arrival event would look something like this:
if machine not busy
   generate a random service time
   add service event to event list
else
   add this job to machine queue
generate time of next arrival
add this arrival to the event list
Here an arrival event definitely triggers another arrival event, and possibly also a service event, all of which the code must manage explicitly.
I have written an event-oriented simulation package called ESim, available at http://heather.cs.ucdavis.edu/~matloff/sim.html
Here each simulation activity is modeled by a process. The idea of a process is similar to the notion of the same name in UNIX, and indeed one could write process-oriented simulations using UNIX processes. However, these would be inconvenient to write, difficult to debug, and above all they would be slow.
As noted earlier, the old process-oriented software such as SIMULA, and later CSIM, was highly platform-dependent, due to the need for stack manipulation in the ``private'' management of processes. (``Private'' in the sense that the simulation processes are managed not by the operating system, but rather by the application program.) However, these days this problem no longer exists, due to the fact that modern systems include threads packages (e.g. pthreads in UNIX, Java threads, Windows NT threads and so on). Threads are sometimes called ``lightweight'' processes.
If we were to simulate an M/G/1 queue using this paradigm, our process for arrivals would be much simpler than its event-oriented counterpart above, looking something like:
while (simulated time <= limit)
   generate time of next arrival
   sleep for that amount of time
   add this job to the machine queue
In contrast to the event-oriented case, the arrival code here does not have to check whether a job should now be started at the machine. That is handled by the machine code, which would look something like this:
while (simulated time <= limit)
   wait until machine queue length is nonzero
   remove job from front of the machine queue
   generate a service time
   sleep for that amount of time
A widely used process-oriented package is C++SIM, and I have one called psim which is much simpler and easier to use (though less powerful) than C++SIM. See the links at http://heather.cs.ucdavis.edu/~matloff/sim.html.
The process-oriented paradigm produces more modular code. This is probably easier to write and easier for others to read, and it is generally considered more elegant.

However, process-oriented code may be more difficult to debug. Having a means to probe the current status of all threads (such as in psim) is vital.

Also, process-oriented code is generally slower than event-oriented code, since the former must deal with large numbers of context switches.
There is a very rich literature on the generation of random integers, more properly called pseudorandom numbers because they are actually deterministic. These can then be divided by the upper bound of the integers to generate U(0,1) variates.
Here is an example:
#define C 25173
#define D 13849
#define M 32768

float U01()
{  static int Seed = 0;
   Seed = (C*Seed + D) % M;
   return Seed / ((float) M);
}
(Note that Seed is declared as static.)
Is this algorithm a ``good'' approximation to U(0,1)? What does ``good'' mean?
We at least would want the algorithm to have the property that Seed can take on all values in the set {0,1,2,...,M-1}. It can be shown (this is the Hull-Dobell theorem) that this condition will hold if all the following are true:

   (a) D and M are relatively prime;
   (b) C-1 is divisible by every prime factor of M;
   (c) if M is divisible by 4, then C-1 is divisible by 4.
You can check that the values of C, D and M in the code above do satisfy these conditions.
However, that is not enough. We would also hope that the values in the set {0,1,2,...,M-1} are hit approximately equally often, and that successive values returned by U01() roughly have the ``independence'' property. Again, there is a rich literature on this, but we will not pursue it here.
Suppose we wish to generate random numbers having density h. Let H denote the corresponding cumulative distribution function, and let G = H^{-1} (inverse in the same sense that the square and square-root operations are inverses of each other). Set X = G(U), where U ~ U(0,1). Then

   P(X <= t) = P(G(U) <= t) = P(U <= H(t)) = H(t)

In other words, X has density h, as desired.
For example, suppose we wish to generate exponential random variables with parameter λ. In this case, H(t) = 1 - e^{-λt}, so G(u) = -(1/λ) ln(1-u). So, if I generate U from the function U01() above, then -(1/λ) ln(1-U) will have the desired distribution. (And so would -(1/λ) ln(U), since U and 1-U have the same distribution.)
It is often impossible to find a closedform expression for G above, but other approaches can be found in some individual cases.
An Erlang distribution, for instance, is by definition the distribution of the sum of n independent exponential random variables (n and λ are the parameters for this distribution), so we can generate an Erlang variate by calling our exponential random number generator n times.
An example of a much less obvious trick is a method to generate normally distributed random numbers, the Box-Muller method. Here is how it works. Let V and W be independent U(0,1) random variables, and set

   X = sqrt(-2 ln V) cos(2πW)

and

   Y = sqrt(-2 ln V) sin(2πW)

Then X and Y will be independent N(0,1) random variables.
Since our goal is to generate just one such variable at a time, we need to generate two but just use one and save the other:
float N01()
{  static int WhoseTurn = 0;
   float X, Return;
   static float Y;
   if (WhoseTurn == 0)  {
      float V = U01(), W = U01();
      X = sqrt(-2.0*log(V)) * cos(2.0*M_PI*W);
      Y = sqrt(-2.0*log(V)) * sin(2.0*M_PI*W);
      Return = X;
   }
   else Return = Y;
   WhoseTurn = 1 - WhoseTurn;
   return Return;
}
One problem with simulation analysis is that it is only a statistical sampling of the processes being studied. Thus the results are only approximate. Of course, the longer we run the simulation, the better the approximation, but how long is long enough?
In this case, we have a standard problem in statistical methodology: how to assess the accuracy of sample means and proportions.
Suppose we are estimating a mean m, say the mean number of heads one gets in 12 tosses of a coin, via simulation. We run the simulation in r repetitions of the experiment, i.e. r sets of 12 simulated tosses of a coin, obtaining counts X_1,...,X_r. Let Xbar denote the sample mean, i.e. 1/r times the sum of the X_i. Then Xbar approximately has the distribution N(m, σ^2/r), so for example

   Xbar ± 1.96 s/sqrt(r)

is an approximate 95% confidence interval for m, where

   s^2 = (1/r) Σ_{i=1}^{r} (X_i - Xbar)^2

is the sample estimate of σ^2.
It should be kept in mind that typically σ^2 will not be uniform from one setting to the next in a simulation study. In an analysis of mean queuing times in computer networks, for instance, the variance will tend to be larger in the high-traffic (i.e. long queue-wait) cases, thus requiring larger values of r.
Estimating a proportion p is actually just a special case of estimating a mean. (The proportion is the mean of a 0-1 variable, where 1 means the event occurs and 0 means it doesn't.) The sample mean Xbar now reduces to phat, the sample proportion, and since σ^2 = p(1-p) for a 0-1 variable, we estimate it by phat(1-phat).
Here we do not ``repeat'' a simulation. Say we are modeling some queuing system. Instead of simulating from time 0 to, say, time 10000, and then simulating from 0 to 10000 again, etc., with maybe 100 repetitions, we might simulate the system once, from time 0 to time 1000000. We are relying on the fact that the system converges (either as a Markov chain or as some similar kind of process having a long-run distribution), so that after a while it is similar to repeating the experiment.
The problem here is that the statistical analysis is not nearly so easy, due to the lack of independence. In the space-average setting we considered earlier, the X_i are independent of each other, which is crucial to the Central Limit Theorem, from which the confidence interval formulas were derived. So, now we will have to work harder to (a) find independence, and (b) make use of it. The main point is to find regeneration points, which are points in the process at which ``time starts over.'' The resulting methodology is called regenerative analysis.
As our motivating example, consider a G/G/1 queue: general interarrival and service time distributions, one server. Let X(t) denote the number of jobs in the system (queued or in service) at time t.
Since we have no exponential distributions, there is no obvious source of ``memoryless'' behavior on which to base independence. However, after giving the matter further thought, one realizes that a set of regeneration points consists of the times at which jobs arrive to an empty queue.
Suppose the first four arrivals are at times 0, 12.5, 20.6 and 28.1, and the service times of the first three jobs are 13.0, 8.1 and 3.2. Then the third job finishes at time 0 + 13.0 + 8.1 + 3.2 = 24.3, so the fourth job encounters an empty queue when it arrives at time 28.1. Time then ``starts over'' at this point, and 28.1 is called the first regeneration point. (Time 0 was the ``0th'' regeneration point.)
Formally, letting T_0 = 0 < T_1 < T_2 < ... denote the regeneration points, what we require is that the process probabilistically ``starts over'' at each T_i: the behavior of X(t) for t >= T_i is independent of the history before T_i and has the same distribution for each i, and the mean cycle length E(T_{i+1} - T_i) is finite.
Under these conditions, it can be shown that X(t) has a limiting distribution, in a similar sense to that of Markov chains, with

   lim_{t→∞} P(X(t) <= x)
existing for all x. In the queuing example above, for instance, X would be the number in the system at steady state.
Suppose we are interested in m = E[g(X)] for some function g. Again using the G/G/1 queue as our example, we might take g(t) = t, in which case m would give us the mean number of jobs in the system at steady state. Or we might take

   g(t) = 1 if t = 1, and g(t) = 0 otherwise

Then m will be the long-run proportion of the time in which there is a job in service but no jobs queued.
Define t_i = T_{i+1} - T_i to be the times between regeneration points, and define the ``accumulated g values'' as

   Y_i = ∫_{T_i}^{T_{i+1}} g(X(s)) ds

(Note that the Y_i are random variables!)
Again consider the G/G/1 example, with g(t) = t. You should check that Y_1 will be equal to 12.5(1) + 0.5(2) + 7.6(1) + 0.5(2) + 3.2(1) + 3.8(0) = 25.3.
The main theorems say that the long-run time average of g(X(t)) converges to m, and that

   m = E(Y_1) / E(t_1)
Here, then, is how we form a confidence interval for m from the simulation output. Say we run the simulation until the nth regeneration point. Define the following:

   Ybar = (1/n) Σ Y_i,   tbar = (1/n) Σ t_i

   s_Y^2 = (1/n) Σ (Y_i - Ybar)^2,   s_t^2 = (1/n) Σ (t_i - tbar)^2,   s_{Yt} = (1/n) Σ (Y_i - Ybar)(t_i - tbar)

Then an approximate 95% confidence interval for m is

   Ybar/tbar ± [1.96 / (tbar sqrt(n))] sqrt(s_Y^2 - 2(Ybar/tbar)s_{Yt} + (Ybar/tbar)^2 s_t^2)
Brief outline of the proof: by the Central Limit Theorem, Ybar and tbar are approximately normally distributed with the obvious means and variances. Then apply the delta method (see, e.g., C.R. Rao's classic book, Linear Statistical Inference, Sec. 6a.2), which says that a smooth function (in this case, h(u,v) = u/v) of approximately normally distributed random variables is again approximately normally distributed, with a given mean and variance.
^{1}Though some analysts ``explained'' it after the fact, i.e. after the mathematical result was derived.