\documentclass[11pt]{article}

\setlength{\oddsidemargin}{0.0in}
\setlength{\evensidemargin}{0.0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.0in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.1in}

\usepackage{times}
\usepackage{fancyvrb}
\usepackage{relsize}
\usepackage{hyperref}

\usepackage{graphicx}

\begin{document}

\title{Random Number Generation}

\author{Norm Matloff}

\date{February 21, 2006 \\
\copyright{}2006, N.S. Matloff} 

\maketitle

\tableofcontents 

\section{Uniform Random Number Generation}
\label{block}

The basic building block for generating random numbers from various
distributions is a generator of uniform random numbers on the interval
(0,1), i.e. from the distribution U(0,1).\footnote{Since a random
variable which is uniformly distributed on some interval places 0 mass
on any particular point, it doesn't matter whether we talk here of
(0,1), [0,1], etc.}  How is this accomplished?

Due to the finite precision of computers, we cannot generate a true
continuous random variate.  However, we will generate a discrete random
variate which behaves very close to U(0,1). 

There is a very rich literature on the generation of random integers,
commonly called \textbf{pseudorandom numbers} because they are actually
deterministic.  Pseudorandom numbers can be divided by their upper bound
to generate U(0,1) variates.  Here is an example (in pseudocode):

\begin{Verbatim}[fontsize=\small,numbers=left]
int U01()
constant C = 25173
constant D = 13849
constant M = 32768
static Seed
Seed = (C*Seed + D) % M
return Seed/((float) M)
\end{Verbatim}

The name {\bf Seed} stems from the fact that most random number
libraries ask the user to specify what they call the {\bf seed value}.
You can now see what that means.  The value the user gives ``seeds'' the
sequence of random numbers that are generated.  Note that {\bf Seed} is
declared as \textbf{static}, so that it retains its value between calls.

This certainly will give us values in (0,1), and intuitively it is
plausible that they are ``random.'' But is this a ``good'' approximation
to a generator of U(0,1) variates?  What does ``good'' really mean?

Here is what we would like to happen.  Suppose we call the function
again and again, with $X_i$ being the value returned by the $i^{th}$
call.  Then ideally the $X_i$ should be independent, which means
we would have

\begin{itemize}

\item For any $0 < r < s < 1$,

\begin{equation}
\label{unif}
\lim_{{n} \rightarrow {\infty}} \frac{C(r,s,n)}{n} = s-r
\end{equation}

where C(r,s,n) is a count of the number of $X_i$ which fall into (r,s),
i = 1,...,n.

\item For any $0 < r < s < 1$,  $0 < u < v < 1$ and integer $k > 0$,

\begin{equation}
\label{indp}
\lim_{{n} \rightarrow {\infty}} \frac{D(r,s,u,v,n,k)}{n} = (s-r)(v-u)
\end{equation}

where D(r,s,u,v,n,k) is the count of the number of i for which
$r < X_i < s$ and $u < X_{i+k} < v$, i = 1,...,n.

\item The third- and higher-dimensional versions of (\ref{indp}) hold.

\end{itemize}

Equation (\ref{unif}) is saying that the $X_i$ are uniformly distributed
on (0,1), and (\ref{indp}) and its third- and higher-dimensional
versions say that the $X_i$ are independent.  How close can we come to
fully satisfying these conditions?

To satisfy (\ref{unif}), the algorithm needs to have the property that
{\bf Seed} can take on all values in the set $\{0,1,2,...,M-1\}$,
without repetition (until they are all hit).  It can be shown that this
condition will hold if all the following are true:

\begin{itemize}

\item {\bf D} and {\bf M} are relatively prime

\item {\bf C-1} is divisible by every prime factor of {\bf M}

\item if {\bf M} is divisible by 4, then so is {\bf C-1}

\end{itemize}

Typically {\bf M} is chosen according to the word size of the machine.
On today's 32-bit machines {\bf M} might be $2^{32}$, or about 4
billion.  This makes it easy to do the {\bf mod M} operation.

However, determining which values of {\bf C} and {\bf D} give
approximate independence is something that requires experimental
investigation.  Again, there is a rich literature on this, but we will
not pursue it here.  You can assume, though, that any reputable library
package has chosen values which work well.\footnote{And of course you
can do your own experimental investigation on it if you wish.}

\section{Generating Random Numbers from Continuous Distributions}

There are various methods to generate continuous random variates.  We'll
introduce some of them here.

\subsection{The Inverse Transformation Method}
\label{inv}  

\subsubsection{General Description of the Method}

Suppose we wish to generate random numbers having density h. Let H
denote the corresponding cumulative distribution function, and let \(
G=H^{-1} \) (inverse in the same sense as square and square root
operations are inverses of each other). Set X = G(U), where U has a
U(0,1) distribution. Then

\begin{eqnarray}
F_{X}(t) & = & P(X\leq t) \nonumber \\
& = & P(G(U)\leq t) \nonumber \\
& = & P(U\leq G^{-1}(t)) \nonumber \\
& = & P(U\leq H(t)) \nonumber \\
& = & H(t)
\end{eqnarray}

In other words, X has density h, as desired.

\subsubsection{Example:  The Exponential Family}
\label{expfam}

For example, suppose we wish to generate exponential random variables
with parameter \( \lambda  \). In this case,

\begin{equation}
H(t) = \int_{0}^{t} \lambda e^{-\lambda s} ~ ds = 1-e^{-\lambda t}
\end{equation}

Writing u = H(t) and solving for t, we have

\begin{equation}
G(u)=-\frac{1}{\lambda }\ln(1-u)
\end{equation}

So, pseudocode for a function to generate such random variables would be

\begin{Verbatim}[fontsize=\small]
float Expon(float Lambda)
return -1.0/Lambda * log(1-U01())
\end{Verbatim}

If a random variable Y has a U(0,1) distribution then so does 1-Y.  We
might be tempted to exploit this fact to save a step above, using {\bf
U01()} instead of {\bf 1-U01()}.  But if U01() returns 0, we are then
faced with computing log(0), which is undefined.  (U01() never returns
1, as formulated in Section \ref{block}.) 

\subsection{The Acceptance/Rejection Method}

\subsubsection{General Description of the Method}

Suppose again we wish to generate random numbers having density h.  With
h being the exponential density above, $H^{-1}$ was easy to find, but in
many other cases it is intractable.  Here is another method we can use:

We think of some other density g such that 

\begin{itemize}

\item We know how to generate random variables with density g.

\item There is some c such that $h(x) \leq c g(x)$ for all x.

\end{itemize}

Then it can be shown that the following will generate random variates
with density h:

\begin{Verbatim}[fontsize=\relsize{-2}]
while true:
   generate Y from g
   generate U from U(0,1)
   if U <= h(Y)/(c*g(Y)) return Y
\end{Verbatim}

\subsubsection{Example}

As an example, consider the u-shaped density $h(z) = 12(z-0.5)^2$ on
(0,1).  Take our g to be the U(0,1) density, and take c to be the
maximum value of h, which is 3.  Our code would then be 

\begin{Verbatim}[fontsize=\relsize{-2}]
while true:
   generate U1 from U(0,1)
   generate U2 from U(0,1)
   if U2 <= 4*(U1-0.5)**2 return U1
\end{Verbatim}

\subsection{{\it Ad Hoc} Methods}

In some cases, we can see a good way to generate random numbers from a
given distribution by thinking of the properties of the distribution.

\subsubsection{Example:  The Erlang Family}

For example, consider the Erlang family of distributions, whose
densities are given by

\begin{equation}
\frac{1}{(r-1)!} \lambda^r t^{r-1} e^{-\lambda t}, ~ t > 0
\end{equation}

Recall that this happens to be the density of the sum of r independent
random variables which are exponentially distributed with rate parameter
$\lambda$.  This then suggests an easy way to generate random numbers
from an Erlang distribution:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
Sum = 0
for i = 1,...,r
   Sum = Sum + expon(lambda)
return Sum
\end{Verbatim}

where of course we mean the function expon() to be a function that
generates exponentially distributed random variables, such as in Section
\ref{expfam}. 

\subsubsection{Example:  The Normal Family} 

Note first that all we need is a method to generate normal random variables of
mean 0 and variance 1.  If we need random variables of mean $\mu$ and
standard deviation $\sigma$, we simply generate N(0,1) variables,
multiply by $\sigma$ and adding $\mu$.

So, how do we generate N(0,1) variables?  The inverse transformation
method is infeasible, since the normal density cannot be integrated in
closed form.  We could use the acceptance/rejection method, with g being
the exponential density with rate 1.  After doing a little calculus, we
would find that we could take $c = \sqrt{2e/\pi}$.

However, a better way exists.  It can be shown that this works:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
static n = 0
static Y
if n == 0
   generate U1 and U2 from U(0,1)
   R = sqrt(-2 log(U1))
   T = 2 pi U2
   X = R cos(T)
   Y = R sin(T)
   n = 1
   return X
else 
   n = 0
   return Y
\end{Verbatim}

\section{Generating Random Numbers from Discrete Distributions}

\subsection{The Inverse Transformation Method}

The method in Section \ref{inv} works in the discrete case too.  Suppose
we wish to generate random variates X which take on the values 0,1,2,...
Let 

\begin{equation}
q(i) = P(X \leq i) = \sum_{k=0}^i p_X(i) 
\end{equation}

Then our algorithm is

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
generate U from U(0,1)
I = 0
while true
   if U < q(I) return I
   I = I + 1
\end{Verbatim}

\subsection{{\it Ad Hoc} Methods}

Again, for certain specific distributions, we can sometimes make use of
special properties that we know about those distributions.

\subsubsection{The Binomial Family}

This one is obvious:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
Count = 0
for I = 1,..,n
   generate U from U(0,1)
   if U < p Count = Count + 1
return Count
\end{Verbatim}

\subsubsection{The Geometric Family}

Again, obvious:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
Count = 0
while true:
   Count = Count + 1
   generate U from U(0,1)
   if U < p return Count
\end{Verbatim}

\subsubsection{The Poisson Family}

Recall that if events, say arrivals to a queue, have interevent times
which are independent exponentially distributed random variables with
rate $\lambda$, then the number of events N(t) up to time t has a Poisson
distribution with parameter $\lambda t$:

\begin{equation}
P(N(t) = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}, k = 0,1,2,...
\end{equation}

So, if we wish to generate random variates having a Poisson distribution
with parameter $\alpha$, we can do the following:

\begin{Verbatim}[fontsize=\relsize{-2},numbers=left]
Count = 0
Sum = 0
while 1
   Count = Count + 1
   Sum = Sum + expon(alpha)
   if Sum > 1 return Count-1
\end{Verbatim}

where again expon(alpha) means generating an exponentially distributed
random variable, in this case with rate $\alpha$.

\end{document}


