The Poisson Process

The Poisson process is a random process that models the random arrival times of customers at a store, or the occurrence of auto accidents at an intersection, or the decay of atoms in a sample of some radioisotope.

It can be described as follows:  Draw a number at random from an exponential distribution.  That will be the arrival time of the first customer.  Draw another random number from the exponential distribution.  Add that to the first arrival time to get the second arrival time.  Draw a third random number from the exponential distribution.  Add that to the second arrival time to get the third arrival time.  Keep going for as long as you wish (say, until the time the store is supposed to close).   The idea is:  the times between arrivals of customers are exponentially distributed.
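The recipe above can also be sketched in Python. This is a minimal illustration under stated assumptions, not the only way to do it. Time is measured in hours from opening, and the names simulate_arrivals, rate and closing_time are our own (hypothetical). random.expovariate(rate) draws one exponential gap whose average is 1/rate.

```python
import random

def simulate_arrivals(rate, closing_time, rng=random):
    """Return arrival times in [0, closing_time) for a Poisson process.

    rate: average number of arrivals per time unit (hypothetical parameter name).
    Each gap between consecutive arrivals is drawn from an exponential distribution.
    """
    arrivals = []
    t = rng.expovariate(rate)          # arrival time of the first customer
    while t < closing_time:            # keep going until the store closes
        arrivals.append(t)
        t += rng.expovariate(rate)     # add the next exponential gap
    return arrivals

rng = random.Random(1)                 # fixed seed so the run is repeatable
times = simulate_arrivals(rate=12.0, closing_time=8.0, rng=rng)
print(len(times), "arrivals; first three:", [round(x, 3) for x in times[:3]])
```

With 12 arrivals per hour on average over an 8-hour day, the count should usually be somewhere near 96, but it varies from run to run.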

We can simulate the Poisson process using a spreadsheet without too much trouble.  Choose a number Mean that gives the average time between arrivals, in hours (or minutes, or whatever your time units are); the average number of arrivals per time unit is then 1/Mean.  Put the formula =-1.0*Mean*LN(RAND()) into a cell, say cell A4.  (Recall that this generates a random number from the exponential distribution with mean Mean.)  Then in cell A5 put the formula =A4-Mean*LN(RAND()).  This adds a new random number from the exponential distribution to A4.  Copy and paste that formula into cells A6, A7, etc.  This will generate arrival times according to the Poisson process.
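To check the spreadsheet recipe outside Excel, the same column can be built in Python. This is a sketch under assumptions: mean_gap stands in for the value Mean (the average time between arrivals), and the function name spreadsheet_column is hypothetical. Because Python's random.random() returns values in [0, 1), which can include 0, the sketch uses 1 - random.random() so the logarithm is always defined.

```python
import math
import random

def spreadsheet_column(mean_gap, rows, rng=random):
    """Mimic cells A4, A5, A6, ...: a running total of -Mean*LN(RAND())."""
    column = []
    total = 0.0
    for _ in range(rows):
        u = 1.0 - rng.random()               # uniform on (0, 1], never exactly 0
        total += -mean_gap * math.log(u)     # one exponential gap with mean mean_gap
        column.append(total)                 # next arrival time
    return column

rng = random.Random(42)
col = spreadsheet_column(mean_gap=5.0, rows=100_000, rng=rng)
gaps = [b - a for a, b in zip([0.0] + col, col)]   # recover the individual gaps
print("average gap:", sum(gaps) / len(gaps))       # should be close to 5.0
```

The average of the generated gaps lands near mean_gap, which is a quick way to confirm that -Mean*LN(RAND()) produces gaps whose mean is Mean, so that arrivals occur at a rate of 1/Mean per time unit.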

The Poisson process is a realistic model for customer arrival times and other similar phenomena because of a special property of the exponential distribution: it is "memoryless".  This means the following:  Suppose that the probability that a customer arrives in any particular 5-minute interval of time is 0.33.  Suppose 5 minutes have gone by without a customer appearing.  The probability that a customer appears in the next 5 minutes is still 0.33.  (A customer may or may not show up in that interval of time, but whether or not a customer shows up then has nothing to do with whether a customer appeared in the previous interval of time.)
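The memoryless property can be checked numerically. In this sketch we assume the exponential rate is chosen so that an arrival within any 5-minute window has probability 0.33, that is, rate = -ln(0.67)/5 per minute. We then estimate, by simulation, the chance of a customer in the next 5 minutes given that nobody came in the first 5, and compare it with the unconditional 0.33.

```python
import math
import random

p_window = 0.33
rate = -math.log(1 - p_window) / 5.0     # per minute, so that P(wait <= 5) = 0.33

rng = random.Random(7)
waits = [rng.expovariate(rate) for _ in range(200_000)]  # time until the first customer

# Unconditional: a customer arrives within the first 5 minutes.
p_first = sum(w <= 5 for w in waits) / len(waits)

# Conditional: given nobody in the first 5 minutes, a customer arrives by minute 10.
survivors = [w for w in waits if w > 5]
p_next = sum(w <= 10 for w in survivors) / len(survivors)

print(round(p_first, 3), round(p_next, 3))   # both should be close to 0.33
```

The two estimates agree up to simulation noise: having waited 5 minutes in vain does not change the odds for the next 5 minutes.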

The Poisson distribution is related to the Poisson process.  A random variable that takes the values 0, 1, 2, 3, etc. is said to have a Poisson distribution with rate $\lambda$ if the probability that it takes the value k is given by the formula $e^{-\lambda}\lambda^k/k!$.  (Here k!, read "k factorial", is the product of the numbers from 1 up to k, with 0! = 1.  For example, 5! = 1(2)(3)(4)(5) = 120.  Also e = 2.71828... is the base of the natural logarithms.)  The arrival times generated by the Poisson process have the property that the number of arrivals in a given time interval of length t is a Poisson distributed random variable.  (To be precise, the probability of k arrivals in an interval of length t is $e^{-\lambda t}(\lambda t)^k/k!$.  Here, $\lambda$ is the reciprocal of the number Mean, that is, the average number of arrivals per time unit.)  We aren't interested here in these precise formulas, but we note that they give us the ability to compute the exact probability of a given number of arrivals in a given interval of time.
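The formula $e^{-\lambda t}(\lambda t)^k/k!$ is easy to evaluate directly. Here is a minimal sketch (the function names poisson_pmf and count_arrivals are hypothetical) with a simulation cross-check: it counts arrivals in an interval of length t using exponential gaps of mean Mean = 1/$\lambda$, and compares the observed frequencies with the formula.

```python
import math
import random

def poisson_pmf(k, lam, t=1.0):
    """Probability of exactly k arrivals in an interval of length t at rate lam."""
    mu = lam * t
    return math.exp(-mu) * mu**k / math.factorial(k)

def count_arrivals(lam, t, rng):
    """Number of arrivals in [0, t) when gaps are exponential with mean 1/lam."""
    n, clock = 0, rng.expovariate(lam)
    while clock < t:
        n += 1
        clock += rng.expovariate(lam)
    return n

lam, t = 2.0, 3.0                  # e.g. 2 arrivals per hour, a 3-hour interval
rng = random.Random(3)
trials = 50_000
counts = [count_arrivals(lam, t, rng) for _ in range(trials)]
for k in range(4):
    simulated = counts.count(k) / trials
    print(k, round(poisson_pmf(k, lam, t), 4), round(simulated, 4))
```

Each printed row shows k, the exact probability and the simulated frequency; the last two columns should be close. The probabilities over all k = 0, 1, 2, ... add up to 1, even though there are infinitely many outcomes.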

The Poisson distribution is a slightly different animal than the other distributions we've met.  It isn't a continuous distribution since only the outcomes 0, 1, 2, 3, etc. are considered.  But it isn't a finite distribution: there are infinitely many outcomes.  We may describe this distribution as being "discrete".
 

Mean, variance and standard deviation

The last probability distribution we wish to consider is the famous "normal" or "Gaussian" distribution.  Before we consider it, we need to discuss some simple "descriptive statistics".

Suppose we have some data, a list of numbers.  We might be interested in the average of these numbers.  For example, we could have data on the prices of homes sold in New Albany in 1997, and we might be interested in finding the average price of a home.  In statistics, the mean of a list of numbers is simply their average.  That is, if you wish to take the mean of a list of n numbers, add the numbers and divide by n.

But the mean might not be the best statistic for our purposes.  What if lots of inexpensive homes were sold last year, and one multi-million-dollar mansion?  The mean might be rather high even though most homes were inexpensive.  A better measure here might be the median: the price level such that half of all the homes sold were lower in price, and half the homes sold were higher in price.
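A small illustration of why the median can be the better summary. The prices below are hypothetical, made up for this example: nine ordinary homes and one mansion. Python's standard statistics.mean and statistics.median do the work.

```python
import statistics

# Hypothetical sale prices: nine ordinary homes and one multi-million-dollar mansion.
prices = [92_000, 95_000, 98_000, 99_000, 100_000,
          101_000, 103_000, 105_000, 107_000, 3_000_000]

print("mean:  ", statistics.mean(prices))
print("median:", statistics.median(prices))
```

The mean comes out at 390,000, nearly four times the median of 100,500, even though nine of the ten homes sold for about $100,000.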

Another concern might be how much of a spread there is in the price of homes.  Imagine a housing market where there are lots of homes close in price to $100,000, but not too many homes much less in price or much higher in price.  Say most of the neighborhoods are standard tract homes of about the same age and size.  Contrast that with another market where there are older neighborhoods with smaller homes and "fixer-uppers", and also with newer neighborhoods with big houses.  That market might have a much greater range of prices listed.

To consider the spread in prices of the homes, we could think as follows:  We could find the mean (average) price, then compare the price of each home with the mean by taking the difference between each price and the mean.  That is, if the average price were $100,000 and a certain house were $115,000 in price, the difference would be $15,000.  We could perhaps compute the average difference.  But consider a house that was $85,000.  The difference there would be -$15,000.  If we average all of the differences, the positive and negative ones cancel out to exactly zero, regardless of how much spread there is in the prices.  We could remedy this by taking absolute values before averaging the differences.  But it turns out to be more convenient mathematically, and more natural, to take the average of the squares of the differences instead.
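The cancellation is easy to see numerically. This sketch uses the five numbers from the worked example later in this section (6, 8, 3, 11, 4) and exact fractions, so that no rounding hides the result.

```python
from fractions import Fraction

data = [6, 8, 3, 11, 4]
m = Fraction(sum(data), len(data))          # mean = 32/5 = 6.4, kept exact

differences = [x - m for x in data]
print("sum of differences:", sum(differences))    # always exactly 0
print("average |difference|:", float(sum(abs(d) for d in differences) / len(data)))
print("average squared difference:", float(sum(d * d for d in differences) / len(data)))
```

The plain differences sum to exactly zero, which is why their average cannot measure spread; the absolute and squared versions (2.48 and 8.24 here) do not cancel.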

We offer the following definition:  Let m be the mean of a list of n numbers x, so $m = (\sum x)/n$.  (Here $\Sigma$, the capital Greek letter sigma, stands for sum.)  Then the variance of the list of numbers is $\sigma^2 = \sum (x-m)^2/n$.  The standard deviation is its square root, $\sigma = \left(\sum (x-m)^2/n\right)^{1/2}$.  (Here, n is how many numbers are in your list.)  In statistics, these formulas are sometimes given with n - 1 instead of n; they are then referred to as the "sample" variance and standard deviation, in symbols $s^2$ and $s$.
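The two definitions can be written out directly. Below is a minimal sketch (the function names variance and std_dev are hypothetical) that follows $\sigma^2 = \sum (x-m)^2/n$ and its n - 1 "sample" version. It is cross-checked against Python's standard statistics.pvariance and statistics.variance, which divide by n and by n - 1 respectively.

```python
import math
import statistics

def variance(xs, sample=False):
    """Average squared difference from the mean; divide by n - 1 if sample=True."""
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)      # sum of squared differences
    return ss / (n - 1 if sample else n)

def std_dev(xs, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(xs, sample))

data = [6, 8, 3, 11, 4]
print(variance(data), statistics.pvariance(data))               # divide by n
print(variance(data, sample=True), statistics.variance(data))   # divide by n - 1
print(std_dev(data, sample=True))
```

For this list the two versions differ noticeably (8.24 versus 10.3) because n is small; for long lists the difference between dividing by n and by n - 1 shrinks.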

Example:  We can compute the mean, variance and standard deviation of a list of 5 numbers.  This example uses the "sample" formulas, dividing by n - 1 = 4 rather than by n = 5.

number x    difference x - m    difference squared (x - m)^2
6           6 - 6.4 = -0.4      0.16
8           8 - 6.4 = 1.6       2.56
3           3 - 6.4 = -3.4      11.56
11          11 - 6.4 = 4.6      21.16
4           4 - 6.4 = -2.4      5.76
total 32                        total 41.2

mean: $m = 32 / 5 = 6.4$
sample variance: $s^2 = 41.2 / 4 = 10.3$
sample standard deviation: $s = \sqrt{10.3} \approx 3.209$
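The table can be reproduced line by line. This sketch prints each difference and its square, then the totals, dividing by n - 1 = 4 to match the table's sample variance.

```python
import math

data = [6, 8, 3, 11, 4]
n = len(data)
m = sum(data) / n                      # 32 / 5 = 6.4

total_sq = 0.0
print(f"{'x':>4} {'x - m':>7} {'(x - m)^2':>10}")
for x in data:
    d = x - m                          # difference from the mean
    total_sq += d * d                  # running total of squared differences
    print(f"{x:>4} {d:>7.1f} {d * d:>10.2f}")

sample_var = total_sq / (n - 1)        # 41.2 / 4 = 10.3
print("mean", m, "| total", round(total_sq, 2),
      "| variance", round(sample_var, 2), "| std dev", round(math.sqrt(sample_var), 3))
```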
 

Example:  The following is a list of 100 prices for homes sold in a certain community.
 
99000 90000 110500 106500 113000
112500 123500 130000 82000 106500
88500 93500 105500 109000 91000
70000 116500 122500 103500 83000
121500 103000 104000 106500 117500
89000 103000 99000 121500 112000
100500 104500 103500 87000 106000
106500 90000 97500 69500 92000
122000 99500 110500 109500 126500
81500 108000 106000 88000 94000
99500 83000 89500 100500 121000
99500 101500 107500 102000 95500
64500 116500 83000 79000 83500
122000 130000 110500 76500 117500
92500 123000 115500 92000 77500
114500 107000 91000 103500 85500
100000 112000 90000 69500 81000
104500 109500 117000 100500 98500
91000 113000 84000 98000 117500
82500 85500 111000 98500 101500
We can compute the average (mean) price: it is 100,855.  We can also compute the standard deviation:  it is 14,494.4.  These are computed using the Excel functions AVERAGE and STDEV (STDEV uses the "sample" formula, dividing by n - 1).  A histogram for these homes is as follows:

[Figure: histogram of the 100 home prices above]
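Excel's AVERAGE and STDEV have direct counterparts in Python's standard library, statistics.mean and statistics.stdev; like STDEV, statistics.stdev divides by n - 1. The sketch below applies them to the 100 prices listed above and also prints a rough text histogram. The $10,000 bin width is our own choice for illustration, not necessarily the one used in the original chart.

```python
import statistics
from collections import Counter

prices = [
    99000, 90000, 110500, 106500, 113000,
    112500, 123500, 130000, 82000, 106500,
    88500, 93500, 105500, 109000, 91000,
    70000, 116500, 122500, 103500, 83000,
    121500, 103000, 104000, 106500, 117500,
    89000, 103000, 99000, 121500, 112000,
    100500, 104500, 103500, 87000, 106000,
    106500, 90000, 97500, 69500, 92000,
    122000, 99500, 110500, 109500, 126500,
    81500, 108000, 106000, 88000, 94000,
    99500, 83000, 89500, 100500, 121000,
    99500, 101500, 107500, 102000, 95500,
    64500, 116500, 83000, 79000, 83500,
    122000, 130000, 110500, 76500, 117500,
    92500, 123000, 115500, 92000, 77500,
    114500, 107000, 91000, 103500, 85500,
    100000, 112000, 90000, 69500, 81000,
    104500, 109500, 117000, 100500, 98500,
    91000, 113000, 84000, 98000, 117500,
    82500, 85500, 111000, 98500, 101500,
]

print("mean:", statistics.mean(prices))                            # like AVERAGE
print("standard deviation (n - 1):", round(statistics.stdev(prices), 1))  # like STDEV

# Text histogram: count prices in $10,000-wide bins (bin width is an assumption).
bins = Counter(p // 10_000 * 10_000 for p in prices)
for low in sorted(bins):
    print(f"{low:>7}-{low + 9_999:<7} {'*' * bins[low]}")
```

Run on the prices exactly as listed, statistics.mean returns 100855.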
Example:  Here is a list of house prices from a different community.  The average sales price of 105,980 is similar to that of the previous community, but this time the standard deviation is 37,104.1.  The histogram for this community appears below; notice how much more spread out the prices are (note the prices on the horizontal axis).
 
50000 109000 46500 82500 105500
107000 68500 96500 156000 146000
106500 76500 118000 91000 44000
107500 53000 140000 94000 120000
101500 17500 174500 182500 133000
115000 105500 113500 131000 145500
49500 142500 79500 105000 108000
62000 132000 124000 78000 119500
67000 151500 73500 89000 94500
99000 55500 59500 108500 118000
93500 173000 95000 142000 147000
138500 108000 157500 93500 56000
78500 81500 194500 140000 108000
129000 83500 128000 70000 101500
110000 195000 117000 90500 159500
121000 95500 45000 79000 177000
102500 101500 60500 80000 165500
166500 73500 104500 94500 103500
126000 150500 112000 89000 94000
98000 145500 92500 48500 28000
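The same summary for the second community, again a sketch using only Python's standard library. Printing the smallest and largest prices makes the difference in spread concrete: this list runs from 17,500 to 195,000, while the first community's runs from 64,500 to 130,000.

```python
import statistics

prices = [
    50000, 109000, 46500, 82500, 105500,
    107000, 68500, 96500, 156000, 146000,
    106500, 76500, 118000, 91000, 44000,
    107500, 53000, 140000, 94000, 120000,
    101500, 17500, 174500, 182500, 133000,
    115000, 105500, 113500, 131000, 145500,
    49500, 142500, 79500, 105000, 108000,
    62000, 132000, 124000, 78000, 119500,
    67000, 151500, 73500, 89000, 94500,
    99000, 55500, 59500, 108500, 118000,
    93500, 173000, 95000, 142000, 147000,
    138500, 108000, 157500, 93500, 56000,
    78500, 81500, 194500, 140000, 108000,
    129000, 83500, 128000, 70000, 101500,
    110000, 195000, 117000, 90500, 159500,
    121000, 95500, 45000, 79000, 177000,
    102500, 101500, 60500, 80000, 165500,
    166500, 73500, 104500, 94500, 103500,
    126000, 150500, 112000, 89000, 94000,
    98000, 145500, 92500, 48500, 28000,
]

print("mean:", statistics.mean(prices))
print("standard deviation (n - 1):", round(statistics.stdev(prices), 1))
print("lowest:", min(prices), "highest:", max(prices))
```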