Elementary Probability Theory

History:  The word 'probable' originally meant 'worthy of belief' or 'worthy of approval'.  It could be used in a sentence such as "it was probable but definitely false", meaning, "it was worthy of belief but definitely false."  The modern sense of the word involving numerical values for likelihoods came later.

In 1525, Gerolamo Cardan (Cardano) wrote Liber de Ludo Aleae (Book on Games of Chance), in which he outlined the basic theory of probability as applied to dice games and gambling.  However, the book was not published in his lifetime; it appeared in print only in 1663.  Cardan was one of the most colorful figures in the history of mathematics, with a reputation for being a scoundrel.

The French mathematicians Fermat and Pascal worked out the theory of probability in the mid-1650s, in response to the questions of the Chevalier de Méré, a veteran gambler.

Dice as we know them date to at least 3000 BC.  The current arrangement of spots dates to about 1400 BC.  Before people used dice, they used astragali, bones from the heels of cows.  These had four sides, so they produced four different possible outcomes.  Playing cards are much more recent; they appeared in Europe in the fourteenth century.  They were hand-painted and expensive.  Gutenberg, who perfected printing with movable type, produced the first printed playing cards.  The current configuration of a deck dates to the 1500s.

An interesting question of mathematics history is why the ancient Greeks didn't develop any probability theory.  Their mathematics was highly sophisticated (as exemplified by Euclid's Elements).  Apparently, they viewed chance or random events as completely capricious, the whim of the gods, so it never occurred to them to think of these things quantitatively.
 

Empirical probability (relative frequency)

The simplest way of thinking of the probability of an event is to imagine repeating an experiment many times to see how many times the desired outcome occurs.

A simple experiment might be to roll two dice many times, to see how many times a sum of 5 is obtained.

Another experiment would be to toss three coins.  We might try to find the probability of getting exactly 2 heads by tossing the three coins many times.  It might turn out that exactly 2 heads appear on about 37% of the tosses.  So the fraction of the outcomes that are as desired is 0.37.  This is the relative frequency of the outcomes where 2 heads appear.  We say the probability is approximately 0.37.
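
The two experiments above can be imitated on a computer.  The following is a minimal sketch, assuming Python's standard random module is an acceptable stand-in for real dice and coins; the helper names, the seed and the number of trials are illustrative choices, not part of the original notes.

```python
import random

def relative_frequency(experiment, is_success, trials, rng):
    """Repeat `experiment` `trials` times; return the fraction of successes."""
    successes = sum(1 for _ in range(trials) if is_success(experiment(rng)))
    return successes / trials

def roll_two_dice(rng):
    # One roll of two fair six-sided dice.
    return rng.randint(1, 6), rng.randint(1, 6)

def toss_three_coins(rng):
    # One toss of three fair coins, e.g. ['H', 'T', 'H'].
    return [rng.choice("HT") for _ in range(3)]

rng = random.Random(2024)   # fixed seed (arbitrary) so runs are repeatable
trials = 100_000            # assumed number of repetitions

freq_sum_5 = relative_frequency(
    roll_two_dice, lambda dice: sum(dice) == 5, trials, rng)
freq_two_heads = relative_frequency(
    toss_three_coins, lambda coins: coins.count("H") == 2, trials, rng)

print(f"relative frequency of a sum of 5:     {freq_sum_5:.3f}")
print(f"relative frequency of exactly 2 heads: {freq_two_heads:.3f}")
```

With more trials the relative frequencies tend to settle closer to fixed values, which is the idea behind empirical probability.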

Another example:  Suppose 400 parts in a production run of 10,000 are defective.  So 4% of the parts are defective.  If a part is selected at random, the probability that it is defective is 0.04.  In statistics, a common problem would be to estimate how many of the 10,000 parts are defective, by randomly selecting parts and seeing how many parts from the "sample" are defective.
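
The sampling idea can be sketched as follows.  The sample size of 500 and the seed are assumptions made for illustration; the estimate will vary from sample to sample and will usually be near, but not exactly, 400.

```python
import random

rng = random.Random(7)   # fixed seed (arbitrary) so runs are repeatable

# Production run from the example: 10,000 parts, 400 of them defective.
parts = [True] * 400 + [False] * 9_600   # True marks a defective part
sample_size = 500                        # assumed sample size, not from the text

sample = rng.sample(parts, sample_size)  # random sample without replacement
sample_fraction = sum(sample) / sample_size

# Scale the sample fraction up to the whole production run.
estimated_defective = round(sample_fraction * len(parts))

print(f"defective in sample: {sum(sample)} of {sample_size}")
print(f"estimated defective parts in the run: {estimated_defective}")
```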
 

Classical probability

Often, we can analyze an experiment to determine exactly what outcomes are possible, and to obtain exact values for the probabilities.

Here is some terminology.  Consider an experiment such as rolling two dice.  The sample space is the set (collection) of all possible outcomes of the experiment.  An event is some subset of the sample space, that is, an event is a set of outcomes.  The probability of the event is given by:
 

P(event) = (number of outcomes in the event) / (number of all possible outcomes)
 
(This assumes all outcomes are equally likely.)

Example:  We consider the experiment of rolling two dice, and the event that we get a sum of 5.  The following table gives the outcomes in the sample space.  We imagine one die being red and the other being blue, to distinguish possible outcomes.
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
51 52 53 54 55 56
61 62 63 64 65 66
Each entry lists the number on the red die followed by the number on the blue die.  The four outcomes that give a sum of 5 are 14, 23, 32 and 41.  We see that the probability of that event is 4/36 = 0.111.  So if two dice are rolled, about 11% of the time we get a sum of 5.
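
The same count can be checked by listing the sample space in code.  This is a small sketch using Python's standard itertools and fractions modules; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (red, blue) pair, 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two dice show a sum of 5.
event = [outcome for outcome in sample_space if sum(outcome) == 5]

# Classical probability: outcomes in the event / all possible outcomes.
probability = Fraction(len(event), len(sample_space))

print(event)
print(probability, float(probability))
```

Fraction reduces 4/36 to lowest terms, so the exact result is reported as 1/9.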

Example:  We consider the experiment of tossing three coins.  The sample space is:
HHH HHT HTH HTT
THH THT TTH TTT
Consider the event of getting exactly two heads.  That event consists of the three outcomes HHT, HTH and THH, so the probability of that event is 3/8 = 0.375.
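
As a check, the eight outcomes can be generated and grouped by the number of heads.  This is an illustrative sketch; the names are not from the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all sequences of H and T of length 3.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# Count how many outcomes have 0, 1, 2 or 3 heads.
heads_count = Counter(outcome.count("H") for outcome in sample_space)

# Probability of each number of heads, assuming equally likely outcomes.
distribution = {k: Fraction(n, len(sample_space))
                for k, n in sorted(heads_count.items())}

two_heads = [o for o in sample_space if o.count("H") == 2]
print(two_heads)
print(distribution[2])
```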

Example:  In the 1860s, Gregor Mendel determined the elementary laws of genetics by carefully crossing certain varieties of peas.  Here we consider a simplified model of the genetics of eye color.  In this model, eye color is determined by a single pair of genes, one inherited from the father and one from the mother.  Each gene can come in several varieties (known as "alleles").  We will call the gene for brown eyes B and the gene for blue eyes b.  The gene for brown eyes is dominant over the gene for blue eyes.  This means the combinations BB, Bb and bB all result in brown eyes, and only bb produces blue eyes.  (These combinations are known as "genotypes".  The observed characteristic of blue eyes or brown eyes is known as a "phenotype".)

Here is a simple probability problem:  Suppose both parents have genotype Bb.  (So they are both brown eyed.)  What is the probability that a child of theirs will have blue eyes?  The sample space has four equally likely outcomes, BB, Bb, bB and bb, where the first letter is the gene from the father and the second is the gene from the mother:

                 Mother
                 B     b
    Father  B    BB    Bb
            b    bB    bb
 
Only one of those outcomes results in blue eyes.  So the probability of that outcome is 1/4 = 0.25.  Modern statistical theory was in large part developed (in the early to mid 20th century) to solve difficult problems in genetics.
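
The cross of two Bb parents can be enumerated in a few lines.  This sketch assumes the simple one-gene model described above, with each parent passing on either of its two genes with equal chance; the function name phenotype is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def phenotype(genotype):
    # Brown (B) is dominant, so only the genotype "bb" gives blue eyes.
    return "blue" if genotype == "bb" else "brown"

father, mother = "Bb", "Bb"

# Each child receives one gene from the father and one from the mother.
offspring = [f + m for f, m in product(father, mother)]

blue = [g for g in offspring if phenotype(g) == "blue"]
p_blue = Fraction(len(blue), len(offspring))

print(offspring)
print(p_blue)
```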
 

Odds.

Often probabilities are expressed as odds.  Suppose you play a game of chance with an opponent, where you have a 25% chance of winning (so your opponent has a 75% chance of winning).  Then for every game you win, your opponent will win about three games on average.  You can say that your opponent has odds of 3 to 1 in his or her favor.  In the example above on genetics, the odds in favor of brown eyes are 3 to 1.  (In his famous experiments with peas, Mendel often observed the characteristic 3 to 1 ratio in the offspring.)  If a long shot at the racetrack is given odds of 20 to 1 against it, this means the horse is expected to win only about 1 out of every 21 races, which is a probability of 1/21 = 0.048 or 4.8%.  So you shouldn't bet more than $1 on the horse for every $20 bet against it.  (However, the actual payoff will be smaller if the horse does win, since the track will collect some percentage of the betting pool.)

In general, if the odds in favor of an event are said to be a to b then the probability of that event is a/(a+b).
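
The conversion between odds and probability can be written as two small functions.  This is a sketch; the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Odds of a to b in favor of an event give probability a/(a+b)."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """Return the odds in favor, (a, b) in lowest terms, for probability p."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator

# Brown eyes in the genetics example: odds of 3 to 1 in favor.
p_brown = odds_to_probability(3, 1)

# Long shot at 20 to 1 against, that is, 1 to 20 in favor.
p_long_shot = odds_to_probability(1, 20)

print(p_brown, p_long_shot, round(float(p_long_shot), 3))
print(probability_to_odds(Fraction(1, 4)))
```

Note that odds quoted against an event must be reversed before applying a/(a+b), as is done for the long shot.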
 

Complementary events.

The complementary event of an event A is the event A^c that A does not occur.  For example, if we draw a card from a standard 52 card deck, and if A is the event that the card is a heart, then A^c is the event that the card drawn is not a heart.  Now there are 13 hearts in the deck, and there are 39 cards that are not hearts.  So the probability of A is P(A) = 13/52 = 1/4.  And the probability of A^c is P(A^c) = 39/52 = 3/4.  Thus P(A^c) = 1 - P(A), which is true in general for the probability of a complementary event.
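
The card example can be verified by building the deck and counting.  This is a minimal sketch; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Standard deck: one card for each (rank, suit) pair.
deck = list(product(ranks, suits))

event_a = [card for card in deck if card[1] == "hearts"]    # A: a heart
event_ac = [card for card in deck if card[1] != "hearts"]   # complement of A

p_a = Fraction(len(event_a), len(deck))
p_ac = Fraction(len(event_ac), len(deck))

print(p_a, p_ac)
print(p_ac == 1 - p_a)
```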