In 1733, Abraham de Moivre presented an approximation to the Binomial distribution. He later (de Moivre, 1756, page 242) appended the derivation of his approximation to the solution of a problem asking for the calculation of an expected value for a particular game. He posed the rhetorical question of how we might show that experimental proportions should be close to their expected values:
From this it follows, that if after taking a great number of Experiments, it should be perceived that the happenings and failings have been nearly in a certain proportion, such as of 2 to 1, it may safely be concluded that the Probabilities of happening or failing at any one time assigned will be very near in that proportion, and that the greater the number of Experiments has been, so much nearer the Truth will the conjectures be that are derived from them.
But suppose it should be said, that notwithstanding the reasonableness of building Conjectures upon Observations, still considering the great Power of Chance, Events might at long run fall out in a different proportion from the real Bent which they have to happen one way or the other; and that supposing for Instance that an Event might as easily happen as not happen, whether after three thousand Experiments it may not be possible it should have happened two thousand times and failed a thousand; and that therefore the Odds against so great a variation from Equality should be assigned, whereby the Mind would be the better disposed in the Conclusions derived from the Experiments.
In answer to this, I’ll take the liberty to say, that this is the hardest Problem that can be proposed on the Subject of Chance, for which reason I have reserved it for the last, but I hope to be forgiven if my Solution is not fitted to the capacity of all Readers; however I shall derive from it some Conclusions that may be of use to every body: in order thereto, I shall here translate a Paper of mine which was printed November 12, 1733, and communicated to some Friends, but never yet made public, reserving to myself the right of enlarging my own Thoughts, as occasion shall require.
De Moivre then stated and proved what is now known as the normal approximation to the Binomial distribution. The approximation itself has subsequently been generalized to give normal approximations for many other distributions. Nevertheless, de Moivre’s elegant method of proof is still worth understanding. This Chapter will explain de Moivre’s approximation, using modern notation.
A Method of approximating the Sum of the Terms of the Binomial $(a+b)^n$ expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.
Altho’ the Solution of problems of Chance often requires that several Terms of the Binomial $(a+b)^n$ be added together, nevertheless in very high Powers the thing appears so laborious, and of so great difficulty, that few people have undertaken that Task; for besides James and Nicolas Bernouilli, two great Mathematicians, I know of no body that has attempted it; in which, tho’ they have shown very great skill, and have the praise that is due to their Industry, yet some things were further required; for what they have done is not so much an Approximation as the determining very wide limits, within which they demonstrated that the Sum of the Terms was contained. Now the method . . .
Pictures of the binomial
Suppose $X$ has a Bin($n, p$) distribution. That is,
$$P\{X = k\} = \binom{n}{k} p^k q^{n-k} \qquad \text{for } k = 0, 1, \dots, n, \text{ where } q = 1 - p.$$
Recall that we can think of $X$ as a sum of independent random variables,
$$X = \xi_1 + \xi_2 + \dots + \xi_n, \qquad \text{where } P\{\xi_i = 1\} = p \text{ and } P\{\xi_i = 0\} = q.$$
From this representation it follows that
$$EX = np \qquad \text{and} \qquad \operatorname{var}(X) = npq.$$
Recall also that Tchebychev’s inequality suggests the distribution should be clustered around $np$, with a spread determined by the standard deviation, $\sqrt{npq}$.
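As a quick sanity check on these identities, one can compute the Bin($n, p$) probabilities exactly and verify that they sum to one, with mean $np$ and variance $npq$. The values $n = 100$ and $p = 0.4$ below are chosen only for illustration:

```python
import math

# Exact Bin(n, p) probabilities; n and p are arbitrary illustrative values.
n, p = 100, 0.4
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

total = sum(pmf)                                           # sums to 1
mean = sum(k * w for k, w in enumerate(pmf))               # EX = np = 40
var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))  # var(X) = npq = 24
```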
What does the Binomial distribution look like? The plots in the next display, for the Bin($n, 0.4$) distribution with $n = 20, 50, 100, 150, 200$, are typical. Each plot on the left shows bars of height $P\{X = k\}$ and width 1, centered at $k$. The maxima occur near $n \times 0.4$ for each plot. As $n$ increases, the spread also increases, reflecting the increase in the standard deviations $\sqrt{npq}$. Each of the shaded regions on the left has area equal to one because
$$\sum_{k=0}^{n} P\{X = k\} = 1 \qquad \text{for each } n.$$
The plots on the right represent the distributions of the standardized random variables
$$Z_n = \frac{X_n - np}{\sqrt{npq}}.$$
The location and scaling effects of the increasing expected values and standard deviations (with $p = 0.4$ and various $n$) are now removed. Each plot is shifted to bring the location of the maximum close to 0, and the horizontal scale is multiplied by a factor $1/\sqrt{npq}$. A bar of height $\sqrt{npq}\,P\{X = k\}$ with width $1/\sqrt{npq}$ is now centered at $(k - np)/\sqrt{npq}$. The plots all have similar shapes. Each shaded region still has area 1.
De Moivre’s argument
Notice how the standardized plots in the last picture settle down to a symmetric `bell-shaped’ curve. You can understand this effect by looking at the ratio of successive terms:
$$\frac{P\{X = k+1\}}{P\{X = k\}} = \frac{\binom{n}{k+1} p^{k+1} q^{n-k-1}}{\binom{n}{k} p^k q^{n-k}} = \frac{(n-k)p}{(k+1)q}.$$
As a consequence,
$$P\{X = k+1\} \ge P\{X = k\} \qquad \text{if and only if} \qquad (n-k)p \ge (k+1)q,$$
that is, $k \le np - q$. For fixed $n$, the probability $P\{X = k\}$ achieves its largest value at $k_{\max} = \lceil np - q \rceil$ (with a tie when $np - q$ is an integer). The probabilities $P\{X = k\}$ increase with $k$ for $k \le np - q$, then decrease for $k > np - q$. That explains why each plot on the left has a peak near $np$.
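Both the ratio formula and the location of the peak are easy to confirm numerically. The sketch below (with $n = 50$ and $p = 0.4$ as illustrative values, so that $np - q = 19.4$) checks every successive ratio and then locates the maximizing $k$:

```python
import math

n, p = 50, 0.4
q = 1 - p

def pmf(k):
    # Bin(n, p) probability of exactly k successes.
    return math.comb(n, k) * p**k * q**(n - k)

# Ratio of successive terms equals (n - k)p / ((k + 1)q) ...
for k in range(n):
    assert math.isclose(pmf(k + 1) / pmf(k), (n - k) * p / ((k + 1) * q))

# ... so the probabilities peak at the smallest integer above np - q = 19.4.
kmax = max(range(n + 1), key=pmf)  # equals 20, which is also np here
```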
Now for the shape. At least for $k$ near $np$, we get a good approximation for the logarithm of the ratio of successive terms using the Taylor approximation
$$\log(1 + x) \approx x \qquad \text{for } x \text{ near } 0.$$
Indeed, writing $k = np + t$,
$$\log \frac{P\{X = k+1\}}{P\{X = k\}} = \log \frac{(n-k)p}{(k+1)q} = \log\left(1 - \frac{t}{nq}\right) - \log\left(1 + \frac{t+1}{np}\right) \approx -\frac{t}{npq}.$$
By taking a product of successive ratios we get the ratio of each individual Binomial probability to the largest term. On a log scale the calculation becomes
$$\log \frac{P\{X = k\}}{P\{X = k_{\max}\}} \approx -\sum_{j:\, k_{\max} \le j < k} \frac{j - np}{npq} \approx -\frac{(k - np)^2}{2npq}.$$
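The quality of this quadratic approximation on the log scale can be checked directly. The sketch below (illustrative values $n = 400$, $p = 0.4$) compares the exact log-ratio with $-(k - np)^2/(2npq)$ for $k$ within two standard deviations of $np$:

```python
import math

n, p = 400, 0.4
q = 1 - p
sd = math.sqrt(n * p * q)

def pmf(k):
    return math.comb(n, k) * p**k * q**(n - k)

kmax = max(range(n + 1), key=pmf)
errors = []
# Within about two standard deviations of np, the exact log-ratio stays
# close to the quadratic -(k - np)^2 / (2npq).
for k in range(int(n * p - 2 * sd), int(n * p + 2 * sd) + 1):
    exact = math.log(pmf(k) / pmf(kmax))
    quadratic = -((k - n * p) ** 2) / (2 * n * p * q)
    errors.append(abs(exact - quadratic))
```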
The largest binomial probability
Using the fact that the probabilities sum to 1, for $p = 1/2$ de Moivre was able to show that the largest probability $P\{X = k_{\max}\}$ should decrease like $2/(B\sqrt{n})$, for a constant $B$ that he was initially only able to express as an infinite sum. Referring to his calculation of the ratio of the maximum term in the expansion of $(1+1)^n$ to the sum, $2^n$, he wrote (de Moivre, 1756, page 244)
When I first began that inquiry, I contented myself to determine at large the Value of B, which was done by the addition of some Terms of the above-written Series; but as I perceived that it converged but slowly, and seeing at the same time that what I had done answered my purpose tolerably well, I desisted from proceeding further till my worthy and learned Friend Mr. James Stirling, who had applied himself after me to that inquiry, found that the Quantity B did denote the Square-root of the Circumference of a Circle whose Radius is Unity, so that if that
Circumference be called c, the Ratio of the middle Term to the Sum of all the Terms will be expressed by $2/\sqrt{nc}$.
In modern notation, the vital fact discovered by the learned Mr. James Stirling asserts that
$$n! \approx \sqrt{2\pi n}\,(n/e)^n,$$
in the sense that the ratio of the two sides tends to 1 (very rapidly) as $n$ goes to infinity. See Feller (1968, pp. 52–53) for an elegant, modern derivation of the Stirling formula.
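The speed of that convergence is easy to watch numerically; this sketch computes the ratio $n!/\bigl(\sqrt{2\pi n}\,(n/e)^n\bigr)$ for a few values of $n$:

```python
import math

def stirling_ratio(n):
    # Ratio of n! to Stirling's approximation; tends to 1 as n grows.
    return math.factorial(n) / (math.sqrt(2 * math.pi * n) * (n / math.e) ** n)

ratios = [stirling_ratio(n) for n in (1, 5, 10, 50, 100)]
# The ratios decrease toward 1, roughly like 1 + 1/(12n).
```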
By Stirling’s formula, for general $p$,
$$P\{X = k_{\max}\} = \binom{n}{k_{\max}} p^{k_{\max}} q^{n - k_{\max}} \approx \frac{1}{\sqrt{2\pi npq}},$$
which, combined with the previous approximation on the log scale, gives
$$P\{X = k\} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{(k - np)^2}{2npq}\right).$$
Normal approximations
How does one actually perform a normal approximation? Back in the golden days, I would have interpolated from a table of values for the function
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy,$$
which was found in most statistics texts. For example, if $X$ has a Bin($100, 1/2$) distribution,
$$P\{45 \le X \le 55\} \approx \Phi\left(\frac{55.5 - 50}{5}\right) - \Phi\left(\frac{44.5 - 50}{5}\right) = \Phi(1.1) - \Phi(-1.1) \approx 0.7287.$$
These days, I would just calculate in R:

> pnorm(55.5, mean = 50, sd = 5) - pnorm(44.5, mean = 50, sd = 5)
[1] 0.7286679

or use another very accurate, built-in approximation:

> pbinom(55, size = 100, prob = 0.5) - pbinom(44, size = 100, prob = 0.5)
[1] 0.728747
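For readers without R at hand, the same two numbers can be reproduced with Python’s standard library, using math.erf for the normal CDF and math.comb for the exact Binomial sum:

```python
import math

def Phi(x):
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Normal approximation (with continuity correction) for X ~ Bin(100, 1/2):
normal = Phi((55.5 - 50) / 5) - Phi((44.5 - 50) / 5)  # about 0.7286679

# Exact value of P{45 <= X <= 55}:
exact = sum(math.comb(100, k) for k in range(45, 56)) / 2**100  # about 0.728747
```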
Continuous distributions
At this point, the integral in the definition of $\Phi(x)$ is merely a reflection of the Calculus trick of approximating a sum by an integral. Probabilists have taken a leap into abstraction by regarding $\Phi$, or its derivative
$$\phi(y) := \frac{1}{\sqrt{2\pi}} e^{-y^2/2},$$
as a way to define a probability distribution.
Definition. A random variable $Y$ is said to have a continuous distribution (on $\mathbb{R}$) with density function $f$ if
$$P\{a \le Y \le b\} = \int_a^b f(y)\,dy \qquad \text{for all intervals } [a, b].$$
Notice that $f$ should be a nonnegative function, for otherwise it might get awkward when calculating probabilities of the form $P\{a \le Y \le b\}$. Notice also that
$$\int_{-\infty}^{\infty} f(y)\,dy = P\{-\infty < Y < \infty\} = 1.$$
That is, the integral of a density function over the whole real line equals one.
I prefer to think of densities as being defined on the whole real line, with values outside the range of the random variable being handled by setting the density function equal to zero in appropriate places. If a range of integration is not indicated explicitly, it can then always be understood as $\int_{-\infty}^{\infty}$, with the zero density killing off unwanted contributions.
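As a concrete instance of the definition, the normal density $\phi$ from above can be checked numerically with a crude Riemann sum (the truncation at $\pm 8$ below is an assumption that the tails are negligible):

```python
import math

def phi(y):
    # Standard normal density.
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def riemann(a, b, steps=20000):
    # Left Riemann sum of phi over [a, b].
    dy = (b - a) / steps
    return sum(phi(a + i * dy) * dy for i in range(steps))

# The density integrates to (nearly) one over the whole line ...
total = riemann(-8.0, 8.0)
# ... and the definition gives P{-1 <= Y <= 1} near 0.6827.
middle = riemann(-1.0, 1.0)
```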
Distributions defined by densities have both similarities to and differences from the sort of distributions I have been considering up to this point in Stat 241/541. All the distributions before now were discrete. They were described by a (countable) discrete set of possible values $\{x_i : i = 1, 2, \dots\}$ that could be taken by a random variable $X$, and the probabilities $P\{X = x_i\}$ with which $X$ took those values. Expectations, variances, and things like $Eg(X)$ for various functions $g$, could all be calculated by conditioning on the possible values for $X$.
For a random variable $X$ with a continuous distribution defined by a density $f$, we have
$$P\{X = x\} = \int_x^x f(y)\,dy = 0 \qquad \text{for every } x.$$
We cannot hope to calculate a probability by adding up (an uncountable set of) zeros. Instead, as you will see in Chapter 7, we must pass to a limit and replace sums by integrals when a random variable $X$ has a continuous distribution.
The $\sqrt{2\pi}$ appeared in de Moivre’s approximation by way of Stirling’s formula. It is slightly mysterious why it appears in that formula. The reason for both appearances is the fact that the constant
$$\int_{-\infty}^{\infty} e^{-y^2/2}\, dy$$
is exactly equal to $\sqrt{2\pi}$, as I now explain.
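Before the explanation, the claim itself is easy to verify numerically, again with a crude Riemann sum (the truncation at $\pm 10$ is an assumption that the tails contribute nothing visible at this precision):

```python
import math

# Numerical check that the integral of exp(-y^2/2) over the real line
# equals sqrt(2*pi), the constant behind Stirling's formula.
steps = 20000
a, b = -10.0, 10.0
dy = (b - a) / steps
integral = sum(math.exp(-((a + i * dy) ** 2) / 2) * dy for i in range(steps))

target = math.sqrt(2 * math.pi)  # about 2.5066
```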