MTH6P2A: The Multinomial Distribution

We can use a binomial distribution to study a situation in which we have multiple trials with two possible outcomes with the probabilities of each respective outcome the same for each trial and all of the trials independent.

A generalization of the binomial distribution is the multinomial distribution. Like the binomial distribution, the multinomial distribution considers multiple independent trials with the probabilities of respective outcomes the same for each trial.

However, the multinomial distribution gives the probability of different outcomes when we have more than two possible outcomes for each trial. This is useful, for example, in proving the distribution of order statistics, where we take the different trials to be the sample data and the outcomes to be the three intervals in the real line in which these data can fall.

Suppose that we  have  n trials and k mutually exclusive outcomes with probabilities θ1, θ2, . . . θk.  We  will let       f (x1, x2, . . . , xk) be the probability of having xi outcomes of each corresponding type, for 1 ≤ i k. Obviously, we must have x1 + x2 + · · · + xk = n. To compute f (x1, x2, . . . , xk), we first note that the probability of getting these numbers of outcomes in some particular order is  We now compute the number of orders in which our combination of numbers of outcomes is attainable. The x1 outcomes of the first type can be chosen in ways, the outcomes of the second type can be chosen in ways, and so on up to the outcomes of type k which can be chosen in ways. The total number of orderings is therefore

The product telescopes and we are left with

The expression (A.30) is called a multinomial coefficient and is often denoted

Using the multinomial coefficient, we can see that

This is the multinomial distribution. We often write  to emphasize the dependence on the parameters.

One can derive the multinomial distribution by repeated uses of the binomial theorem.  For example, if k = 3 there are three outcomes, say A, B and C. We may amalgamate B and C and consider the case of two outcomes:  A and not A. If we let θ1 equal the probability of A and 1 − θ1 the probability of not A, we find the probability of x1 Outcomes being A and n x1 outcomes being not A is just 

Let θ2 be the probability of outcome B, and θ3 the probability of outcome C. Given A does not occur, the probability that B occurs is the probability that C occurs is  Thus the probability that x1 outcomes are A, x2 are  B and are C is 

which agrees with what we found above.

B. Big-Oh Notation

Deftnition B.1 (Big-Oh Notation). A(x) = O(B(x)), read “A(x) is of order (or big-Oh) B(x)”, means there is a C > 0 and an x0 such that for all x x0, |A(x)| ≤ C B(x). This is also written A(x) ¿ B(x) or B(x) À A(x).

Big-Oh notation is a convenient way to handle lower order terms. For example, if we write F (x) = x5 + O(x2), this means that as x tends to infinity, the main term of F (x) grows like x5, and the correction (or error) terms are at most some constant times x2.

C. Proof  That  With  High  Probability  |X˜ − µ˜| is  Small

In  proving  the  Median  Theorem,  we  assume  that  we  can  ignore  higher  powers  of  t =  X˜-µ˜.   We  are  able  to  do  this because, with high probability, t is small. Here we provide a more formal statement of this fact, as well as a proof.

Lemma C.1.  Suppose  f (x) is  a  continuously  differentiable  function  in  some  neighborhood  of  µ˜.  Then  for  any  c > 0,  we have 

Proof. This is equivalent to proving that

We will prove only the first of these two statements as the proof of the second is very similar. By (2.15), we can approximate the density of the median as

We consider the factor ([F (x˜)] [1-F (x˜)])m.  It is convenient to write θ = F (x˜) and consider the function h(θ) = θ(1-θ). This function will attain its maximum for the same value of θ = F (x˜) as ([F (x˜)] [1      F (x˜)])m, and it is a simple exercise in calculus to show that this value is This condition holds only for x˜ = µ˜.  We furthermore note that for hj(θ) = 1 − 2θ > 0, so h is increasing. Since precisely when x˜ = µ˜, this means that for x˜ ≤ µ˜ − c value of g(θ) occurs for x˜ = µ˜ − c.  We therefore have for x˜ ≤ µ˜ − c,

In particular, we note that α < 1.

We now begin to look at the probability that X˜  is at most µ˜ − c.  We have

In the last step, we use the fact that for m su±ciently large (m > 1, in fact), 2m This simpli¯es the expression as a factor of 2m is easier to work with than the factor of 2m + 1. We now apply our bound on F (x˜)(1-F (x˜)) to find that

In obtaining the rightmost expression, we have used the fact that f is nonnegative everywhere and positive in a neighborhood of µ˜, so that Since µ˜ is the median of the population, by definition, we have

Since α < 1, it follows that the right side of this inequality must converge to 0 as m goes to infinity, so the probability on the right side must likewise converge to 0.

D. Stirling’s Approximation Formula for n!

Exact computations involving factorials of large numbers can be very difficult. Fortunately, there is an approximation formula which can greatly simplify the computations.

Stirling’s Formula.

Proof. For a proof, see [WW]. We show (D.46) is a reasonable approximation. It is often easier to analyze a product by converting it to a sum; this is readily accomplished by taking logarithms. We have

Thus log n! ≈ n log n n, or n! ≈ nnen.

E. Review of the exponential function

These notes in this section are taken from [MT-B].

In this section we study some of the basic properties of the number e (see [Rud] for more properties and proofs). One of the many ways to define the number e, the base of the natural logarithm, is to write it as the sum of the following infinite series:

Hence e is the limit of the convergent sequence sm. This representation is one of the main tool in analyzing the nature of e.


Welcome to FAWE

STEM Elearning

We at FAWE have built this platform to aid learners, trainers and mentors get practical help with content, an interactive platform and tools to power their teaching and learning of STEM subjects, more

How to find your voice as a woman in Africa

© FAWE, Powered by: Yaaka DN.