MTH6P2A: The Sample Distribution of the Median

Order Statistics

Suppose that the random variables $X_1, X_2, \ldots, X_n$ constitute a sample of size $n$ from an infinite population with continuous density. Often it will be useful to reorder these random variables from smallest to largest. In reordering the variables, we will also rename them so that $Y_1$ is a random variable whose value is the smallest of the $X_i$, $Y_2$ is the next smallest, and so on, with $Y_n$ the largest of the $X_i$. $Y_r$ is called the $r$th order statistic of the sample.

In considering order statistics, it is naturally convenient to know their probability density. We derive an expression for the distribution of the rth order statistic as in [MM].

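Theorem 1.1. For a random sample of size $n$ from an infinite population having values $x$ and density $f(x)$, the probability density of the $r$th order statistic $Y_r$ is given by

\[
g_r(y_r) \;=\; \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.1}
\]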

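Proof. Let $h$ be a positive real number. We divide the real line into three intervals: $(-\infty, y_r)$, $[y_r, y_r + h]$, and $(y_r + h, \infty)$. We will first find the probability that $Y_r$ falls in the middle of these three intervals and no other value from the sample falls in this interval. For this to be the case, we must have $r - 1$ values falling in the first interval, one value falling in the second, and $n - r$ falling in the last interval. Using the multinomial distribution, which is explained in Appendix A, the probability of this event is

\[
\operatorname{Prob}\bigl(Y_r \in [y_r, y_r+h] \text{ and } Y_i \notin [y_r, y_r+h] \text{ for } i \neq r\bigr) \;=\; \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.2}
\]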

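We also need to consider the case of two or more of the $Y_i$ lying in $[y_r, y_r + h]$. As this interval has length $h$, this probability is $O(h^2)$ (see Appendix B for a review of big-Oh notation such as $O(h^2)$). Thus we may remove the constraint that exactly one $Y_i \in [y_r, y_r + h]$ in (1.2) at a cost of at most $O(h^2)$, which yields

\[
\operatorname{Prob}\bigl(Y_r \in [y_r, y_r+h]\bigr) \;=\; \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right] \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + O(h^2). \tag{1.3}
\]

By the Mean Value Theorem for integrals, there is a point $c_{h,y_r}$ with $y_r \le c_{h,y_r} \le y_r + h$ such that

\[
\int_{y_r}^{y_r+h} f(x)\,dx \;=\; h\, f(c_{h,y_r}).
\]

We denote the point provided by the mean value theorem by $c_{h,y_r}$ in order to emphasize its dependence on $h$ and $y_r$.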

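We can substitute this result into the expression of (1.3). We divide the result by $h$ (the length of the middle interval $[y_r, y_r + h]$), and consider the limit as $h \to 0$:

\[
\begin{aligned}
\lim_{h \to 0} \frac{\operatorname{Prob}\bigl(Y_r \in [y_r, y_r+h]\bigr)}{h}
&= \lim_{h \to 0} \left( \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(c_{h,y_r}) \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + O(h) \right) \\
&= \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r},
\end{aligned}
\]

where the last step uses the continuity of $f$, which forces $f(c_{h,y_r}) \to f(y_r)$ as $h \to 0$.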

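Thus the proof is reduced to showing that the left-hand side above is $g_r(y_r)$. Let $g_r(y_r)$ be the probability density of $Y_r$, and let $G_r(y_r)$ be the cumulative distribution function of $Y_r$. Thus

\[
\lim_{h \to 0} \frac{\operatorname{Prob}\bigl(Y_r \in [y_r, y_r+h]\bigr)}{h} \;=\; \lim_{h \to 0} \frac{G_r(y_r + h) - G_r(y_r)}{h} \;=\; g_r(y_r),
\]

where the last equality follows from the definition of the derivative, since $G_r' = g_r$. This completes the proof.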
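
The formula of Theorem 1.1 can be sanity-checked numerically. The sketch below is only an illustration under an assumed population, Uniform(0, 1), so that $f = 1$ on $[0, 1]$ and the two integrals reduce to $y$ and $1 - y$. The function names, the choice $n = 7$, $r = 3$, and the trial count are illustrative and not part of the notes. It checks that the density integrates to 1 and that its mean agrees with the simulated mean of the $r$th smallest value.

```python
import math
import random

def order_stat_density_uniform(y, r, n):
    """Theorem 1.1 density of Y_r, specialised to a Uniform(0, 1) population
    (assumed for this sketch: f = 1 on [0, 1], so the integrals are y and 1 - y)."""
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return const * y ** (r - 1) * (1.0 - y) ** (n - r)

def midpoint_integral(func, a, b, steps=20000):
    """Plain midpoint rule; accurate enough for a sanity check."""
    width = (b - a) / steps
    return sum(func(a + (k + 0.5) * width) for k in range(steps)) * width

def simulated_order_stat_mean(r, n, trials=20000, seed=0):
    """Average of the r-th smallest value over many simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        total += sample[r - 1]  # Y_r is the r-th smallest of the X_i
    return total / trials

n, r = 7, 3
total_mass = midpoint_integral(lambda y: order_stat_density_uniform(y, r, n), 0.0, 1.0)
formula_mean = midpoint_integral(lambda y: y * order_stat_density_uniform(y, r, n), 0.0, 1.0)
empirical_mean = simulated_order_stat_mean(r, n)
print(total_mass, formula_mean, empirical_mean)
```

For the uniform population the density above is a Beta density with mean $r/(n+1)$, so the two means should agree up to Monte Carlo noise.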

Remark 1.2. The technique employed in this proof is a common method for calculating probability densities. We first calculate the probability that a random variable $Y$ lies in an infinitesimal interval $[y, y + h]$. This probability is $G(y + h) - G(y)$, where $g$ is the density of $Y$ and $G$ is the cumulative distribution function (so $G' = g$). The definition of the derivative yields

\[
\lim_{h \to 0} \frac{\operatorname{Prob}\bigl(Y \in [y, y+h]\bigr)}{h} \;=\; \lim_{h \to 0} \frac{G(y + h) - G(y)}{h} \;=\; g(y).
\]
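
A small numerical illustration of this technique, assuming a rate-1 exponential random variable (an example chosen here, not taken from the notes), with $G(y) = 1 - e^{-y}$ and $g(y) = e^{-y}$. The sketch divides the probability of landing in $[y, y + h]$ by $h$ and watches the result approach the density as $h$ shrinks.

```python
import math

def cdf(y):
    """G(y) for the rate-1 exponential distribution (assumed example)."""
    return 1.0 - math.exp(-y)

def density(y):
    """g(y) = G'(y) for the same distribution."""
    return math.exp(-y)

def interval_probability_rate(y, h):
    """Prob(Y in [y, y + h]) divided by the interval length h."""
    return (cdf(y + h) - cdf(y)) / h

y = 0.7
step_sizes = (1e-1, 1e-2, 1e-3, 1e-4)
errors = [abs(interval_probability_rate(y, h) - density(y)) for h in step_sizes]
for h, err in zip(step_sizes, errors):
    print(h, err)
```

The error shrinks roughly in proportion to $h$, which is the behaviour the definition of the derivative predicts.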

The Sample Distribution of the Median

In addition to the smallest ($Y_1$) and largest ($Y_n$) order statistics, we are often interested in the sample median, $\tilde{X}$. For a sample of odd size, $n = 2m + 1$, the sample median is defined as $Y_{m+1}$. If $n = 2m$ is even, the sample median is defined as $\tfrac{1}{2}(Y_m + Y_{m+1})$. We will prove a relation between the sample median and the population median $\tilde{\mu}$. By definition, $\tilde{\mu}$ satisfies

\[
\int_{-\infty}^{\tilde{\mu}} f(x)\,dx \;=\; \frac{1}{2}. \tag{2.11}
\]

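Note on the Mean Value Theorem used in the proof of Theorem 1.1: if $F$ is an anti-derivative of $f$, then the Mean Value Theorem applied to $F$,

\[
\frac{F(b) - F(a)}{b - a} \;=\; F'(c) \quad \text{for some } c \in [a, b],
\]

is equivalent to the Mean Value Theorem for integrals, $\int_a^b f(x)\,dx = (b - a)\, f(c)$.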

It is convenient to rewrite the definition (2.11) of the population median in terms of the cumulative distribution function. If $F$ is the cumulative distribution function of $f$, then $F' = f$ and (2.11) becomes

\[
F(\tilde{\mu}) \;=\; \frac{1}{2}.
\]

We are now ready to consider the distribution of the sample median.

Median Theorem. Let a sample of size $n = 2m + 1$ with $n$ large be taken from an infinite population with a density function $f(\tilde{x})$ that is nonzero at the population median $\tilde{\mu}$ and continuously differentiable in a neighborhood of $\tilde{\mu}$. The sampling distribution of the median is approximately normal with mean $\tilde{\mu}$ and variance

\[
\frac{1}{8 f(\tilde{\mu})^2\, m}.
\]

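Proof. Let the median random variable $\tilde{X}$ have values $\tilde{x}$ and density $g(\tilde{x})$. The median is simply the $(m + 1)$th order statistic, so its distribution is given by the result of the previous section. By Theorem 1.1,

\[
g(\tilde{x}) \;=\; \frac{(2m+1)!}{m!\,m!} \left[\int_{-\infty}^{\tilde{x}} f(x)\,dx\right]^{m} f(\tilde{x}) \left[\int_{\tilde{x}}^{\infty} f(x)\,dx\right]^{m}.
\]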

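We will first find an approximation for the constant factor in this equation. For this, we will use Stirling's approximation, which tells us that $n! = n^n e^{-n} \sqrt{2\pi n}\,\bigl(1 + O(n^{-1})\bigr)$; we sketch a proof in Appendix D. We will consider values sufficiently large so that the terms of order $1/n$ need not be considered. Hence

\[
\frac{(2m+1)!}{m!\,m!} \;=\; \frac{(2m+1)\,(2m)!}{(m!)^2} \;\approx\; \frac{(2m+1)\,(2m)^{2m} e^{-2m} \sqrt{2\pi (2m)}}{\bigl(m^m e^{-m} \sqrt{2\pi m}\bigr)^2} \;=\; \frac{(2m+1)\,4^m}{\sqrt{\pi m}}.
\]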
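
The quality of this approximation is easy to check numerically. The sketch below compares the exact constant $(2m+1)!/(m!)^2$ with the Stirling form $(2m+1)\,4^m/\sqrt{\pi m}$. It works with logarithms (via `math.lgamma`, where `lgamma(k + 1)` equals $\log k!$) so that large $m$ does not overflow. The function names and the chosen values of $m$ are illustrative only.

```python
import math

def log_exact_constant(m):
    """log of (2m+1)! / (m!)^2, using lgamma(k + 1) = log(k!)."""
    return math.lgamma(2 * m + 2) - 2.0 * math.lgamma(m + 1)

def log_stirling_constant(m):
    """log of the Stirling form (2m+1) * 4^m / sqrt(pi * m)."""
    return math.log(2 * m + 1) + m * math.log(4.0) - 0.5 * math.log(math.pi * m)

def constant_ratio(m):
    """Exact constant divided by its Stirling approximation."""
    return math.exp(log_exact_constant(m) - log_stirling_constant(m))

ratios = {m: constant_ratio(m) for m in (10, 100, 1000, 10000)}
for m, ratio in ratios.items():
    print(m, ratio)
```

The ratio approaches 1 as $m$ grows, with a relative error of order $1/m$, consistent with dropping the $O(n^{-1})$ terms in Stirling's formula.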

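As $F$ is the cumulative distribution function, $F(\tilde{x}) = \int_{-\infty}^{\tilde{x}} f(x)\,dx$, which implies

\[
g(\tilde{x}) \;\approx\; \frac{(2m+1)\,4^m}{\sqrt{\pi m}} \bigl[F(\tilde{x})\bigr]^{m} f(\tilde{x}) \bigl[1 - F(\tilde{x})\bigr]^{m}. \tag{2.15}
\]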

We will need the Taylor series expansion of $F(\tilde{x})$ about $\tilde{\mu}$, which is just

\[
F(\tilde{x}) \;=\; F(\tilde{\mu}) + F'(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O\bigl((\tilde{x} - \tilde{\mu})^2\bigr).
\]

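Because $\tilde{\mu}$ is the population median, $F(\tilde{\mu}) = 1/2$. Further, since $F$ is the cumulative distribution function, $F' = f$ and we find

\[
F(\tilde{x}) \;=\; \frac{1}{2} + f(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O\bigl((\tilde{x} - \tilde{\mu})^2\bigr).
\]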

This approximation is only useful if $\tilde{x} - \tilde{\mu}$ is small; in other words, we need $\lim_{m \to \infty} |\tilde{x} - \tilde{\mu}| = 0$. Fortunately this is easy to show, and a proof is included in Appendix C.

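Letting $t = \tilde{x} - \tilde{\mu}$ (which is small and tends to $0$ as $m \to \infty$), substituting our Taylor series expansion into (2.15) yields

\[
g(\tilde{x}) \;\approx\; \frac{(2m+1)\,4^m}{\sqrt{\pi m}} \left[\frac{1}{2} + f(\tilde{\mu})\,t + O(t^2)\right]^{m} f(\tilde{x}) \left[1 - \left(\frac{1}{2} + f(\tilde{\mu})\,t + O(t^2)\right)\right]^{m}.
\]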

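By rearranging and combining factors, we find that

\[
g(\tilde{x}) \;\approx\; \frac{(2m+1)\,4^m}{\sqrt{\pi m}}\, f(\tilde{x}) \left[\frac{1}{4} - \bigl(f(\tilde{\mu})\,t\bigr)^2 + O(t^3)\right]^{m} \;=\; \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \left[1 - \frac{4m\bigl(f(\tilde{\mu})\,t\bigr)^2}{m} + O(t^3)\right]^{m}. \tag{2.19}
\]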

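Remember that one definition of $e^x$ is

\[
e^x \;=\; \exp(x) \;=\; \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{n};
\]

see Appendix E for a review of properties of the exponential function. Using this, and ignoring higher powers of $t$ for the moment, we have for large $m$ that

\[
g(\tilde{x}) \;\approx\; \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \exp\bigl(-4m f(\tilde{\mu})^2 t^2\bigr) \;=\; \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/\bigl(4m f(\tilde{\mu})^2\bigr)}\right). \tag{2.21}
\]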

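Actually, the argument as given is completely wrong! The problem is that each term has an error of size $O(t^2)$. Thus when we multiply them together there is also an error of size $O(t^2)$, and this is the same order of magnitude as the secondary term, $(f(\tilde{\mu})\,t)^2$. The remedy is to be more careful in expanding $F(\tilde{x})$ and $1 - F(\tilde{x})$.

A careful analysis shows that their $t^2$ terms are equal in magnitude but opposite in sign. Thus they cancel in the product, leaving an error of size $O(t^3)$. In summary, we really need to use

\[
F(\tilde{x}) \;=\; \frac{1}{2} + f(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + \frac{f'(\tilde{\mu})}{2}(\tilde{x} - \tilde{\mu})^2 + O\bigl((\tilde{x} - \tilde{\mu})^3\bigr)
\]

(and similarly for $1 - F(\tilde{x})$).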
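
The cancellation of the $t^2$ terms can be observed numerically. The sketch below assumes a rate-1 exponential population (an example chosen here, not taken from the notes), for which $\tilde{\mu} = \log 2$, $f(\tilde{\mu}) = 1/2$ and $f'(\tilde{\mu}) = -1/2 \neq 0$, so the individual $t^2$ terms do not vanish on their own. It measures how far $F(\tilde{x})\,(1 - F(\tilde{x}))$ is from $1/4 - (f(\tilde{\mu})\,t)^2$ and rescales the gap by $t^3$.

```python
import math

MEDIAN = math.log(2.0)       # population median of the rate-1 exponential
DENSITY_AT_MEDIAN = 0.5      # f(median) = exp(-log 2) = 1/2

def cdf(x):
    """F(x) for the rate-1 exponential distribution (assumed example)."""
    return 1.0 - math.exp(-x)

def product_residual(t):
    """F(x)(1 - F(x)) minus the claimed leading part 1/4 - (f(median) t)^2."""
    x = MEDIAN + t
    return cdf(x) * (1.0 - cdf(x)) - (0.25 - (DENSITY_AT_MEDIAN * t) ** 2)

offsets = (0.1, 0.01, 0.001)
scaled_residuals = [product_residual(t) / t ** 3 for t in offsets]
for t, value in zip(offsets, scaled_residuals):
    print(t, value)
```

If an uncancelled $t^2$ error remained, the rescaled values would grow like $1/t$ as $t$ shrinks. Instead they stay bounded (for this population they settle near $1/4$), which is what an $O(t^3)$ remainder looks like.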

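Since, as shown in Appendix C, $\tilde{x}$ can be assumed arbitrarily close to $\tilde{\mu}$ with high probability, we can assume $f(\tilde{x}) \approx f(\tilde{\mu})$ for the purpose of this approximation. Thus

\[
g(\tilde{x}) \;\approx\; \frac{(2m+1)\, f(\tilde{\mu})}{\sqrt{\pi m}} \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/\bigl(4m f(\tilde{\mu})^2\bigr)}\right).
\]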

Looking at the exponential part of the expression for $g(\tilde{x})$, we see that it appears to be a normal density with mean $\tilde{\mu}$ and $\sigma^2 = 1/(8m f(\tilde{\mu})^2)$. If we were instead to compute the variance from the normalization constant, we would find the variance to be

\[
\frac{m}{2\,(2m+1)^2 f(\tilde{\mu})^2}.
\]

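We see that the two values are asymptotically equivalent, thus we can take the variance to be $\sigma^2 = 1/(8m f(\tilde{\mu})^2)$. Thus to complete the proof of the theorem, all that we need to do is prove that we may ignore the higher powers of $t$ and replace the product with an exponential in passing from (2.19) to (2.21). We have

\[
\left(1 - \frac{4m\bigl(f(\tilde{\mu})\,t\bigr)^2}{m} + O(t^3)\right)^{m} \;=\; \exp\Bigl(m \log\bigl(1 - 4\bigl(f(\tilde{\mu})\,t\bigr)^2 + O(t^3)\bigr)\Bigr). \tag{2.24}
\]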

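We use the Taylor series expansion of $\log(1 - x)$:

\[
\log(1 - x) \;=\; -x + O(x^2);
\]

we only need one term in the expansion as $t$ is small. Thus (2.24) becomes

\[
\left(1 - \frac{4m\bigl(f(\tilde{\mu})\,t\bigr)^2}{m} + O(t^3)\right)^{m} \;=\; \exp\Bigl(-4m\bigl(f(\tilde{\mu})\,t\bigr)^2 + O(m t^3)\Bigr) \;=\; \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/\bigl(4m f(\tilde{\mu})^2\bigr)}\right) \cdot \exp\bigl(O(m t^3)\bigr).
\]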

Using the methods of Appendix C one can show that as $m \to \infty$, $m t^3 \to 0$. Thus the $\exp(O(m t^3))$ term above tends to 1, which completes the proof.
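
The conclusion of the Median Theorem can be illustrated by simulation. The sketch below assumes a standard normal population (chosen here for convenience, not taken from the notes), so that $\tilde{\mu} = 0$ and $f(\tilde{\mu}) = 1/\sqrt{2\pi}$. It draws many samples of size $n = 2m + 1$, records each sample median, and compares the empirical variance of the medians with the predicted value $1/(8 f(\tilde{\mu})^2 m)$. The values of $m$, the trial count, and the seed are illustrative.

```python
import math
import random
import statistics

def simulate_sample_medians(m, trials, seed=0):
    """Draw `trials` samples of size n = 2m + 1 from N(0, 1); return their medians."""
    rng = random.Random(seed)
    n = 2 * m + 1
    return [statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)]

m = 50
medians = simulate_sample_medians(m, trials=4000)

density_at_median = 1.0 / math.sqrt(2.0 * math.pi)  # f(0) for N(0, 1)
predicted_variance = 1.0 / (8.0 * density_at_median ** 2 * m)

empirical_mean = statistics.mean(medians)
empirical_variance = statistics.pvariance(medians)
print(empirical_mean, empirical_variance, predicted_variance)
```

The theorem is asymptotic, so for moderate $m$ the empirical variance should be close to, but not exactly equal to, the predicted value. Increasing $m$ and the number of trials tightens the agreement.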

Our justification of ignoring the higher powers of $t$ and replacing the product with an exponential in passing from (2.19) to (2.21) is a standard technique. Namely, we replace some quantity $(1 - P)^m$ with $(1 - P)^m = \exp(m \log(1 - P))$, Taylor expand the logarithm, and then look at the limit as $m \to \infty$.
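
A brief numerical sketch of this technique follows. It takes $P = c/m$ for a fixed constant $c$ (which plays the role of $4m f(\tilde{\mu})^2 t^2$ when $t$ is of order $m^{-1/2}$; the value $c = 1.5$ is purely illustrative) and compares $(1 - P)^m$ with its limit $e^{-c}$. It also confirms that rewriting the power as $\exp(m \log(1 - P))$ gives the same number.

```python
import math

def product_form(m, c):
    """(1 - P)^m with P = c / m."""
    return (1.0 - c / m) ** m

def via_logarithm(m, c):
    """The same quantity written as exp(m * log(1 - P)); log1p(-P) = log(1 - P)."""
    return math.exp(m * math.log1p(-c / m))

def exponential_limit(c):
    """Limit of (1 - c/m)^m as m grows."""
    return math.exp(-c)

c = 1.5  # illustrative stand-in for 4 m f(median)^2 t^2
sizes = (10, 100, 1000, 10000)
gaps = [abs(product_form(m, c) - exponential_limit(c)) for m in sizes]
for m, gap in zip(sizes, gaps):
    print(m, gap)
```

The gap decays roughly like $1/m$, matching the first neglected term $-mP^2/2 = -c^2/(2m)$ in the Taylor expansion of $m \log(1 - P)$.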

ASSIGNMENT : The Sample Distribution of the Median Assignment MARKS : 10  DURATION : 24 hours

