Suppose that the random variables $X_1, X_2, \ldots, X_n$ constitute a sample of size $n$ from an infinite population with continuous density. Often it will be useful to reorder these random variables from smallest to largest. In reordering the variables, we will also rename them so that $Y_1$ is a random variable whose value is the smallest of the $X_i$, $Y_2$ is the next smallest, and so on, with $Y_n$ the largest of the $X_i$. $Y_r$ is called the $r$th order statistic of the sample.
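As a small illustration of the renaming described above, the following sketch sorts a sample and reads off the $r$th order statistic. The function names and the sample values are hypothetical and chosen only for this example.

```python
def order_statistics(sample):
    """Return [Y_1, ..., Y_n]: the sample reordered from smallest to largest."""
    return sorted(sample)


def rth_order_statistic(sample, r):
    """Return Y_r, the r-th smallest value (r is 1-indexed, as in the text)."""
    if not 1 <= r <= len(sample):
        raise ValueError("r must satisfy 1 <= r <= n")
    return sorted(sample)[r - 1]


# Hypothetical sample of size n = 5.
xs = [2.7, 0.4, 1.9, 3.1, 0.8]
ys = order_statistics(xs)
smallest = rth_order_statistic(xs, 1)        # Y_1
largest = rth_order_statistic(xs, len(xs))   # Y_n
```

Here `ys` plays the role of $(Y_1, \ldots, Y_n)$, while `smallest` and `largest` are $Y_1$ and $Y_n$.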
In considering order statistics, it is naturally convenient to know their probability density. We derive an expression for the distribution of the rth order statistic as in [MM].
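Theorem 1.1. For a random sample of size $n$ from an infinite population having values $x$ and density $f(x)$, the probability density of the $r$th order statistic $Y_r$ is given by
\[ g_r(y_r) = \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}. \]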
Proof. Let $h$ be a positive real number. We divide the real line into three intervals: $(-\infty, y_r)$, $[y_r, y_r + h]$, and $(y_r + h, \infty)$. We will first find the probability that $Y_r$ falls in the middle of these three intervals, and no other value from the sample falls in this interval. In order for this to be the case, we must have $r - 1$ values falling in the first interval, one value falling in the second, and $n - r$ falling in the last interval. Using the multinomial distribution, which is explained in Appendix A, the probability of this event is
\[ \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.2} \]
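We must also consider the case of two or more of the $Y_i$ lying in $[y_r, y_r + h]$. As this interval has length $h$, this probability is $O(h^2)$ (see Appendix B for a review of big-Oh notation such as $O(h^2)$). Thus we may remove the constraint that exactly one $Y_i \in [y_r, y_r + h]$ in (1.2) at a cost of at most $O(h^2)$, which yields
\[ \mathrm{Prob}(Y_r \in [y_r, y_r+h]) = \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + O(h^2). \tag{1.3} \]
By the Mean Value Theorem, there is a point $c_{h,y_r}$ with $y_r \le c_{h,y_r} \le y_r + h$ such that
\[ \int_{y_r}^{y_r+h} f(x)\,dx = h\, f(c_{h,y_r}). \]
We denote the point provided by the Mean Value Theorem by $c_{h,y_r}$ in order to emphasize its dependence on $h$ and $y_r$.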
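We can substitute this result into the expression of (1.3). We divide the result by $h$ (the length of the middle interval $[y_r, y_r + h]$), and consider the limit as $h \to 0$:
\[ \lim_{h \to 0} \frac{\mathrm{Prob}(Y_r \in [y_r, y_r+h])}{h} = \lim_{h \to 0} \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(c_{h,y_r}) \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + \lim_{h \to 0} \frac{O(h^2)}{h} \]
\[ = \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}. \]
Here we used that $c_{h,y_r} \to y_r$ as $h \to 0$ and that $f$ is continuous.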
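Thus the proof is reduced to showing that the left hand side above is $g_r(y_r)$. Let $g_r(y_r)$ be the probability density of $Y_r$, and let $G_r(y_r)$ be the cumulative distribution function of $Y_r$. Thus
\[ \mathrm{Prob}(Y_r \in [y_r, y_r+h]) = G_r(y_r + h) - G_r(y_r), \]
and therefore
\[ \lim_{h \to 0} \frac{\mathrm{Prob}(Y_r \in [y_r, y_r+h])}{h} = \lim_{h \to 0} \frac{G_r(y_r+h) - G_r(y_r)}{h} = G_r'(y_r) = g_r(y_r), \]
where the second equality follows from the definition of the derivative and the last from $G_r' = g_r$. This completes the proof.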
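The formula of Theorem 1.1 can be sanity-checked numerically. The sketch below assumes, purely as a test case not taken from the text, that the population is uniform on $[0, 1]$, so that $f = 1$ there and the density reduces to a constant times $y^{r-1}(1-y)^{n-r}$. It compares the integral of this density over $[0, y_0]$ with a Monte Carlo estimate of $\mathrm{Prob}(Y_r \le y_0)$. The helper names, the midpoint rule, and the parameter values are illustrative choices.

```python
import math
import random


def order_stat_density_uniform(y, r, n):
    """Theorem 1.1 with f = 1 on [0, 1] (assumed uniform population)."""
    coeff = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coeff * y ** (r - 1) * (1.0 - y) ** (n - r)


def integrate(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))


def empirical_cdf_of_order_stat(y0, r, n, trials=20_000, seed=0):
    """Monte Carlo estimate of Prob(Y_r <= y0) for uniform samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        if sample[r - 1] <= y0:
            hits += 1
    return hits / trials


n, r, y0 = 7, 3, 0.4
total_mass = integrate(lambda y: order_stat_density_uniform(y, r, n), 0.0, 1.0)
predicted = integrate(lambda y: order_stat_density_uniform(y, r, n), 0.0, y0)
observed = empirical_cdf_of_order_stat(y0, r, n)
```

If the formula is right, `total_mass` should be close to 1 and `predicted` should agree with `observed` up to sampling error.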
Remark 1.2. The technique employed in this proof is a common method for calculating probability densities. We first calculate the probability that a random variable $Y$ lies in an infinitesimal interval $[y, y + h]$. This probability is $G(y + h) - G(y)$, where $g$ is the density of $Y$ and $G$ is the cumulative distribution function (so $G' = g$). The definition of the derivative yields
\[ \lim_{h \to 0} \frac{\mathrm{Prob}(Y \in [y, y+h])}{h} = \lim_{h \to 0} \frac{G(y+h) - G(y)}{h} = g(y). \]
The Sample Distribution of the Median
In addition to the smallest ($Y_1$) and largest ($Y_n$) order statistics, we are often interested in the sample median, $\tilde{X}$. For a sample of odd size, $n = 2m + 1$, the sample median is defined as $Y_{m+1}$. If $n = 2m$ is even, the sample median is defined as $\frac{1}{2}(Y_m + Y_{m+1})$. We will prove a relation between the sample median and the population median $\tilde{\mu}$. By definition, $\tilde{\mu}$ satisfies
\[ \int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2}. \tag{2.11} \]
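Note on the Mean Value Theorem step in the proof of Theorem 1.1: if $F$ is an anti-derivative of $f$, then the Mean Value Theorem applied to $F$,
\[ \frac{F(b) - F(a)}{b - a} = F'(c) \quad \text{for some } c \in [a, b], \]
is equivalent to
\[ \int_a^b f(x)\,dx = (b - a)\, f(c). \]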
It is convenient to rewrite the definition (2.11) of the population median in terms of the cumulative distribution function. If $F$ is the cumulative distribution function of $f$, then $F' = f$ and (2.11) becomes
\[ F(\tilde{\mu}) = \frac{1}{2}. \]
We are now ready to consider the distribution of the sample median.
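Before turning to the theorem, the two notions of median used in this section can be sketched in code. The sample median below takes the middle order statistic for odd $n$ and, as an assumed convention for even $n$, the average of the two middle order statistics. The population median is found by bisection on $F(\tilde{\mu}) = 1/2$. The exponential distribution with rate 1 is an assumed test case (its median is $\log 2$), and all function names are hypothetical.

```python
import math


def sample_median(sample):
    """Sample median: Y_{m+1} if n = 2m + 1, else (Y_m + Y_{m+1}) / 2."""
    ys = sorted(sample)
    n = len(ys)
    if n == 0:
        raise ValueError("sample must be non-empty")
    m = n // 2
    if n % 2 == 1:
        return ys[m]                     # 0-based index m is Y_{m+1}
    return 0.5 * (ys[m - 1] + ys[m])     # average of Y_m and Y_{m+1}


def population_median(cdf, lo, hi, tol=1e-12):
    """Solve F(mu) = 1/2 by bisection.

    Assumes F is continuous and nondecreasing with F(lo) <= 1/2 <= F(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponential_cdf(x):
    """CDF of the rate-1 exponential distribution (assumed test case)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


mu_tilde = population_median(exponential_cdf, 0.0, 50.0)
odd_median = sample_median([5.0, 1.0, 3.0])
even_median = sample_median([4.0, 1.0, 3.0, 2.0])
```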
Median Theorem. Let a sample of size $n = 2m + 1$ with $n$ large be taken from an infinite population with a density function $f(\tilde{x})$ that is nonzero at the population median $\tilde{\mu}$ and continuously differentiable in a neighborhood of $\tilde{\mu}$. The sampling distribution of the median is approximately normal with mean $\tilde{\mu}$ and variance
\[ \frac{1}{8 f(\tilde{\mu})^2\, m}. \]
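Proof. Let the median random variable $\tilde{X}$ have values $\tilde{x}$ and density $g(\tilde{x})$. The median is simply the $(m + 1)$th order statistic, so its distribution is given by the result of the previous section. By Theorem 1.1 with $n = 2m + 1$ and $r = m + 1$,
\[ g(\tilde{x}) = \frac{(2m+1)!}{m!\,m!} \left[\int_{-\infty}^{\tilde{x}} f(x)\,dx\right]^{m} f(\tilde{x}) \left[\int_{\tilde{x}}^{\infty} f(x)\,dx\right]^{m}. \]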
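We will first find an approximation for the constant factor in this equation. For this, we will use Stirling's approximation, which tells us that
\[ n! = n^n e^{-n} \sqrt{2\pi n}\,\left(1 + O(n^{-1})\right); \]
we sketch a proof in Appendix D. We will consider values of $n$ sufficiently large so that the terms of order $1/n$ need not be considered. Hence
\[ \frac{(2m+1)!}{m!\,m!} = \frac{(2m+1)\,(2m)!}{(m!)^2} \approx \frac{(2m+1)\,(2m)^{2m} e^{-2m} \sqrt{2\pi (2m)}}{\left(m^m e^{-m} \sqrt{2\pi m}\right)^2} = \frac{(2m+1)\,4^m}{\sqrt{\pi m}}. \]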
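As $F$ is the cumulative distribution function, $F(\tilde{x}) = \int_{-\infty}^{\tilde{x}} f(x)\,dx$, which implies
\[ g(\tilde{x}) \approx \frac{(2m+1)\,4^m}{\sqrt{\pi m}}\, [F(\tilde{x})]^m\, f(\tilde{x})\, [1 - F(\tilde{x})]^m. \tag{2.15} \]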
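We will need the Taylor series expansion of $F(\tilde{x})$ about $\tilde{\mu}$, which is just
\[ F(\tilde{x}) = F(\tilde{\mu}) + F'(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O((\tilde{x} - \tilde{\mu})^2). \]
Because $\tilde{\mu}$ is the population median, $F(\tilde{\mu}) = 1/2$. Further, since $F$ is the cumulative distribution function, $F' = f$ and we find
\[ F(\tilde{x}) = \frac{1}{2} + f(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O((\tilde{x} - \tilde{\mu})^2). \]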
This approximation is only useful if $\tilde{x} - \tilde{\mu}$ is small; in other words, we need $\lim_{m \to \infty} |\tilde{x} - \tilde{\mu}| = 0$. Fortunately this is easy to show, and a proof is included in Appendix C.
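The claim that $\tilde{x} - \tilde{\mu}$ becomes small as $m$ grows can be illustrated, though of course not proved, by simulation. The sketch below assumes a rate-1 exponential population, whose median is $\log 2$, and estimates the average of $|\tilde{x} - \tilde{\mu}|$ over repeated samples of size $2m + 1$. The distribution, the seed, the number of trials, and the function names are all illustrative assumptions.

```python
import math
import random


def sample_median_odd(sample):
    """Median of a sample of odd size n = 2m + 1, i.e. Y_{m+1}."""
    ys = sorted(sample)
    return ys[len(ys) // 2]


def mean_abs_deviation_of_median(m, trials=200, seed=1):
    """Average |sample median - population median| over repeated samples."""
    rng = random.Random(seed)
    mu_tilde = math.log(2.0)   # population median of the rate-1 exponential
    n = 2 * m + 1
    total = 0.0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        total += abs(sample_median_odd(sample) - mu_tilde)
    return total / trials


# The typical deviation should shrink as m increases.
deviations = {m: mean_abs_deviation_of_median(m) for m in (10, 100, 1000)}
```

Under these assumptions the entries of `deviations` decrease as $m$ increases, in line with the statement proved in Appendix C.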
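Letting $t = \tilde{x} - \tilde{\mu}$ (which is small and tends to $0$ as $m \to \infty$), substituting our Taylor series expansion into (2.15) yields
\[ g(\tilde{x}) \approx \frac{(2m+1)\,4^m}{\sqrt{\pi m}} \left[\frac{1}{2} + f(\tilde{\mu})\,t + O(t^2)\right]^m f(\tilde{x}) \left[1 - \left(\frac{1}{2} + f(\tilde{\mu})\,t + O(t^2)\right)\right]^m. \]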
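By rearranging and combining factors, we find that
\[ g(\tilde{x}) \approx \frac{(2m+1)\,4^m}{\sqrt{\pi m}}\, f(\tilde{x}) \left[\frac{1}{4} - (f(\tilde{\mu})\,t)^2 + O(t^3)\right]^m = \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \left[1 - \frac{4m\,(f(\tilde{\mu})\,t)^2}{m} + O(t^3)\right]^m. \tag{2.19} \]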
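Remember that one definition of $e^x$ is
\[ e^x = \exp(x) = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n; \]
see Appendix E for a review of properties of the exponential function. Using this, and ignoring higher powers of $t$ for the moment, we have for large $m$ that
\[ g(\tilde{x}) \approx \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \exp\left(-4m\, f(\tilde{\mu})^2\, t^2\right) = \frac{(2m+1)\, f(\tilde{x})}{\sqrt{\pi m}} \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/(4m f(\tilde{\mu})^2)}\right). \tag{2.21} \]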
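Actually, if we only use the expansion of $F(\tilde{x})$ with an $O(t^2)$ error, the argument is completely wrong! The problem is that each factor has an error of size $O(t^2)$. Thus when we multiply them together there is also an error of size $O(t^2)$, and this is the same order of magnitude as the secondary term, $(f(\tilde{\mu})\,t)^2$. The remedy is to be more careful in expanding $F(\tilde{x})$ and $1 - F(\tilde{x})$. A careful analysis shows that their $t^2$ terms are equal in magnitude but opposite in sign. Thus they cancel when the two factors are multiplied, leaving an error of size $O(t^3)$. In summary, we really need to use
\[ F(\tilde{x}) = \frac{1}{2} + f(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + \frac{f'(\tilde{\mu})}{2}(\tilde{x} - \tilde{\mu})^2 + O((\tilde{x} - \tilde{\mu})^3) \]
(and similarly for $1 - F(\tilde{x})$).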
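Since, as shown in Appendix C, $\tilde{x}$ can be assumed arbitrarily close to $\tilde{\mu}$ with high probability, we can assume $f(\tilde{x}) \approx f(\tilde{\mu})$, so that
\[ g(\tilde{x}) \approx \frac{(2m+1)\, f(\tilde{\mu})}{\sqrt{\pi m}} \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/(4m f(\tilde{\mu})^2)}\right). \]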
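Looking at the exponential part of the expression for $g(\tilde{x})$, we see that it appears to be a normal density with mean $\tilde{\mu}$ and $\sigma^2 = 1/(8m f(\tilde{\mu})^2)$. If we were instead to compute the variance from the normalization constant, that is, by setting $1/\sqrt{2\pi\sigma^2} = (2m+1) f(\tilde{\mu})/\sqrt{\pi m}$, we would find the variance to be
\[ \sigma^2 = \frac{m}{2\,(2m+1)^2\, f(\tilde{\mu})^2}. \]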
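We see that the two values are asymptotically equivalent, thus we can take the variance to be $\sigma^2 = 1/(8m f(\tilde{\mu})^2)$. Thus to complete the proof of the theorem, all that we need to do is prove that we may ignore the higher powers of $t$ and replace the product with an exponential in passing from (2.19) to (2.21). We have
\[ \left(1 - \frac{4m\,(f(\tilde{\mu})\,t)^2}{m} + O(t^3)\right)^m = \exp\left(m \log\left(1 - 4\,(f(\tilde{\mu})\,t)^2 + O(t^3)\right)\right). \tag{2.24} \]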
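We use the Taylor series expansion of $\log(1 - x)$:
\[ \log(1 - x) = -x + O(x^2); \]
we only need one term in the expansion as $t$ is small. Thus (2.24) becomes
\[ \left(1 - \frac{4m\,(f(\tilde{\mu})\,t)^2}{m} + O(t^3)\right)^m = \exp\left(-4m\,(f(\tilde{\mu})\,t)^2 + O(m t^3)\right) = \exp\left(-\frac{(\tilde{x} - \tilde{\mu})^2}{1/(4m f(\tilde{\mu})^2)}\right) \cdot \exp\left(O(m t^3)\right). \]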
Using the methods of Appendix C one can show that as $m \to \infty$, $m t^3 \to 0$. Thus the $\exp(O(m t^3))$ term above tends to 1, which completes the proof.
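Two ingredients of the proof can be checked numerically. The sketch below first compares the exact constant $(2m+1)!/(m!)^2$ with its Stirling approximation $(2m+1)4^m/\sqrt{\pi m}$, working with logarithms to avoid overflow. It then estimates the variance of the sample median by simulation and compares it with $1/(8m f(\tilde{\mu})^2)$. The standard normal population (for which $\tilde{\mu} = 0$ and $f(\tilde{\mu}) = 1/\sqrt{2\pi}$), the value $m = 200$, the seed, and the function names are assumptions made only for this illustration.

```python
import math
import random
import statistics


def stirling_constant_ratio(m):
    """Ratio of the exact (2m+1)!/(m!)^2 to (2m+1) 4^m / sqrt(pi m)."""
    exact_log = math.lgamma(2 * m + 2) - 2.0 * math.lgamma(m + 1)
    approx_log = (math.log(2 * m + 1) + m * math.log(4.0)
                  - 0.5 * math.log(math.pi * m))
    return math.exp(exact_log - approx_log)


def simulated_median_variance(m, trials=2000, seed=2):
    """Sample variance of the median of 2m + 1 standard normal draws."""
    rng = random.Random(seed)
    n = 2 * m + 1
    medians = []
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        medians.append(sample[m])        # 0-based index m is Y_{m+1}
    return statistics.variance(medians)


m = 200
f_at_median = 1.0 / math.sqrt(2.0 * math.pi)   # normal density at its median 0
predicted_variance = 1.0 / (8.0 * m * f_at_median ** 2)
observed_variance = simulated_median_variance(m)
constant_ratio = stirling_constant_ratio(m)
```

With these choices `constant_ratio` should be close to 1, and `observed_variance` should agree with `predicted_variance` up to sampling error and terms of lower order in $m$.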
Our justification of ignoring the higher powers of $t$ and replacing the product with an exponential in passing from (2.19) to (2.21) is a standard technique. Namely, we replace some quantity $(1 - P)^m$ with $(1 - P)^m = \exp(m \log(1 - P))$, Taylor expand the logarithm, and then look at the limit as $m \to \infty$.
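A short numerical sketch of this technique follows. It assumes the scaling $P = a/m$ with $a$ fixed, which corresponds to $t$ of order $1/\sqrt{m}$ in the proof. The value $a = 1$ and the function names are illustrative. The code checks that $(1 - P)^m$ and $\exp(m \log(1 - P))$ agree, and that both approach the leading term $e^{-a}$ as $m$ grows.

```python
import math


def power_form(a, m):
    """(1 - a/m)^m computed directly."""
    return (1.0 - a / m) ** m


def exponential_form(a, m):
    """The same quantity written as exp(m * log(1 - a/m))."""
    return math.exp(m * math.log1p(-a / m))


def leading_term(a):
    """exp(-a): keep only the first term -x of log(1 - x) = -x + O(x^2)."""
    return math.exp(-a)


a = 1.0
ms = (10, 100, 1000, 10000)

# Relative error made by replacing (1 - a/m)^m with exp(-a).
relative_errors = [abs(power_form(a, m) / leading_term(a) - 1.0) for m in ms]

# The two exact forms agree up to floating-point rounding.
form_gaps = [abs(power_form(a, m) - exponential_form(a, m)) for m in ms]
```

The discarded part of the logarithm contributes $O(a^2/m)$ to the exponent, which is why `relative_errors` shrinks roughly like $1/m$.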