# MTH6P2A: The Normal Distribution

Introduction

An introduction to the Normal Distribution is given in Dispersion Part 3.

Here the concepts of variance, standard deviation and their use for grouped data are introduced. The topics of skewness and outliers are also touched upon.

The Normal Distribution

A normal (or Gaussian) distribution is a symmetrical curve with a central maximum.
The mean, mode and median all occur at the same point on the x-axis, corresponding to the central maximum.

Horizontal axis – continuous random variable x

Vertical axis – probability density function of x, f(x)

f(x) is also known as the density function, PDF or pdf.

Properties:

i) the graph of the density function is a continuous curve

ii) the area bounded by the curve and the x-axis = 1 (over the total continuous range of values, i.e. the variable’s domain)

iii) the probability of the random variable x lying in the range a < x < b is equal to the area under the probability density function curve bounded by a and b

The equation of the curve has the general form:

$$f(x) = a\,e^{-b(x-\mu)^2}$$

where,

i)   a and b are positive constants
ii)  the vertical line x = μ is the axis of symmetry of the curve
iii) f(x) > 0 for all values of x
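The properties listed above can be checked numerically. The following is a minimal sketch, assuming the curve has the Gaussian form a·exp(−b(x − μ)²) with a = 1/(σ√(2π)) and b = 1/(2σ²), which are the standard constants for a normal density. The function names and the values μ = 2, σ = 1.5 are illustrative choices, not part of the original notes.

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Curve of the form a * exp(-b * (x - mu)**2)."""
    a = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # assumed constant a
    b = 1.0 / (2.0 * sigma ** 2)                  # assumed constant b
    return a * math.exp(-b * (x - mu) ** 2)

def area(lo, hi, mu=0.0, sigma=1.0, steps=20000):
    """Trapezium-rule estimate of the area under the curve from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (gaussian(lo, mu, sigma) + gaussian(hi, mu, sigma))
    for i in range(1, steps):
        total += gaussian(lo + i * h, mu, sigma)
    return total * h

mu, sigma = 2.0, 1.5  # illustrative values only

# Area over (effectively) the whole domain: should be very close to 1.
total_area = area(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

# Symmetry about x = mu: equal heights at equal distances either side.
symmetric = math.isclose(gaussian(mu - 1.0, mu, sigma),
                         gaussian(mu + 1.0, mu, sigma))

print(round(total_area, 6), symmetric)
```

The integration limits of μ ± 10σ stand in for the full range of x, since the area in the tails beyond that is negligibly small.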

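The basic equation can be expanded as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ is the mean and σ is the standard deviation.

Standard Deviation and probability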

The area enclosed by the curve, the x-axis and two chosen values of x is a measure of probability.

Approx. 68% of the values are within 1 standard deviation of the mean.

Approx. 95% of the values are within 2 standard deviations of the mean.

Approx. 99.7% of the values are within 3 standard deviations of the mean.
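These three percentages can be reproduced with the error function from Python's standard library. This is a small sketch rather than part of the original notes; the helper name `within_k_sigma` is hypothetical. It relies on the identity P(μ − kσ < X < μ + kσ) = erf(k/√2), which holds for any normal distribution.

```python
import math

def within_k_sigma(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    # Prints the percentage of values within k standard deviations.
    print(k, round(100 * within_k_sigma(k), 1))
```

The results come out at roughly 68.3%, 95.4% and 99.7%, which is why the figures quoted above are given as approximate.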

Summary Example

68% of students score between 54% and 72% on their mathematics paper.

i) Assuming normally distributed data, give an approximate answer for the mean and standard deviation of the scores.

ii) Using the results from part i), what is the range of scores obtained by 95% of the students?

i) mean = (72 + 54)/2 = 63%

68% is 1 standard deviation either side of the mean.

So 68% represents a total of 2 standard deviations.

1 standard deviation = (72 – 54)/2 = 9%

ii) 95% of the students is 2 standard deviations either side of the mean. That is 18% (2 × 9%) either side of 63%.

So the range of scores is:

63 – 18 and 63 + 18

45% – 81%
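The arithmetic of this example can be written as a short script. It is a sketch under the same approximation used above (68% lies within 1 standard deviation and 95% within 2); the function names are hypothetical.

```python
def mean_and_sd_from_68_band(low, high):
    """Estimate mean and standard deviation from the interval holding about 68% of values."""
    mean = (low + high) / 2   # the band is centred on the mean
    sd = (high - low) / 2     # the band is 2 standard deviations wide
    return mean, sd

def band(mean, sd, k):
    """Interval lying k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd

mean, sd = mean_and_sd_from_68_band(54, 72)
low95, high95 = band(mean, sd, 2)

print(mean, sd)        # 63.0 9.0
print(low95, high95)   # 45.0 81.0
```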

Standardizing – the standardized normal probability function

Standardizing is the conversion of a normal distribution into a more useful form where:

the curve is symmetrical about the line z = 0 (the mean μ = 0)

the area between the curve and the z-axis is 1

the units of z are standard deviations, σ *

*z is also called the ‘standard score’, ‘sigma’ or ‘z-score’

The f(x) function is transformed into the standardized density f(z). The area under f(z) to the left of a value z is written Φ(z).

Φ(z) is called the standardized normal probability function. This has a particular value for any value of z (this is what we look up in z-tables).

The normal probability density function f(x) is transformed: the standard deviation σ becomes 1, while the mean μ becomes zero.

$$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}$$

The value of z is calculated from the formula:

$$z = \frac{x - \mu}{\sigma}$$

Example

A student gains a score of 57% in a test.

i) If the mean result is 47% and the standard deviation 20%, calculate the z-score for the student.

ii) Using the table*, estimate what % of students scored lower than 57%.

*the table gives the area on one side of the mean

i) z = (57 – 47)/20 = 0.5

ii) The interval between the mean 47% and the score 57% represents 0.5 of a standard deviation (calculated in (i)).

According to the table this represents 19.1% of the scores.

The interval between the score 0% and the mean 47% represents 47/20 = 2.35 standard deviations, which covers effectively all the scores below the mean.

This is half the total area under the curve (i.e. 50% of the scores).

So adding together these results: 19.1% + 50% = 69.1%.

The total % of students with scores less than 57% is 69.1%.
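The table readings in this example can be cross-checked without a printed table. This is a minimal sketch, assuming Φ(z) is computed from the error function as Φ(z) = ½(1 + erf(z/√2)); the function names `z_score` and `phi` are illustrative.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

def phi(z):
    """Cumulative standard normal: area under the curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(57, 47, 20)
between_mean_and_score = phi(z) - 0.5   # area on one side of the mean, up to z
below_score = phi(z)                    # everything to the left of z

print(z)                                       # 0.5
print(round(100 * between_mean_and_score, 1))  # 19.1
print(round(100 * below_score, 1))             # 69.1
```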

iii) Sketch a normal distribution curve illustrating the problem.

z-tables

z-tables give the area under the f(z) graph between minus infinity and a particular value of z. This area is called the cumulative probability function or ‘phi of z’, written Φ(z).

Mathematically this is expressed as:

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}\,dt$$

The table used by the Edexcel Exam board, UK, gives readings of z in increments of 0.01, from 0.00 to 4.00.

In this table the cumulative probability function Φ(z) ranges from 0.5000 to 1.0000.

Only positive values of z are given. Using the symmetry of the curve, negative values can easily be inferred (see below).
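A few table entries can be regenerated to see how such a table is laid out and how negative values follow from symmetry. This is a sketch only: it assumes Φ(z) = ½(1 + erf(z/√2)) and rounds to four decimal places in the style of a printed table. The helper `table_row` is hypothetical and is not a reproduction of any exam board's table.

```python
import math

def phi(z):
    """Cumulative standard normal: area under the curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(z_start, columns=10):
    """Phi values for z_start, z_start + 0.01, ... rounded to 4 decimal places."""
    return [round(phi(z_start + 0.01 * i), 4) for i in range(columns)]

first_row = table_row(0.00)   # entries for z = 0.00 to 0.09
last_entry = round(phi(4.00), 4)

# A negative z is not tabulated, but symmetry gives Phi(-z) = 1 - Phi(z).
phi_minus_one = 1.0 - phi(1.0)

print(first_row[0], last_entry)
print(round(phi_minus_one, 4))
```

The first entry is 0.5 and the entry at z = 4.00 rounds to 1.0, matching the range quoted for the table.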

Nomenclature for Normal Distributions – N(μ,σ²), N(0,1)

This is simply a short-hand way of describing a normal distribution.

N(μ,σ²)

μ is the mean, σ² is the variance

So a standardized normal distribution with mean (μ) = 0 and variance (σ²) = 1 is written:

N(0,1)

Other distributions give flatter or sharper ‘bell curves’ depending on their value for σ².
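The effect of the variance on the shape can be seen from the height of the curve at its mean, which for a normal density is 1/√(2πσ²). The sketch below compares a few variances; the function name `peak_height` and the chosen values are illustrative.

```python
import math

def peak_height(variance):
    """Height of the N(mu, variance) density at its mean."""
    return 1.0 / math.sqrt(2.0 * math.pi * variance)

for variance in (0.5, 1.0, 2.0):
    sd = math.sqrt(variance)
    # A smaller variance gives a taller peak and a narrower 1-sigma band.
    print(variance, round(peak_height(variance), 3), round(2 * sd, 3))
```

Because the total area is always 1, a taller peak must go with a narrower curve and a lower peak with a wider one.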

N(0, 0.5) is a sharp curve (less range)

N(0, 2.0) is a shallow curve (wide range)

Cumulative Distribution Function (CDF)   P(z)

This form of the CDF is the area under the bell curve to one side of a given value of z. The area gives the probability of z lying in the stated range.

More on z-tables

From z-tables the area under the curve of f(z) can be determined. The area is measured from the extreme left (−∞) up to any positive value of z. This area, Φ(z), is called the cumulative distribution function.

Hence when z = 0 the area is 0.5. Note that the total area under the curve is 1.

If we want the area (and hence the probability) for a particular range of values, we write it as a probability statement such as P(Z<z), where Z is the standardized random variable and z is a particular value.

The case of P(Z<z)

So to evaluate P(Z<z) all we have to do is read off the value of Φ(z) for z from the tables.

Since in this case,

Φ(z) = P(Z<z)

Example

i) For a Standardized Normal Distribution N(0,1), evaluate the Cumulative Distribution Function (CDF) for the condition where z < 1.9.

P(Z<1.9) = Φ(1.9) = 0.9713

The case of P(Z>z)

(area under the curve to the right of any value z) =

(area under whole curve) – (area under curve up to value z)

P(Z>z) = 1 – Φ(z)

Example

i) For a Standardized Normal Distribution N(0,1), evaluate the Cumulative Distribution Function (CDF) for the condition where z > 1.9.

P(Z>1.9) = 1 – Φ(1.9) = 1 – 0.9713 = 0.0287

The case of P(Z>-z)

By symmetry,

(area under the curve to the left of a positive value of z) =

(area under the curve to the right of a negative value of z)

P(Z>-z) = P(Z<z) = Φ(z)

Example

i) For a Standardized Normal Distribution N(0,1), evaluate the Cumulative Distribution Function (CDF) for the condition where z > -1.9.

P(Z>-1.9) = Φ(1.9) = 0.9713

The case of P(Z<-z)

By symmetry,

(area under the curve to the left of a negative value of z) =

(area under the curve to the right of a positive value of z)

P(Z<-z) = P(Z>z) = 1 – Φ(z)

Example

i) For a Standardized Normal Distribution N(0,1), evaluate the Cumulative Distribution Function (CDF) for the condition where z < -1.9.

P(Z<-1.9) = 1 – Φ(1.9) = 1 – 0.9713 = 0.0287
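The four cases for z = 1.9 can be checked together. This is a minimal sketch, assuming Φ(z) is evaluated from the error function rather than read from a table; the function name `phi` and the variable names are illustrative.

```python
import math

def phi(z):
    """Cumulative standard normal: area under the curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.9

p_less = phi(z)                   # P(Z < 1.9): read directly
p_greater = 1.0 - phi(z)          # P(Z > 1.9): whole area minus area up to z
p_greater_neg = 1.0 - phi(-z)     # P(Z > -1.9): equals phi(z) by symmetry
p_less_neg = phi(-z)              # P(Z < -1.9): equals 1 - phi(z) by symmetry

print(round(p_less, 4), round(p_greater, 4))          # 0.9713 0.0287
print(round(p_greater_neg, 4), round(p_less_neg, 4))  # 0.9713 0.0287
```

Here the negative cases are computed directly from Φ(−z) so that the symmetry relations are confirmed rather than assumed.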
