The Standard Normal Distribution

The door to statistics

Knowing the normal distribution opens the door to understanding statistics. Knowing a few key numbers associated with the normal distribution helps to build statistical intuition, which makes some of those numbers worth remembering!

A random variable follows a normal distribution with mean μ and variance σ² if the associated density is:

$$f(x) = (2\pi\sigma^2)^{-1/2} \, e^{-(x-\mu)^2 / (2\sigma^2)}.$$

The standard deviation or σ is the square root of the variance.

When μ = 0 and σ = 1 the distribution is called the standard normal distribution.
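As a quick check, the density formula above can be written out by hand and compared with R's built-in dnorm. This is only a minimal sketch: normal_density is an illustrative helper name, not a function that ships with R.

```r
# Hand-written normal density; normal_density is a hypothetical helper name
normal_density <- function(x, mu = 0, sigma = 1) {
  (2 * pi * sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x_check <- c(-2, 0, 1.5)

# Agrees with dnorm for an arbitrary mean and standard deviation
general_ok <- all.equal(normal_density(x_check, mu = 1, sigma = 2),
                        dnorm(x_check, mean = 1, sd = 2))

# With the defaults (mu = 0, sigma = 1) it is the standard normal density
standard_ok <- all.equal(normal_density(x_check), dnorm(x_check))

general_ok
standard_ok
```

Both comparisons should return TRUE, since dnorm evaluates the same formula.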

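# Display the standard normal density
xlabels <- seq(-3, 3)   # tick marks at whole standard deviations
ytop <- dnorm(0)        # height of the curve at the mean
x <- seq(-4, 4, length.out = 100)
plot(x, dnorm(x), type = "l", lty = 1, main = "Standard normal distribution",
     xlab = "Standard deviations from the mean", ylab = "Density", ylim = c(0, 0.45),
     xaxt = "n", yaxt = "n")
points(0, ytop, type = "p", col = "black", pch = 16)  # dot at the peak
points(0, ytop, type = "h", col = "black")            # vertical line down to the axis
axis(1, at = xlabels, labels = xlabels)
text(0, ytop + 0.015, labels = "50%")  # cumulative probability at the mean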

With tools like R to do the number crunching, we don’t need to remember the normal distribution’s formula, but remembering some numbers associated with the standard deviation makes the normal distribution a powerful tool. This article gives a summary of the key values associated with the standard deviation.

What does the standard deviation indicate?

The standard deviation (or σ) is the key indicator when we judge whether a value plausibly belongs to the distribution: the more standard deviations a value lies from the mean, the less of the density lies beyond it.

Using the R normal distribution functions pnorm, dnorm and qnorm, we can plot the key density measures for one or more standard deviations onto the normal curve.

# Label the cumulative probability at each whole standard deviation (-3 to 3)
for (i in -3:3) {
  cum <- pnorm(i)  # label: cumulative probability up to i
  hx <- dnorm(i)   # y-value of the point: density at i
  points(i, hx, type = "p", col = "red", pch = 16)  # dot on the curve
  points(i, hx, type = "h", col = "red")            # vertical line down to the axis
  text(i, hx + 0.015, labels = paste0(round(cum * 100, digits = 2), "%"), col = "red")
}
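How the three functions relate to each other can be sketched numerically. The variable names below are illustrative only; dnorm, pnorm, qnorm and integrate are standard R functions.

```r
# dnorm: height of the density curve at a point
peak <- dnorm(0)                           # equals 1 / sqrt(2 * pi), about 0.399

# pnorm: area under the density to the left of a point (cumulative probability)
cum_at_1 <- pnorm(1)                       # about 0.841
area_to_1 <- integrate(dnorm, -Inf, 1)$value  # numerical integral of the density

# qnorm: inverse of pnorm, the point below which a given share of the density lies
back_to_1 <- qnorm(cum_at_1)               # recovers 1, up to floating-point error

c(peak = peak, cum_at_1 = cum_at_1, area_to_1 = area_to_1, back_to_1 = back_to_1)
```

In short, pnorm accumulates dnorm, and qnorm undoes pnorm.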

What lies between the standard deviations?

Approximately 68%, 95% and 99.7% of the normal density lies within 1, 2 and 3 standard deviations of the mean, respectively.

# What lies between the standard deviations?
for (i in c(1, 2, 3)) {
  # horizontal line from -i to i at the height of the curve (y1 defaults to y0)
  segments(x0 = -i, y0 = dnorm(i), x1 = i, col = "black")
  text(0, dnorm(i) + 0.015,
       labels = paste0(round((pnorm(i) - pnorm(-i)) * 100, digits = 2), "%"))
}
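The percentages drawn on the plot can also be checked without any graphics. A minimal sketch, with k and within_k as illustrative variable names:

```r
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)   # share of the density within k standard deviations
round(within_k * 100, 2)           # 68.27 95.45 99.73
```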

More facts on normal density

Approximately -1.28, -1.645, -1.96 and -2.33 are the 10th, 5th, 2.5th and 1st percentiles of the standard normal distribution.

Symmetrical to these, 1.28, 1.645, 1.96 and 2.33 are the 90th, 95th, 97.5th and 99th percentiles of the standard normal distribution.

# Label the quantiles for a set of known cumulative probabilities
series <- c(0.99, 0.975, 0.95, 0.9, 0.1, 0.05, 0.025, 0.01)
for (p in series) {
  q <- qnorm(p)    # x-value: quantile for cumulative probability p
  axis(1, at = q, labels = round(q, digits = 2))
  hx <- dnorm(q)   # y-value of the point: density at the quantile
  points(q, hx, type = "p", col = "blue", pch = 16)  # dot on the curve
  points(q, hx, type = "h", col = "blue")            # vertical line down to the axis
  text(q, hx + 0.015, labels = paste0(round(p * 100, digits = 2), "%"), col = "blue")
}
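The percentile values quoted above, and their symmetry around zero, can be confirmed directly with qnorm. A minimal sketch, with illustrative variable names:

```r
p <- c(0.01, 0.025, 0.05, 0.10)
lower <- qnorm(p)          # 1st, 2.5th, 5th and 10th percentiles
upper <- qnorm(1 - p)      # 99th, 97.5th, 95th and 90th percentiles

round(lower, 3)            # -2.326 -1.960 -1.645 -1.282
all.equal(upper, -lower)   # TRUE: the upper quantiles mirror the lower ones
```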

Keeping it together

Keeping these measures in mind as indicators gives a useful initial ‘gut feel’ when evaluating probabilities within a normal distribution.
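As a rough illustration of that gut feel, the sketch below takes a value from a normal distribution, expresses it in standard deviations from the mean, and compares it with the remembered 1.96 cutoff. The mean, standard deviation and observed value are made-up numbers chosen for the example.

```r
# Hypothetical numbers, for illustration only
mu    <- 100   # assumed mean
sigma <- 15    # assumed standard deviation
x_obs <- 130   # assumed observed value

z <- (x_obs - mu) / sigma        # distance from the mean in standard deviations: 2
beyond_975 <- z > 1.96           # TRUE: the value lies above the 97.5th percentile
upper_tail <- 1 - pnorm(z)       # exact share of the density above it, about 0.023

c(z = z, upper_tail = upper_tail)
```

The rule of thumb (more than 1.96 standard deviations above the mean, so less than 2.5% of the density lies beyond it) gives the answer before pnorm is ever called.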