The door to statistics

Knowing the normal distribution opens the door to understanding statistics. A few key numbers associated with it help build statistical intuition, so they are worth remembering!

A random variable follows a normal distribution with mean μ and variance σ² if its probability density function is:

$$f(x) = (2\pi\sigma^2)^{-1/2} \, e^{-(x-\mu)^2 / (2\sigma^2)}$$
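To connect the formula with R, here is a minimal sketch in base R that writes the density out by hand and compares it with the built-in dnorm(). The helper name normal_density is hypothetical, introduced only for illustration, and the values μ = 1 and σ = 2 are arbitrary choices.

```r
# Hypothetical helper: the normal density written out from the formula
# (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2))
normal_density <- function(x, mu = 0, sigma = 1) {
  (2 * pi * sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

xs <- c(-2, -1, 0, 1, 2)
# Arbitrary example parameters: mean 1, standard deviation 2
manual  <- normal_density(xs, mu = 1, sigma = 2)
builtin <- dnorm(xs, mean = 1, sd = 2)
print(cbind(xs, manual, builtin))

# The hand-written formula and dnorm() agree up to floating-point tolerance
print(all.equal(manual, builtin))  # TRUE
```

Note that dnorm() takes the standard deviation (sd), not the variance, as its spread argument.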

The standard deviation, σ, is the square root of the variance.

When μ = 0 and σ = 1 the distribution is called the standard normal distribution.
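The key numbers in this article are quoted for the standard normal distribution, but they carry over to any normal distribution once a value is expressed in standard deviations from the mean, z = (x − μ)/σ. A small sketch, assuming made-up example values (mean 100, standard deviation 15, observation 130) chosen only for illustration:

```r
# Hypothetical example values: a normal distribution with mean 100 and sd 15
mu    <- 100
sigma <- 15
x_val <- 130

# Standardise: how many standard deviations x_val lies from the mean
z <- (x_val - mu) / sigma   # 2

# The cumulative probability is the same either way
p_general  <- pnorm(x_val, mean = mu, sd = sigma)
p_standard <- pnorm(z)       # standard normal: mean 0, sd 1
print(c(z = z, p_general = p_general, p_standard = p_standard))
```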

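# Display the standard normal distribution
xlabels <- seq(-3, 3)          # tick marks at -3..3 standard deviations
ytop <- dnorm(0)               # height of the curve at the mean
x <- seq(-4, 4, length.out = 100)
plot(x, dnorm(x), type = "l", lty = 1, main = "Standard normal distribution",
     xlab = "Standard deviations from the mean", ylab = "Density", ylim = c(0, 0.45),
     xaxt = "n", yaxt = "n")
points(0, ytop, type = "p", col = "black", pch = 16)  # dot at the peak
points(0, ytop, type = "h", col = "black")            # vertical line down to the axis
axis(1, at = xlabels, labels = xlabels)
text(0, ytop + 0.015, labels = "50%")                 # half of the distribution lies below the mean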

With tools like R to do the number crunching, we don’t need to remember the normal distribution’s formula, but remembering a few numbers associated with the standard deviation makes the normal distribution a powerful tool. This article gives a summary of the key values associated with the standard deviation.

What does the standard deviation indicate?

The standard deviation (σ) measures how spread out the distribution is. Expressing a value’s distance from the mean in standard deviations tells us whether it falls comfortably within the bulk of the distribution or out in one of its tails.
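As a rough illustration of using σ as this kind of indicator, the sketch below converts a value to a z-score and uses R’s pnorm() to find the share of the distribution lying at least that far from the mean in either direction. The helper how_unusual is hypothetical, written only for this example, and the values passed to it are arbitrary.

```r
# Hypothetical helper: how unusual is a value under N(mean, sd)?
# Returns the z-score and the two-sided tail probability, i.e. the share of
# the distribution lying at least |z| standard deviations from the mean.
how_unusual <- function(value, mean = 0, sd = 1) {
  z <- (value - mean) / sd
  tail_prob <- 2 * pnorm(-abs(z))
  c(z = z, tail_prob = tail_prob)
}

print(how_unusual(1))                      # z = 1, about 32% lie further out
print(how_unusual(3))                      # z = 3, about 0.27% lie further out
print(how_unusual(74, mean = 70, sd = 2))  # z = 2, about 4.55% lie further out
```

A value one standard deviation out is quite ordinary, while one three standard deviations out is rare.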

Using R’s normal distribution functions dnorm (the density, i.e. the height of the curve), pnorm (the cumulative probability) and qnorm (the quantile, the inverse of pnorm), we can mark the key probabilities for one or more standard deviations on the normal curve.

# Label the cumulative probability at each whole standard deviation (-3 to 3)
for (i in -3:3) {
  cum_prob <- pnorm(i)   # share of the distribution below i standard deviations
  height <- dnorm(i)     # height of the curve at i, used to place the marker
  points(i, height, type = "p", col = "red", pch = 16)  # dot
  points(i, height, type = "h", col = "red")            # vertical line
  text(i, height + 0.015, labels = paste(round(cum_prob * 100, digits = 2), "%"), col = "red")
}
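The same numbers can be printed as a table rather than read off the plot. This is a small sketch using only base R; the column names are illustrative choices. It also shows that qnorm() undoes pnorm().

```r
# dnorm() and pnorm() evaluated at the whole standard deviations -3..3
sds <- -3:3
tab <- data.frame(
  sd         = sds,
  density    = round(dnorm(sds), 4),       # height of the curve
  cumulative = round(pnorm(sds) * 100, 2)  # % of the distribution below sd
)
print(tab)

# qnorm() is the inverse of pnorm(): it maps a probability back to a quantile
print(qnorm(pnorm(1.5)))  # 1.5
```

The cumulative column (0.13, 2.28, 15.87, 50, 84.13, 97.72, 99.87) matches the red labels on the plot.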

What lies between the standard deviations?

Approximately 68%, 95% and 99.7% of the probability lies within 1, 2 and 3 standard deviations of the mean.

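# What lies between the standard deviations?
for (i in c(1, 2, 3)) {
  # horizontal bar from -i to +i, drawn at the height of the curve at i
  segments(x0 = -i, y0 = dnorm(i), x1 = i, y1 = dnorm(i), col = "black")
  # share of the distribution between -i and +i
  text(0, dnorm(i) + 0.015, labels = paste(round((pnorm(i) - pnorm(-i)) * 100, digits = 2), "%"))
}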
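The 68%, 95% and 99.7% figures can be checked both exactly, with pnorm(), and by simulation. A minimal sketch in base R; the seed and the sample size of 100,000 draws are arbitrary choices, so the simulated percentages will differ slightly from the exact ones.

```r
# Exact share of a normal distribution within k standard deviations of the mean
k <- 1:3
within <- pnorm(k) - pnorm(-k)
print(round(within * 100, 2))  # 68.27 95.45 99.73

# Simulation check: draw standard normal values and count how many land within k sd
set.seed(42)  # arbitrary seed so the sketch is reproducible
draws <- rnorm(100000)
empirical <- sapply(k, function(kk) mean(abs(draws) <= kk))
print(round(empirical * 100, 2))
```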

More facts on normal density

-1.28, -1.645, -1.96 and -2.33 are the 10th, 5th, 2.5th and 1st percentiles of the standard normal distribution.

By symmetry, 1.28, 1.645, 1.96 and 2.33 are the 90th, 95th, 97.5th and 99th percentiles of the standard normal distribution.

# Label the quantiles (x-values) for a set of cumulative probabilities
series <- c(0.99, 0.975, 0.95, 0.9, 0.1, 0.05, 0.025, 0.01)
for (p in series) {
  q <- qnorm(p)                  # quantile: x-value below which a share p lies
  axis(1, at = q, labels = round(q, digits = 2))
  height <- dnorm(q)             # y-value of the curve at the quantile
  points(q, height, type = "p", col = "blue", pch = 16)  # dot
  points(q, height, type = "h", col = "blue")            # vertical line
  text(q, height + 0.015, labels = paste(round(p * 100, digits = 2), "%"), col = "blue")
}
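The percentiles quoted above can also be listed directly with qnorm(), without plotting. A short base R sketch; rounding to three decimals is an arbitrary display choice.

```r
# Percentiles of the standard normal quoted above
probs <- c(0.01, 0.025, 0.05, 0.10, 0.90, 0.95, 0.975, 0.99)
quantiles <- qnorm(probs)
print(data.frame(percentile = probs * 100, z = round(quantiles, 3)))

# Symmetry: each lower-tail quantile is the negative of the matching upper-tail one
print(all.equal(qnorm(probs[1:4]), -qnorm(1 - probs[1:4])))  # TRUE
```

The printed z-values (-2.326, -1.96, -1.645, -1.282 and their positive counterparts) round to the figures listed in the text.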

Keeping it together

These few numbers are a great help in forming an initial ‘gut feel’ when evaluating probabilities under a normal distribution.