Normality

TL;DR

Describes how closely data follow a normal distribution, which is symmetric and bell-shaped.
Many statistical procedures assume normality (notably the t-test and ANOVA), so checking it matters for valid inference.
Not all real data are normal (for example, income can be right-skewed); use appropriate methods when normality does not hold.

Definition

Normality is the degree to which a set of data conforms to a normal distribution, a symmetric, bell-shaped curve. In a normal distribution, most data points are concentrated around the mean, and values become progressively rarer the farther they lie from the mean in either direction.

Explanation

Normality is important because many statistical tests and procedures assume the data are normally distributed; verifying this assumption helps ensure valid results. The normal distribution also serves as a benchmark: comparing a dataset’s shape against it shows whether the data are symmetric or skewed, which aids in understanding the data’s characteristics and likely behavior. A common way to assess normality is to plot a histogram and compare the observed shape to the symmetric, bell-shaped curve of a normal distribution.
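
The histogram comparison described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive procedure. The data are simulated, and the helper names (skewness, excess_kurtosis, text_histogram) are hypothetical names introduced here. Skewness and excess kurtosis are added as simple numeric companions to the visual check; both are close to 0 for normally distributed data.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


def text_histogram(values, bins=10, width=40):
    """Return one line per bin, with a bar proportional to the bin count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value falls in the last bin.
        index = min(int((v - lo) / step), bins - 1)
        counts[index] += 1
    peak = max(counts)
    return [
        f"{lo + i * step:8.1f} | {'#' * round(width * c / peak)}"
        for i, c in enumerate(counts)
    ]


# Assumption: simulated data, 1000 values with mean 175 and standard deviation 7.
rng = random.Random(0)
sample = [rng.gauss(175, 7) for _ in range(1000)]

print("\n".join(text_histogram(sample)))
print(f"skewness: {skewness(sample):.2f}")
print(f"excess kurtosis: {excess_kurtosis(sample):.2f}")
```

For a sample like this one, the bars should rise toward the middle bins and fall away on both sides, and both statistics should be near 0. Clearly lopsided bars or a skewness far from 0 suggest the data are not normal.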

Examples

Example 1: Heights of adult men

Imagine that you are studying the heights of adult men in a certain population. You collect data from a random sample of 1000 men and plot the results on a histogram. The histogram is bell-shaped: most heights cluster around the mean, and heights become rarer the farther they are from the mean in either direction. Because it matches the symmetric, bell-shaped curve of a normal distribution, this distribution would be considered approximately normal.

Example 2: Test scores

Imagine that you are a teacher and you give a test to your students. You collect the scores and plot them on a histogram. If most scores cluster around the mean and the counts taper off symmetrically toward both the low and high ends, the histogram is bell-shaped and the scores would be considered approximately normal.

Use cases

Assumption-checking for statistical tests such as the t-test and ANOVA.
Serving as a benchmark when comparing the shape of a dataset’s distribution to other distributions.

Notes or pitfalls

Not all data conform to a normal distribution. For example, income distributions can be skewed to the right (with a long tail on the high end), which is not normal.
When data are not normal, choose statistical tests and procedures appropriate for non-normal data.
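
The right-skew pitfall above can be illustrated with simulated data. The sketch below is hedged: the "income" values are drawn from a log-normal distribution with made-up parameters, not real income data, and the skewness helper is a hypothetical name introduced here. It shows two common signs of right skew (a mean above the median, and positive skewness) and one possible remedy, a log transform, which is only one of several approaches for non-normal data.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment; positive when the right tail is longer."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Assumption: hypothetical right-skewed "income" data from a log-normal
# distribution; the parameters 10.5 and 0.8 are chosen only for illustration.
rng = random.Random(1)
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(5000)]

# Taking logarithms of log-normal data yields normally distributed values.
log_incomes = [math.log(x) for x in incomes]

print(f"mean:   {statistics.fmean(incomes):.0f}")
print(f"median: {statistics.median(incomes):.0f}")
print(f"skewness (raw): {skewness(incomes):.2f}")
print(f"skewness (log): {skewness(log_incomes):.2f}")
```

In this simulation the long right tail pulls the mean above the median, and the raw skewness is clearly positive, while the log-transformed values have skewness near 0. Real data will not necessarily become normal after a log transform, so the result should be rechecked rather than assumed.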

Related terms

Normal distribution
t-test
ANOVA