Additive Outlier

  • A single extreme value added to a dataset can distort its distribution and bias summary statistics and statistical tests.
  • Additive outliers can come from genuinely extreme observations or from measurement errors.
  • Detect with visual tools (box plots, scatter plots) or z-scores; address by removal or sensitivity analysis when appropriate.

An additive outlier is an outlier that arises when an extreme value is added to a dataset. The term is used mainly for numerical and time series data, where such a value affects a single observation, and it can substantially affect data analysis.

An extreme value of this kind can change the apparent shape of the distribution (for example, introducing skew or non-normality) and can alter summary statistics such as the mean and standard deviation. These changes can in turn affect the results of statistical tests that assume particular distributional properties (for example, t-tests and ANOVA).

Additive outliers may result from genuine rare events or from errors in measurement. Common methods to identify them include box plots, scatter plots, and z-scores. After identification, options include removing the outlier(s) or conducting a sensitivity analysis, that is, repeating the analysis with and without the outlier(s) to assess their impact.
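The z-score and box-plot checks mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive procedure: the function names, the sample `readings`, the z-score cutoff of 3, and the fence multiplier of 1.5 are all assumptions chosen for the example (the last two are common rules of thumb).

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z| exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged

def iqr_outliers(values, k=1.5):
    """Return (index, value) for points outside the box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, x) for i, x in enumerate(values) if x < lower or x > upper]

# Hypothetical readings: fifteen ordinary values and one extreme value.
readings = [68, 71, 69, 72, 70, 67, 73, 70, 69, 71, 68, 72, 70, 71, 69, 100]

print(zscore_outliers(readings))
print(iqr_outliers(readings))
```

Note that the extreme value itself inflates the mean and standard deviation used in the z-score, so in small samples a single outlier can partly mask itself; the quartile-based fences are less sensitive to this.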

A dataset of daily temperatures for a city over a year shows an average temperature of 70 degrees Fahrenheit with a standard deviation of 5 degrees. One day records a temperature of 100 degrees, which is 30 degrees above the average and far higher than the other data points, making it an additive outlier.

A dataset of weights for a group shows an average weight of 150 pounds with a standard deviation of 15 pounds. One person has a recorded weight of 300 pounds, which is twice the average and may be the result of a measurement error; this point would be considered an additive outlier.
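To put the two examples above on a common scale, each extreme value can be expressed as a z-score using the mean and standard deviation stated in the example. The helper names and the cutoff of 3 are assumptions for illustration only.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def is_extreme(z, threshold=3.0):
    """Rule-of-thumb check; the threshold of 3 is an assumed convention."""
    return abs(z) > threshold

temperature_z = z_score(100, mean=70, sd=5)   # (100 - 70) / 5 = 6.0
weight_z = z_score(300, mean=150, sd=15)      # (300 - 150) / 15 = 10.0

print(temperature_z, is_extreme(temperature_z))
print(weight_z, is_extreme(weight_z))
```

Both values lie well beyond the assumed cutoff, which is why each would be flagged for closer inspection.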

  • Additive outliers can make distributions appear skewed or non-normal, potentially violating assumptions of statistical tests.
  • They can substantially affect calculated means and standard deviations.
  • Removing outliers is not always appropriate because they may contain valid information; consider sensitivity analysis to evaluate their impact.

Related terms:

  • Box plot
  • Scatter plot
  • Z-score
  • t-test
  • ANOVA
  • Sensitivity analysis
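
The sensitivity analysis recommended above amounts to running the same summary twice, once with the suspect point and once without, and comparing the results. The sketch below assumes a small hypothetical list of weights and hypothetical helper names; a real analysis would repeat whatever test or model is actually being reported.

```python
import statistics

def summarize(values):
    """Count, mean, and sample standard deviation of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }

def sensitivity_report(values, outlier_indices):
    """Compare summary statistics with and without the flagged points."""
    flagged = set(outlier_indices)
    kept = [x for i, x in enumerate(values) if i not in flagged]
    return {"with": summarize(values), "without": summarize(kept)}

# Hypothetical weights in pounds; the last entry is the suspect 300 lb record.
weights = [140, 155, 150, 165, 135, 160, 145, 150, 300]

report = sensitivity_report(weights, outlier_indices=[8])
mean_shift = report["with"]["mean"] - report["without"]["mean"]

print(report["with"])
print(report["without"])
print(mean_shift)
```

If the conclusions change materially between the two runs, that is a signal to report both results and to investigate the flagged point rather than silently dropping it.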