Additive Outlier
- A single extreme value added to a dataset can distort its distribution and bias summary statistics and statistical tests.
- Additive outliers can come from genuinely extreme observations or from measurement errors.
- Detect with visual tools (box plots, scatter plots) or z-scores; address by removal or sensitivity analysis when appropriate.
Definition
An additive outlier is an outlier caused by the addition of an extreme value to a dataset. It can occur in numerical, categorical, and time series data and can substantially affect data analysis.
Explanation
An additive outlier arises when an extreme value is present in a dataset. Such a value can change the apparent shape of the distribution (for example, by introducing skew or non-normality) and can alter summary statistics such as the mean and standard deviation. These changes can in turn affect the results of statistical tests that assume particular distributional properties (for example, t-tests and ANOVA). Additive outliers may result from genuine rare events or from measurement errors. Common ways to identify them include box plots, scatter plots, and z-scores. After identification, options include removing the outlier(s) or running a sensitivity analysis, that is, repeating the analysis with and without the outlier(s) to assess their impact.
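The z-score check mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive method: the function name `flag_by_zscore`, the threshold of 3, and the `readings` list are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_by_zscore(values, threshold=3.0):
    """Return (value, z) pairs whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [
        (x, (x - mu) / sigma)
        for x in values
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical readings: 19 values close to 70 plus one extreme value.
readings = [68, 71, 69, 72, 70, 67, 73, 70, 69, 71,
            70, 68, 72, 71, 69, 70, 73, 67, 70, 100]

flagged = flag_by_zscore(readings)
for value, z in flagged:
    print(f"value={value}, z={z:.2f}")
```

Because the extreme value is included when the mean and standard deviation are computed, it inflates both, so its own z-score comes out smaller than it would against the statistics of the remaining points.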
Examples
Temperature example
A dataset of daily temperatures for a city over a year has a mean of 70 degrees Fahrenheit and a standard deviation of 5 degrees. One day records a temperature of 100 degrees, which is 30 degrees (six standard deviations) above the mean and far from the other data points; it is an additive outlier.
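A point like this would also fall outside the whiskers of a box plot. The sketch below applies the common 1.5 × IQR fence rule that box plots typically use. The `temps` values are hypothetical and are not the dataset described above, and `iqr_fences` is an illustrative helper name.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences used to draw box-plot whiskers."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical daily temperatures in degrees Fahrenheit.
temps = [62, 65, 66, 68, 69, 70, 70, 71, 72, 74, 75, 78, 100]

lower, upper = iqr_fences(temps)
outside = [t for t in temps if t < lower or t > upper]
print(f"fences=({lower}, {upper}), outside={outside}")
```

The quartiles depend much less on a single extreme value than the mean and standard deviation do, so the fences stay close to where they would be without it.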
Weight (measurement error) example
A dataset of weights for a group has a mean of 150 pounds and a standard deviation of 15 pounds. One person has a recorded weight of 300 pounds, which is 150 pounds (ten standard deviations) above the mean and may be the result of a measurement error; this point would be considered an additive outlier.
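To see how one such value shifts summary statistics, the sketch below compares the mean, standard deviation, and median of a small group with and without a 300-pound entry. The eight baseline weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median, stdev

# Hypothetical weights in pounds; their mean is exactly 150.
clean = [135, 140, 145, 150, 150, 155, 160, 165]
with_outlier = clean + [300]  # one suspicious recorded value

for label, data in (("without", clean), ("with", with_outlier)):
    print(
        f"{label:>7}: mean={mean(data):.2f}, "
        f"sd={stdev(data):.2f}, median={median(data)}"
    )
```

In this sketch the single added value moves the mean from 150 to about 166.7 and the standard deviation from 10 to about 50.9, while the median stays at 150.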
Notes or pitfalls
- Additive outliers can make distributions appear skewed or non-normal, potentially violating assumptions of statistical tests.
- They can substantially affect calculated means and standard deviations.
- Removing outliers is not always appropriate because they may contain valid information; consider sensitivity analysis to evaluate their impact.
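A sensitivity analysis of the kind suggested in the last point can be sketched by computing the same test statistic with and without the suspect value. The example below uses a hand-written one-sample t statistic; the measurements, the hypothesized mean `mu0`, and the helper name `one_sample_t` are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / sqrt(n))


# Hypothetical measurements, tested against a hypothesized mean of 5.0.
measurements = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 5.2, 5.1]
suspect = 9.0  # hypothetical extreme value under review
mu0 = 5.0

t_without = one_sample_t(measurements, mu0)
t_with = one_sample_t(measurements + [suspect], mu0)

print(f"t without suspect value: {t_without:.2f}")
print(f"t with suspect value:    {t_with:.2f}")
```

Here the statistic drops from about 2.65 to about 1.34 once the extreme value is included, because the inflated standard deviation outweighs the shift in the mean. If the conclusion changes between the two runs, as it would here at the usual 5% level, the result is sensitive to that one point and both versions are worth reporting.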
Related terms
- Box plot
- Scatter plot
- Z-score
- t-test
- ANOVA
- Sensitivity analysis