Heteroscedasticity
- Residual (error) variability changes across values of the independent variable rather than remaining constant.
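- Under ordinary least squares, heteroscedasticity leaves coefficient estimates unbiased but inefficient and biases the usual standard error estimates, which undermines hypothesis tests and confidence intervals.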
- Detect with residuals plots or tests (Breusch-Pagan, White); address via transformations (e.g., log) or weighted least squares.
Definition
Heteroscedasticity is the unequal dispersion of the residuals in a regression model: the variability of the residuals (the differences between the observed and predicted values) is not constant across all values of the independent variable.
Explanation
When a regression model is heteroscedastic, residuals do not have uniform spread across predicted values or values of an independent variable. In a residuals plot (residuals plotted against predicted values), a homoscedastic model will show residuals evenly dispersed around the zero line; if dispersion varies systematically, the model is heteroscedastic.
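The residuals-plot check can also be approximated numerically. The following is a minimal sketch, not a definitive diagnostic: it simulates hypothetical data whose noise grows with the predictor (the coefficients, sample size, and the helper name `fit_line` are all assumptions made for illustration), fits a line by ordinary least squares, and compares the residual spread in the lower and upper halves of the predicted values.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(0)
# Hypothetical data: the noise standard deviation is proportional to x.
x = [rng.uniform(1, 10) for _ in range(2000)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Split the residuals at the median predicted value and compare their spread.
cut = statistics.median(fitted)
low = [r for r, f in zip(resid, fitted) if f <= cut]
high = [r for r, f in zip(resid, fitted) if f > cut]
sd_low, sd_high = statistics.pstdev(low), statistics.pstdev(high)
print(f"residual SD, lower half: {sd_low:.2f}; upper half: {sd_high:.2f}")
```

A clearly larger spread in one half is the numeric counterpart of a fan shape in the residuals plot; roughly equal spreads are what a homoscedastic model would tend to produce.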
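Under ordinary least squares, heteroscedasticity does not bias the coefficient estimates, but it makes them inefficient and biases the usual standard error estimates, so hypothesis tests and confidence intervals become unreliable. Detecting heteroscedasticity can be done visually (residuals plot) or with statistical tests such as the Breusch-Pagan test or the White test. Both tests regress the squared residuals on the independent variables (the White test also includes their squares and cross-products) and check whether those variables explain the residual variance. The original Breusch-Pagan test assumes normally distributed errors; its studentized (Koenker) variant and the White test do not.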
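As an illustration of how such a test works, here is a hedged sketch of the studentized (Koenker) form of the Breusch-Pagan test for a single predictor: regress the squared residuals on the predictor and use n times the R-squared of that auxiliary regression as the test statistic, which is compared with a chi-square distribution with one degree of freedom. The data, seed, and the function names `ols`, `r_squared`, and `breusch_pagan` are assumptions made for this example; in practice a statistics library would normally be used instead.

```python
import math
import random
import statistics

def ols(x, y):
    """Simple OLS for y = a + b*x; returns (a, b, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    _, _, resid = ols(x, y)
    my = statistics.fmean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum(r * r for r in resid)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(x, y):
    """Studentized Breusch-Pagan LM statistic and p-value (one predictor)."""
    _, _, resid = ols(x, y)
    squared = [r * r for r in resid]
    # LM = n * R^2 of the auxiliary regression; clamp tiny negative rounding.
    lm = max(len(x) * r_squared(x, squared), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(1000)]
# Hypothetical outcomes: one with noise growing in x, one with constant noise.
y_het = [2.0 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
y_hom = [2.0 + 0.5 * xi + rng.gauss(0, 1.5) for xi in x]

lm_het, p_het = breusch_pagan(x, y_het)
lm_hom, p_hom = breusch_pagan(x, y_hom)
print(f"growing noise:  LM = {lm_het:.1f}, p = {p_het:.3g}")
print(f"constant noise: LM = {lm_hom:.1f}, p = {p_hom:.3g}")
```

A small p-value is evidence against constant residual variance. The closed-form p-value used here only holds for one degree of freedom, so a model with several predictors would need a general chi-square routine.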
To correct heteroscedasticity, one can transform the dependent variable (for example, take the logarithm when the dependent variable is positive and the spread of the residuals grows with its level) or apply weighted least squares regression, which assigns higher weights to observations with smaller variances and lower weights to observations with larger variances, ideally in proportion to the inverse of each observation's error variance.
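The effect of weighting can be sketched with a small simulation. The example below assumes the error variance is known to be proportional to the square of the predictor, so the weights are 1/x²; real data rarely come with a known variance structure, and the data, seed, and helper name `wls_line` are hypothetical. It fits the same simulated data sets with equal weights (ordinary least squares) and with inverse-variance weights, then compares how much the slope estimates vary across repetitions.

```python
import random
import statistics

def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x with weights w; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(1)
ols_slopes, wls_slopes = [], []
for _ in range(300):
    x = [rng.uniform(1, 10) for _ in range(200)]
    # Assumed model: true slope 0.5, noise SD proportional to x.
    y = [2.0 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
    # Equal weights reproduce ordinary least squares.
    ols_slopes.append(wls_line(x, y, [1.0] * len(x))[1])
    # Inverse-variance weights: variance is proportional to x**2.
    wls_slopes.append(wls_line(x, y, [1.0 / xi ** 2 for xi in x])[1])

mean_ols, mean_wls = statistics.fmean(ols_slopes), statistics.fmean(wls_slopes)
sd_ols, sd_wls = statistics.stdev(ols_slopes), statistics.stdev(wls_slopes)
print(f"OLS slope: mean {mean_ols:.3f}, SD {sd_ols:.3f}")
print(f"WLS slope: mean {mean_wls:.3f}, SD {sd_wls:.3f}")
```

Under these assumptions both estimators should center near the true slope, while the weighted estimates should vary less from sample to sample, which is the efficiency gain that weighting is meant to provide.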
Examples
Income and spending
When analyzing the relationship between income and spending, the variability of the residuals can be higher for individuals with higher incomes compared to those with lower incomes, because higher-income individuals tend to have more discretionary income and therefore more variable spending patterns.
Education levels and job satisfaction
When analyzing the relationship between education levels and job satisfaction, the variability of the residuals can be higher for individuals with higher levels of education compared to those with lower levels of education, because individuals with higher education are more likely to have multiple job options and therefore more variable job satisfaction.
Notes or pitfalls
- Under ordinary least squares, heteroscedasticity leaves coefficient estimates unbiased but makes them inefficient and biases the usual standard errors, which affects conclusions drawn from the model.
- Visual inspection of residuals plots is a common initial check: evenly dispersed residuals suggest homoscedasticity; uneven dispersion suggests heteroscedasticity.
- The original Breusch-Pagan test assumes normally distributed errors; its studentized (Koenker) variant and the White test relax that assumption.
- Correction strategies include transforming the dependent variable (e.g., logarithm) and using weighted least squares regression, which weights observations by the inverse of their variances.
Related terms
- Residuals
- Regression model
- Homoscedastic
- Residuals plot
- Predicted values
- Breusch-Pagan test
- White test
- Dependent variable
- Weighted least squares