
Hat Matrix

  • A matrix that projects observed dependent-variable values to fitted values in a regression model.
  • Used to compute leverage and influence of individual observations.
  • Helps identify influential observations and assess whether removing an observation would substantially change model coefficients.

The Hat Matrix (also called the Leverage Matrix or Influence Matrix) is defined for a regression model by the matrix

H = X(X'X)^{-1}X'

where X is the design matrix and X’ is its transpose. The matrix X’X is the Gram Matrix; its inverse appears in the Hat Matrix formula, so X must have full column rank for H to be defined in this form.

  • The design matrix X contains the values of the independent variables for each observation.
  • Applying H to the vector of observed dependent-variable values y produces the fitted values ŷ = Hy; because H “puts the hat on” y, it is called the “Hat” Matrix.
  • Diagonal elements of H are used to quantify leverage, which measures how far an observation’s independent-variable values lie from the average of those values across all observations.
  • Influence combines leverage with the size of an observation’s residual to indicate how much the regression model would change if that observation were removed.
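
The points above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the helper name `hat_matrix` and the toy data are hypothetical, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def hat_matrix(X):
    """Return H = X (X'X)^{-1} X' for a full-column-rank design matrix X."""
    X = np.asarray(X, dtype=float)
    # Solve (X'X) B = X' instead of forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical toy data: an intercept column plus one independent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])
X = np.column_stack([np.ones_like(x), x])

H = hat_matrix(X)
fitted = H @ y          # applying H to y gives the fitted values
leverage = np.diag(H)   # one leverage value per observation

print(leverage)
```

The last observation (x = 10) lies farthest from the mean of x, so it receives the largest leverage. The leverages sum to the number of columns in X, here 2.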

Suppose we have a dataset with 100 observations and fit a simple linear regression model. The Hat Matrix can be used to identify which observations have the greatest influence on the model. Observations that combine high leverage with a large residual are likely to be influential and should be examined carefully to ensure they are not outliers or otherwise problematic.
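
A rough sketch of this scenario follows. The data are simulated, the planted unusual observation is hypothetical, and influence is measured with Cook's distance, which is one common choice and an assumption here, since the text does not name a specific influence measure.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 100 observations from a straight line plus noise.
n = 100
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# Plant one unusual observation: far out in x and well off the line.
x[0] = 30.0
y[0] = 2.0 + 0.5 * 30.0 + 15.0

X = np.column_stack([np.ones(n), x])   # design matrix with an intercept
p = X.shape[1]                         # number of coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
leverage = np.diag(H)
residuals = y - H @ y
s2 = residuals @ residuals / (n - p)   # residual variance estimate

# Cook's distance (assumed measure): residual size combined with leverage.
cooks_d = residuals**2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Indices of the five observations with the largest Cook's distance.
most_influential = np.argsort(cooks_d)[::-1][:5]
print(most_influential)
```

The planted observation at index 0 ranks first, because it has both the highest leverage and a large residual. Ranking the values avoids committing to any particular cutoff for what counts as "high".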

Assessing the stability of the regression model


The Hat Matrix can be used to assess model stability. If an observation has high leverage and high influence, removing it from the dataset could significantly change the model coefficients, indicating the model may not be stable or robust to changes in the dataset.
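
One way to check this, sketched below under stated assumptions, is to refit the model without a given observation and compare coefficients. The function names and the toy data are hypothetical. The second function uses a standard identity that gives the same coefficient change from the full fit alone, using the observation's residual e_i and its leverage h_ii.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def coef_change_refit(X, y, i):
    """Full-fit coefficients minus coefficients fitted without row i."""
    keep = np.arange(len(y)) != i
    return fit(X, y) - fit(X[keep], y[keep])

def coef_change_hat(X, y, i):
    """Same change without refitting: (X'X)^{-1} x_i e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    h_ii = X[i] @ XtX_inv @ X[i]       # leverage of observation i
    e_i = y[i] - X[i] @ fit(X, y)      # residual of observation i
    return XtX_inv @ X[i] * e_i / (1.0 - h_ii)

# Hypothetical toy data: the last point is far out in x and off the trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0])
y = np.array([1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 5.0])
X = np.column_stack([np.ones_like(x), x])

for i in (3, 7):
    change = coef_change_refit(X, y, i)
    print(i, change, np.linalg.norm(change))
```

Dropping the last observation moves the coefficients far more than dropping a typical one, which is the kind of instability described above.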

  • Identifying influential observations in regression analyses.
  • Assessing whether removing specific observations would substantially change model coefficients (model stability checks).
  • Observations with high leverage and high influence warrant careful examination for outlier status or data issues.
  • A model whose coefficients change substantially when such observations are removed may lack robustness and might require re-evaluation.
  • Leverage Matrix
  • Influence Matrix
  • Gram Matrix