Hat Matrix

TL;DR

A matrix that maps the observed dependent-variable values to the fitted values of a linear regression model.
Used to compute leverage and influence of individual observations.
Helps identify influential observations and assess whether removing an observation would substantially change model coefficients.

Definition

The Hat Matrix (also called the Leverage Matrix or Influence Matrix) is defined for a linear regression model fitted by ordinary least squares as the matrix

H = X(X'X)^{-1}X'

where X is the design matrix and X' is its transpose. The matrix X'X is the Gram Matrix of the columns of X. Because its inverse appears in the formula, X must have full column rank for H to be defined in this form.
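The definition can be checked numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation: the function name hat_matrix and the five data points are made up for illustration, and X is assumed to have full column rank.

```python
import numpy as np

def hat_matrix(X):
    """Return H = X (X'X)^{-1} X' for a full-column-rank design matrix X."""
    # Solve (X'X) Z = X' instead of forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical data: 5 observations, an intercept column plus one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])

H = hat_matrix(X)
y_hat = H @ y  # applying H to y gives the fitted values

# Cross-check against an ordinary least-squares fit of the same model.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(y_hat, X @ beta))  # fitted values agree
print(np.allclose(H, H.T))           # H is symmetric
print(np.allclose(H @ H, H))         # H is idempotent (a projection)
```

Solving the linear system rather than calling an explicit inverse is a common numerical choice; for the small, well-conditioned matrices used here either approach gives the same result.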

Explanation

The design matrix X contains the values of the independent variables for each observation.
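Applying H to the vector of observed dependent-variable values y produces the fitted values, conventionally written as y with a hat (^) on top; H “puts the hat on y”, which is why it is called the “Hat” Matrix.
The diagonal elements of H, written h_ii, quantify leverage: h_ii measures how far the independent-variable values of observation i lie from the average of the independent-variable values over all observations, and therefore how strongly the observed value of observation i pulls its own fitted value. Each h_ii lies between 0 and 1, and together they sum to the number of columns of X.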
Influence combines leverage with the size of an observation’s residual to indicate how much the regression model would change if that observation were removed.
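As an illustration of how leverage and residual size combine, the sketch below computes the leverage values and Cook's distance, one widely used influence measure. Cook's distance is a choice made for this example (the text above does not name a specific measure), and the function name and data are hypothetical.

```python
import numpy as np

def leverage_and_cooks_distance(X, y):
    """Return (leverage, Cook's distance) for an OLS fit of y on X."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)                  # leverage values h_ii
    resid = y - H @ y               # residuals e_i
    s2 = resid @ resid / (n - p)    # residual variance estimate
    # Cook's distance: squared residual scaled by a leverage term.
    cooks = (resid**2 / (p * s2)) * h / (1.0 - h) ** 2
    return h, cooks

# Hypothetical data: the last point is far from the others in x
# and does not follow the trend of the first four points.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

h, cooks = leverage_and_cooks_distance(X, y)
for i, (h_i, d_i) in enumerate(zip(h, cooks)):
    print(f"obs {i}: leverage={h_i:.2f}, Cook's distance={d_i:.2f}")
```

In this made-up dataset the last observation has both the largest leverage and by far the largest Cook's distance, matching the intuition that a point far from the others in x with a non-negligible residual dominates the fit.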

Examples

Identifying influential observations

Suppose we have a dataset with 100 observations and fit a simple linear regression model. The diagonal of the Hat Matrix shows which observations have the greatest leverage on the fit. Observations that combine high leverage with a large residual are likely to be influential and should be examined carefully to determine whether they are outliers or otherwise problematic.
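A scenario like this might be sketched as follows. The data are simulated, observation 0 is deliberately given an extreme x value, and the cutoff of 2p/n (twice the average leverage, with p the number of columns of X) is a common rule of thumb rather than something prescribed by this entry.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated predictor and response; observation 0 is planted far from the rest.
x = rng.normal(size=n)
x[0] = 8.0
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])
p = X.shape[1]

# With X = QR (reduced QR), H = QQ', so h_ii is the squared norm of row i of Q.
# This avoids building the full n-by-n Hat Matrix.
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q**2, axis=1)

threshold = 2 * p / n  # rule-of-thumb cutoff (an assumption of this sketch)
flagged = np.flatnonzero(leverage > threshold)
print("flagged observations:", flagged)
print("largest leverage at index:", leverage.argmax())
```

Flagged observations are candidates for closer inspection, not automatic deletions; their residuals still need to be checked before calling them influential.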

Assessing the stability of the regression model

The Hat Matrix can be used to assess model stability. If an observation has high leverage and a large residual, removing it from the dataset could significantly change the model coefficients, indicating the model may not be stable or robust to changes in the dataset.
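The effect of removing an observation does not require refitting the model each time. For ordinary least squares, the change in the coefficient vector when observation i is dropped equals (X'X)^{-1} x_i e_i / (1 - h_ii), where x_i is row i of X written as a column, e_i is its residual, and h_ii is its leverage. The sketch below applies this identity; the function name and data are hypothetical.

```python
import numpy as np

def loo_coef_changes(X, y):
    """Row i is beta_hat minus the coefficients refitted without observation i."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # h_ii = x_i' (X'X)^{-1} x_i, computed without forming the full Hat Matrix.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Row i of X @ XtX_inv is (X'X)^{-1} x_i (XtX_inv is symmetric);
    # scale it by e_i / (1 - h_ii).
    return (X @ XtX_inv) * (resid / (1.0 - h))[:, None]

# Hypothetical data: the last point has high leverage and is off-trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

changes = loo_coef_changes(X, y)
for i, (d_intercept, d_slope) in enumerate(changes):
    print(f"drop obs {i}: intercept change={d_intercept:+.3f}, slope change={d_slope:+.3f}")
```

A row with large entries marks an observation whose removal would shift the coefficients noticeably, which is the instability described above.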

Use cases

Identifying influential observations in regression analyses.
Assessing whether removing specific observations would substantially change model coefficients (model stability checks).

Notes or pitfalls

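Observations with high leverage and a large residual warrant careful examination for outlier status or data issues.
High leverage alone does not make an observation influential: a high-leverage observation that lies close to the fitted line changes the coefficients very little when removed.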
A model whose coefficients change substantially when such observations are removed may lack robustness and might require re-evaluation.

Leverage Matrix
Influence Matrix
Gram Matrix