Mean Squared Error (MSE)
- A loss metric that averages the squared differences between model predictions and true values.
- Commonly used for regression and easily optimized because it is differentiable.
- Interpretable but sensitive to outliers, and expressed in squared units (e.g., dollars squared).
Definition
Mean squared error (MSE) is a loss function defined as the average of the squared differences between predicted values and true (ground-truth) values.
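The definition can be written compactly as a formula. The symbols are notation introduced here for illustration: $n$ is the number of observations, $y_i$ the true value, and $\hat{y}_i$ the predicted value for observation $i$.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
$$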
Explanation
MSE is calculated by squaring the error (the difference between the predicted and true value) for each observation, then averaging those squared errors across the dataset. Because the errors are squared, MSE is expressed in the square of the target's units (for example, dollars squared for stock prices). MSE is differentiable and computationally simple, which makes it suitable for optimization algorithms such as gradient descent. However, squaring the errors makes MSE sensitive to outliers and less robust when the true-value distribution is heavily skewed.
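The square-then-average procedure described above might be sketched in plain Python as follows. The function name `mean_squared_error` and its argument names are illustrative choices, not taken from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and true values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each per-observation error, then average across the dataset.
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Two observations with a true value of 100 and predictions of 105 and 90.
example_mse = mean_squared_error([100, 100], [105, 90])
print(example_mse)
```

With these inputs the squared errors are 25 and 100, so the average is 62.5.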
Examples
Single prediction
If the true stock price is 100 and the model predicts 105, the squared error is $(105 - 100)^2 = 25$.
If the model predicts 90, the squared error is $(90 - 100)^2 = 100$.
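The two single-prediction cases can be checked with a few lines of Python. The helper name `squared_error` is a hypothetical name used only for this sketch.

```python
def squared_error(y_true, y_pred):
    """Squared difference between one prediction and its true value."""
    return (y_pred - y_true) ** 2


true_price = 100
for predicted in (105, 90):
    print(predicted, squared_error(true_price, predicted))
```

The under-prediction of 10 costs four times as much as the over-prediction of 5, because the penalty grows with the square of the error.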
Ten-day predictions
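For a dataset of 10 true stock prices $y_1, \dots, y_{10}$ where the model's next-day predictions $\hat{y}_1, \dots, \hat{y}_{10}$ are 105, 90, 110, 95, 100, 105, 100, 95, 105, and 110, the MSE is the mean of the ten squared errors:

$$
\mathrm{MSE} = \frac{1}{10} \sum_{i=1}^{10} \left( \hat{y}_i - y_i \right)^2
$$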
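The ten true prices are not listed here, so the sketch below assumes, purely for illustration, that the true price is 100 on every day (as in the single-prediction example). The resulting number holds only under that assumption.

```python
predictions = [105, 90, 110, 95, 100, 105, 100, 95, 105, 110]

# ASSUMPTION: hypothetical true prices, fixed at 100 for all ten days.
true_prices = [100] * len(predictions)

# Per-day squared errors, then their average.
squared_errors = [(p - t) ** 2 for p, t in zip(predictions, true_prices)]
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)
print(mse)
```

Under the assumed true prices, the squared errors sum to 425, giving an MSE of 42.5 (in dollars squared).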
Use cases
- Regression problems that predict continuous values, such as stock prices or temperature.
- Loss function for models trained with optimization methods like gradient descent.
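As a rough illustration of the gradient-descent use case, the sketch below fits a line $y = wx + b$ by repeatedly stepping against the gradient of the MSE. The function name `fit_line`, the toy data, the learning rate, and the step count are all assumptions chosen for this example, not recommended settings.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the MSE (illustrative only)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals: prediction minus true value for each observation.
        residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(residual^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Step in the direction that decreases the MSE.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)
```

Because the MSE is differentiable, both gradients have simple closed forms, which is what makes each update step cheap. On this toy data the fitted parameters approach a slope of 2 and an intercept of 1.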
Notes or pitfalls
- Sensitive to outliers: a single large error can disproportionately increase MSE.
- Not robust to skewed data: when a few true values are much larger than the rest, errors on those values can dominate the average and misrepresent overall performance.
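The outlier sensitivity noted above can be seen in a small experiment. All values below are made up for illustration: the two prediction lists are identical except for the last entry.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and true values."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: five observations with a true value of 100.
y_true = [100, 100, 100, 100, 100]
typical = [102, 98, 101, 99, 100]       # all errors are small
with_outlier = [102, 98, 101, 99, 150]  # same, except one error of 50

mse_typical = mean_squared_error(y_true, typical)
mse_outlier = mean_squared_error(y_true, with_outlier)
print(mse_typical, mse_outlier)
```

Here a single bad prediction raises the MSE from 2.0 to 502.0, even though the other four predictions are unchanged.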
Related terms
- Loss function
- Regression
- Gradient descent