Regularization
- Adds a penalty to the objective function to reduce overfitting and improve generalization.
- L1 (Lasso) penalizes the absolute value of weights and can produce sparse models (some weights become zero).
- L2 (Ridge) penalizes the square of the weights and shrinks weights toward zero, typically without setting them exactly to zero.
Definition
Regularization is a technique used in machine learning to prevent overfitting, which occurs when a model fits too closely to the training data and performs poorly on unseen data. The two main types are L1 and L2 regularization.
Explanation
- L1 regularization (Lasso) adds a penalty term to the objective function proportional to the sum of the absolute values of the weights, scaled by a hyperparameter that controls the penalty strength and must be set by the user. L1 regularization tends to produce sparse models by setting some weights exactly to zero, which can aid feature selection.
- L2 regularization (Ridge) adds a penalty term proportional to the sum of the squared weights, scaled by the same kind of hyperparameter. L2 regularization shrinks weights toward zero but typically does not set them exactly to zero.
- Both methods reduce model complexity via the added penalty term, making models more robust to unseen data, at the cost of potentially reducing the model’s ability to capture important relationships. The two penalized objectives are written out after this list.
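As a concrete reference, the penalized objectives can be written as follows, where L(w) is the unregularized loss, λ is the penalty-strength hyperparameter, and p is the number of weights (symbols chosen here for illustration):

```latex
% L1 (Lasso): absolute-value penalty on the weights
J_{\mathrm{L1}}(w) = L(w) + \lambda \sum_{j=1}^{p} \lvert w_j \rvert

% L2 (Ridge): squared penalty on the weights
J_{\mathrm{L2}}(w) = L(w) + \lambda \sum_{j=1}^{p} w_j^2
```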
Examples
Example 1 — Linear regression for house pricing (L1)
Consider a linear regression model used to predict the price of a house based on its size and number of bedrooms. Without regularization, the model might fit the training data very well but fail to generalize. L1 regularization can prevent overfitting by adding a penalty term to the objective function proportional to the absolute value of the weights. This can result in some weights being set to zero, reducing model complexity and improving robustness to unseen data.
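A minimal sketch of this idea using scikit-learn’s Lasso. The data and feature choices here are illustrative, not from the original:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative training data: [size in square meters, number of bedrooms]
X = np.array([[50, 1], [80, 2], [120, 3], [160, 4], [200, 5]], dtype=float)
y = np.array([150_000, 240_000, 350_000, 470_000, 580_000], dtype=float)

# alpha is the penalty-strength hyperparameter (lambda in the formulas above)
model = Lasso(alpha=100.0, max_iter=10_000)
model.fit(X, y)

# With a strong enough penalty (and correlated features, as here), some
# coefficients are driven exactly to zero, effectively dropping those features.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```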
Example 2 — Neural network image classifier (L2)
Consider a neural network used to classify images of dogs and cats. Without regularization, the model might fit the training data very well but fail to generalize. L2 regularization can prevent overfitting by adding a penalty term to the objective function proportional to the square of the weights. This shrinks weights toward zero without setting them to zero, reducing complexity and improving robustness to unseen data.
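A minimal sketch of L2 regularization in a small classifier, using scikit-learn’s MLPClassifier, whose alpha parameter is the L2 penalty strength. The synthetic feature vectors below stand in for image features; all names and values are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for image feature vectors: class 0 ("cat") vs class 1 ("dog")
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 20)),
               rng.normal(1.0, 1.0, size=(100, 20))])
y = np.array([0] * 100 + [1] * 100)

# alpha is the L2 penalty strength; larger values shrink the weights harder.
clf = MLPClassifier(hidden_layer_sizes=(32,), alpha=0.01,
                    max_iter=500, random_state=0)
clf.fit(X, y)

# Unlike L1, the L2 penalty shrinks weights toward zero without zeroing them out.
print("training accuracy:", clf.score(X, y))
```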
Notes or pitfalls
- Regularization improves generalization but can also reduce the model’s ability to capture important relationships in the data.
- The penalty strength is controlled by a hyperparameter that must be chosen by the user, typically by comparing several candidate values on held-out data (see the cross-validation sketch after this list).
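One common way to choose the penalty strength is cross-validation; a minimal sketch using scikit-learn’s LassoCV, with an illustrative grid of candidate alphas and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Synthetic regression data: 5 informative features plus 5 irrelevant ones
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 1.5, 0.5, 4.0, 0, 0, 0, 0, 0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

# LassoCV fits the model at each candidate penalty strength and keeps the
# value with the best cross-validated performance.
model = LassoCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5)
model.fit(X, y)

print("chosen alpha:", model.alpha_)
print("coefficients:", model.coef_)
```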
Related terms
- L1 regularization
- L2 regularization
- Lasso
- Ridge
- Overfitting
- Objective function
- Hyperparameter
- Feature selection