Hyperparameter

  • Parameters set before training that control how a learning algorithm behaves and are not learned from the data.
  • They strongly affect model performance, accuracy, and generalizability.
  • Common selection methods include trial and error, grid search, and Bayesian optimization using a validation set.

Hyperparameters are parameters of a machine learning algorithm that cannot be learned directly from the data. They are set before training begins and control the behavior of the learning algorithm.

Hyperparameters influence how a model learns and generalizes. They can determine convergence speed, the model’s tendency to overfit or underfit, and the final performance on unseen data. When choosing hyperparameters, practitioners should consider the data characteristics (for example, class imbalance) and the model’s objective (for example, maximizing recall versus precision). Selection is typically performed using a validation set and can be done by trial and error, by enumerating a grid of combinations (grid search), or by using automated methods such as Bayesian optimization, which explores the hyperparameter space probabilistically.
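
As a concrete sketch of grid search against a validation set, the Python snippet below trains one model per combination in a small grid and keeps the combination that scores best on held-out data. The model (scikit-learn's LogisticRegression), the synthetic dataset, and the grid values are illustrative assumptions rather than recommendations.

    # Minimal grid-search sketch; model, data, and grid are illustrative choices.
    from itertools import product
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    # Hold out a validation set for hyperparameter selection.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
    best_score, best_params = -1.0, None
    for C, penalty in product(grid["C"], grid["penalty"]):
        model = LogisticRegression(C=C, penalty=penalty, solver="liblinear")
        model.fit(X_train, y_train)        # hyperparameters are fixed before training
        score = model.score(X_val, y_val)  # evaluate on the validation set
        if score > best_score:
            best_score, best_params = score, {"C": C, "penalty": penalty}

    print(best_params, best_score)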

The learning rate determines the size of the steps the model takes when updating its weights during training. A higher learning rate can speed up convergence but risks overshooting the optimal solution. A lower learning rate converges more slowly and takes smaller, more stable steps; if it is too low, training can stall or settle in a poor local minimum.
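
The effect of the learning rate is easy to see with plain gradient descent on a toy objective. In the sketch below, the function f(w) = (w - 3)^2 and the three rates are arbitrary illustrative choices; the too-high rate diverges while the too-low rate crawls toward the minimum.

    # Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
    def gradient(w):
        return 2 * (w - 3)  # derivative of (w - 3)^2

    for lr in (0.01, 0.5, 1.1):  # too low, moderate, too high
        w = 0.0
        for _ in range(50):
            w -= lr * gradient(w)  # update step scaled by the learning rate
        # lr=0.01 is still far from 3 after 50 steps; lr=0.5 lands on 3; lr=1.1 diverges.
        print(f"lr={lr}: w={w:.4f}")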

Regularization is a method used to prevent overfitting, which occurs when a model fits the training data too closely but fails to generalize to new data. A regularization hyperparameter (often called the regularization strength) controls how heavily the penalty is applied, with a higher value indicating stronger regularization. Stronger regularization can help avoid overfitting by penalizing large weights, but if it is too strong it can cause underfitting.
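
To illustrate regularization strength, the sketch below fits ridge regression (L2 regularization) at a few values of the strength hyperparameter alpha and reports the norm of the learned weights; the synthetic data and alpha values are assumptions chosen for demonstration.

    # Ridge regression: larger alpha penalizes large weights more heavily.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X @ np.arange(10, dtype=float) + rng.normal(scale=0.5, size=50)

    for alpha in (0.01, 1.0, 100.0):  # weak, moderate, strong regularization
        model = Ridge(alpha=alpha).fit(X, y)
        # Stronger regularization shrinks the weight vector toward zero.
        print(f"alpha={alpha}: ||w|| = {np.linalg.norm(model.coef_):.2f}")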

Use cases

  • Model selection and tuning to improve accuracy and generalizability.
  • Adjusting hyperparameters to account for data characteristics such as class imbalance.
  • Choosing hyperparameters to prioritize particular performance metrics (for example, maximizing recall versus maximizing precision).
  • Selecting hyperparameters via trial and error, grid search, or Bayesian optimization using a validation set.

Pitfalls

  • A learning rate that is too high can overshoot the optimal solution; a learning rate that is too low can cause very slow convergence or leave the model stuck in a local minimum.
  • Excessive regularization can cause underfitting; insufficient regularization can lead to overfitting.
  • Hyperparameters must be set before training and are not learned directly from the data.

Related terms

  • Learning rate
  • Regularization
  • Overfitting
  • Underfitting
  • Validation set
  • Grid search
  • Bayesian optimization