Gradient Descent

  • Iterative method that adjusts parameters to reduce a cost function using partial derivatives (gradients).
  • Starts from an initial guess and repeatedly moves the parameters opposite the gradient, in steps scaled by the learning rate, until a minimum is reached.
  • Commonly used in machine learning to find optimal model weights (for example, in linear regression).

Gradient descent is an optimization algorithm used to find the values of parameters that minimize a given cost function. It is commonly used in machine learning to find the optimal weights for a model.

Begin with an initial guess for the parameters. Compute the partial derivatives of the cost function with respect to each parameter; together they form the gradient, which points in the direction in which the cost increases most. Move the parameters in the opposite direction of those derivatives, scaled by a small step size (the learning rate), to reduce the cost. Repeat this process iteratively until the cost function reaches a minimum.
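
In symbols, each iteration applies the update

w \leftarrow w - \alpha \frac{\partial J}{\partial w}

to every parameter, where \alpha is the learning rate. As a minimal sketch of this loop, the Python snippet below minimizes a made-up one-parameter quadratic cost; the function names, starting point, learning rate, and iteration count are illustrative assumptions rather than details from the text.

    # Minimal gradient descent sketch; the cost J(w) = (w - 3)^2 is an assumed example.
    def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
        w = w0
        for _ in range(steps):
            w = w - learning_rate * grad(w)  # step opposite the derivative
        return w

    # dJ/dw = 2 * (w - 3), so the cost is minimized at w = 3.
    w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
    print(w_min)  # approaches 3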

The method can be illustrated with a simple two-variable example: a cost function depending on x and y, such as the sum of the squares of the differences between predicted and actual values. The goal is to find the values of x and y that minimize that cost by repeatedly computing the partial derivatives with respect to x and y and updating both variables in the opposite direction of those derivatives, scaled by the learning rate.
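
As a minimal two-variable sketch of that procedure, assuming a simple quadratic cost chosen purely for illustration (not the cost from the text), the snippet below computes the partial derivatives with respect to x and y and steps both variables opposite them.

    # Two-variable gradient descent; the cost function below is an assumed example.
    def cost(x, y):
        return (x - 2.0) ** 2 + (y + 1.0) ** 2

    def partials(x, y):
        # Analytic partial derivatives of the cost above.
        return 2.0 * (x - 2.0), 2.0 * (y + 1.0)

    x, y = 0.0, 0.0          # initial guess
    learning_rate = 0.1
    for _ in range(200):
        dx, dy = partials(x, y)
        x -= learning_rate * dx  # move opposite the partial derivatives
        y -= learning_rate * dy
    print(x, y, cost(x, y))      # x approaches 2, y approaches -1, cost approaches 0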

In linear regression, fit a line to the data by minimizing the sum of squared differences between predicted and actual values. The cost function can be written as:

J(w) = \sum (y - \hat{y})^2

Here, w is the vector of weights to be found. Apply gradient descent by starting with an initial guess for w, computing the partial derivatives of J with respect to each weight, and updating w in the opposite direction of those derivatives, scaled by the learning rate, until a minimum is reached.
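
A sketch of that procedure for simple linear regression with NumPy is shown below; the toy data, learning rate, and iteration count are assumptions made for illustration. Here w holds an intercept and a slope, and each iteration steps w opposite the gradient of the sum of squared errors.

    import numpy as np

    # Toy data (assumed for illustration): y roughly follows 1 + 2x.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    X = np.column_stack([np.ones_like(x), x])  # design matrix: [1, x]
    w = np.zeros(2)                            # initial guess for [intercept, slope]
    learning_rate = 0.01

    for _ in range(5000):
        y_hat = X @ w                          # predicted values
        grad = 2.0 * X.T @ (y_hat - y)         # gradient of J(w) = sum((y - y_hat)^2)
        w -= learning_rate * grad              # step opposite the gradient
    print(w)                                   # approaches roughly [1, 2]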

  • Finding optimal weights for machine learning models.
  • Applying to linear regression to minimize the sum of squared errors.
  • Requires an initial guess for the parameters.
  • Updates use a learning rate, a small step size that determines how far parameters move opposite the gradient.
  • The update-and-evaluate process is repeated until the cost function reaches a minimum value.
  • Cost function
  • Partial derivatives
  • Learning rate
  • Linear regression
  • Weights (model parameters)