Gradient Descent
- Iterative method that adjusts parameters to reduce a cost function using partial derivatives (gradients).
- Starts from an initial guess and repeatedly moves the parameters opposite the gradient, scaled by a small step size called the learning rate, until a minimum is reached.
- Commonly used in machine learning to find optimal model weights (for example, in linear regression).
Definition
Gradient descent is an optimization algorithm used to find the values of parameters that minimize a given cost function. It is commonly used in machine learning to find the optimal weights for a model.
Explanation
Begin with an initial guess for the parameters. Compute the partial derivatives of the cost function with respect to each parameter; these derivatives indicate the directions in which the cost increases most. Move the parameters in the opposite direction of those partial derivatives by a small step (the learning rate) to reduce the cost. Repeat this process iteratively until a minimum value of the cost function is reached.
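The loop above can be written as a minimal Python sketch. The one-parameter cost function f(x) = (x − 3)², its derivative, the learning rate, and the step count are illustrative assumptions chosen only to show the update rule, not part of the original text.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly move x opposite the derivative of the cost function."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step opposite the gradient
    return x

# Illustrative cost: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # approaches 3, the minimizer of the cost
```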
The method is illustrated with a simple two-variable example: a cost function depending on x and y (for instance, the sum of the squares of the differences between predicted and actual values). The goal is to find the values of x and y that minimize that cost function by repeatedly computing partial derivatives and updating x and y in the opposite direction of the derivatives.
Examples
Two-variable example
A cost function depending on two variables, x and y, can be the sum of the squares of the differences between predicted values and actual values. The goal is to find x and y that minimize this cost function by computing partial derivatives with respect to x and y and moving opposite those derivatives using a learning rate.
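A minimal sketch of this example in Python follows. The specific cost function (squared differences between the predicted values x, y and illustrative actual values 3, 5), the learning rate, and the iteration count are assumptions made for demonstration.

```python
def cost(x, y):
    """Sum of squared differences between predicted values (x, y) and actual values (3, 5)."""
    return (x - 3) ** 2 + (y - 5) ** 2

def gradients(x, y):
    """Partial derivatives of the cost with respect to x and y."""
    return 2 * (x - 3), 2 * (y - 5)

x, y = 0.0, 0.0       # initial guess
learning_rate = 0.1   # small step size
for _ in range(200):
    dx, dy = gradients(x, y)
    x -= learning_rate * dx  # move opposite the partial derivative in x
    y -= learning_rate * dy  # move opposite the partial derivative in y

print(x, y, cost(x, y))  # x and y approach 3 and 5; the cost approaches 0
```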
Linear regression example
In linear regression, fit a line to data by minimizing the sum of the squares of differences between predicted values and actual values. The cost function can be written as:
J(w) = Σᵢ (ŷᵢ − yᵢ)²
Here, w is the vector of weights to find, ŷᵢ is the value predicted from w for the i-th data point, and yᵢ is the corresponding actual value. Use gradient descent by starting with an initial guess for w, computing partial derivatives of J with respect to each weight, and updating w in the opposite direction of those partial derivatives by the learning rate until a minimum is reached.
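A sketch of this procedure in Python using NumPy is shown below. The toy data set (points on the line y = 1 + 2x), the learning rate, and the iteration count are illustrative assumptions rather than values from the original.

```python
import numpy as np

# Toy data: a column of 1s for the intercept weight plus one feature column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # actual values (generated from y = 1 + 2x)

def cost(w):
    """J(w): sum of squared differences between predictions X @ w and actual values y."""
    residuals = X @ w - y
    return residuals @ residuals

w = np.zeros(2)        # initial guess for the weights
learning_rate = 0.01   # small step size
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y)  # partial derivatives of J with respect to each weight
    w -= learning_rate * grad     # update w opposite the gradient

print(w, cost(w))  # w approaches [1.0, 2.0]; the cost approaches 0
```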
Use cases
- Finding optimal weights for machine learning models.
- Applying to linear regression to minimize the sum of squared errors.
Notes or pitfalls
- Requires an initial guess for the parameters.
- Updates use a learning rate, a small step size that determines how far parameters move opposite the gradient.
- The update-and-evaluate process is repeated until the cost function reaches a minimum value.
Related terms
- Cost function
- Partial derivatives
- Learning rate
- Linear regression
- Weights (model parameters)