Backpropagation

  • Trains a neural network by computing the output error and propagating that error backward to compute gradients for each weight and bias.
  • Uses those gradients with gradient descent (and a learning rate) to update parameters.
  • Repeats forward and backward passes until the network error is reduced to a satisfactory level.

Backpropagation (backprop) is a supervised learning method for training artificial neural networks. It uses gradient descent to minimize the error between the network’s predicted output and the actual (target) output by repeatedly updating the network’s weights and biases.

Backpropagation operates in repeated cycles composed of these steps:

  • Forward propagation: Input values pass through the network layer by layer. At each neuron, the inputs are multiplied by weights, summed together with a bias, and (in most networks) passed through an activation function to produce that neuron’s output.
  • Error calculation: The network output is compared to the actual target value using a cost (loss) function, for example mean squared error.
  • Backward propagation: The error is propagated back through the network. This involves computing the gradient of the error with respect to each weight and bias.
  • Weight and bias update: Gradients are used to update weights and biases via gradient descent. Parameters are adjusted in the opposite direction of the gradient, scaled by the learning rate.
  • Repeat: The cycle of forward propagation, error calculation, backward propagation, and parameter update is repeated over multiple iterations until the error is reduced to a satisfactory level.
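
To make the cycle concrete, here is a minimal pure-Python sketch of one such iteration for the simplest possible case, a single neuron computing y = w * x + b. The input, target, starting weight and bias, and learning rate are assumed values chosen only for illustration.

```python
# One backpropagation cycle for a single neuron: y = w * x + b
x, target = 2.0, 4.0          # assumed input and target
w, b = 0.5, 0.1               # assumed initial weight and bias
learning_rate = 0.1

# Forward propagation
y = w * x + b                 # prediction: 0.5 * 2.0 + 0.1 = 1.1

# Error calculation (half squared error)
error = 0.5 * (target - y) ** 2

# Backward propagation: gradients via the chain rule
d_error_d_y = -(target - y)   # derivative of the error w.r.t. the prediction
grad_w = d_error_d_y * x      # dError/dw
grad_b = d_error_d_y          # dError/db

# Weight and bias update: step opposite the gradient, scaled by the learning rate
w -= learning_rate * grad_w
b -= learning_rate * grad_b

print(f"error={error:.3f}, updated w={w:.2f}, b={b:.2f}")
```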

Simple feedforward network example (from source)

Network architecture:

  • Input layer: 3 neurons
  • Hidden layer: 2 neurons
  • Output layer: 1 neuron

Inputs and target:

  • Input values: [1, 2, 3]
  • Actual output (target): 4

Initial weights and biases:

Input layer to hidden layer:

  • Weight 1: 0.5
  • Weight 2: 0.1
  • Bias: 0.2

Hidden layer to output layer:

  • Weight 1: 0.3
  • Bias: 0.1

Forward propagation:

  • Input layer to hidden layer:
    • Neuron 1: (1 * 0.5) + (2 * 0.1) + 0.2 = 1.3
    • Neuron 2: (1 * 0.5) + (2 * 0.2) = 1.3
  • Hidden layer to output layer:
    • Neuron 1: (1.3 * 0.3) + 0.1 = 0.9

The output of the network is 0.9, which differs from the actual output value of 4.
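
The same forward pass can be written compactly with matrices. A 3-2-1 network needs six input-to-hidden weights, two hidden biases, two hidden-to-output weights, and one output bias, but the source lists only some of them; in the sketch below the listed values are reused where they exist and the remaining entries are assumed placeholders, so it shows the shape of the computation (a weighted sum plus a bias at every neuron) rather than reproducing the exact figures above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])            # input values

W_hidden = np.array([[0.5, 0.1, 0.0],    # hidden neuron 1: listed weights 0.5 and 0.1, third weight assumed
                     [0.2, 0.3, 0.1]])   # hidden neuron 2: all three weights assumed
b_hidden = np.array([0.2, 0.2])          # listed bias 0.2, second bias assumed
W_out = np.array([0.3, 0.3])             # listed weight 0.3, second weight assumed
b_out = 0.1                              # listed output bias

hidden = W_hidden @ x + b_hidden         # weighted sum plus bias for each hidden neuron
output = W_out @ hidden + b_out          # weighted sum plus bias for the output neuron
print(hidden, output)
```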

Error calculation: The mean squared error is given as:

\text{Error} = \frac{1}{2}\,(\text{Actual output} - \text{Predicted output})^2

Using the example values:

Error = 1/2 * (4 - 0.9)^2 = 6.31
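
In code this cost is a one-liner; the helper below is an illustrative sketch (the function name is not from the source) that computes the same half squared error for any target and prediction.

```python
def half_squared_error(target, prediction):
    """Half the squared difference between the target and the prediction."""
    return 0.5 * (target - prediction) ** 2

# Example with arbitrary values: 0.5 * (3.0 - 2.5)^2 = 0.125
print(half_squared_error(3.0, 2.5))
```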

Backward propagation (gradients computed as shown in source):

  • For the hidden layer to output layer weights and bias:
    • Weight 1: (1/2 * (4 - 0.9)^2) * (-1) * (1.3) = -2.87
    • Bias: (1/2 * (4 - 0.9)^2) * (-1) * (1) = -2.5
  • For the input layer to hidden layer weights and bias:
    • Weight 1: (1/2 * (4 - 0.9)^2) * (-1) * (0.3) * (1) = -0.87
    • Weight 2: (1/2 * (4 - 0.9)^2) * (-1) * (0.3) * (2) = -1.75
    • Bias: (1/2 * (4 - 0.9)^2) * (-1) * (0.3) * (1) = -0.87
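
For reference, the standard chain-rule form of these gradients for a network of linear units is shown below, where t is the target, y the network output, h_j the output of hidden neuron j, x_i an input, v_j and c the hidden-to-output weights and bias, and w_ij and b_j the input-to-hidden weights and biases. This is the textbook derivation rather than the exact figures quoted from the source.

E = \frac{1}{2}(t - y)^2

\frac{\partial E}{\partial v_j} = -(t - y)\,h_j \qquad \frac{\partial E}{\partial c} = -(t - y)

\frac{\partial E}{\partial w_{ij}} = -(t - y)\,v_j\,x_i \qquad \frac{\partial E}{\partial b_j} = -(t - y)\,v_j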

Weight and bias update (learning rate = 0.1):

  • Input layer to hidden layer:
    • Weight 1: 0.5 - 0.1 * (-0.87) = 0.57
    • Weight 2: 0.1 - 0.1 * (-1.75) = 0.275
    • Bias: 0.2 - 0.1 * (-0.87) = 0.27
  • Hidden layer to output layer:
    • Weight 1: 0.3 - 0.1 * (-2.87) = 0.387
    • Bias: 0.1 - 0.1 * (-2.5) = 0.35
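
The update rule itself is identical for every parameter. The helper below is a generic sketch (not from the source); it is applied once to the output-layer bias from the example above and once to an arbitrary weight value, just to show the mechanics of the step.

```python
def gradient_descent_step(param, grad, learning_rate):
    """Move a parameter in the opposite direction of its gradient."""
    return param - learning_rate * grad

# Output-layer bias from the example: 0.1 - 0.1 * (-2.5) = 0.35
new_bias = gradient_descent_step(0.1, -2.5, learning_rate=0.1)

# An arbitrary weight and gradient: 0.5 - 0.1 * (-0.8) = 0.58
new_weight = gradient_descent_step(0.5, -0.8, learning_rate=0.1)

print(f"{new_bias:.2f} {new_weight:.2f}")
```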

Repeat:

  • The forward/backward/update cycle is repeated across multiple iterations until the error is minimized to a satisfactory level. Over successive iterations, weights and biases adjust and the network’s predictions become more accurate.
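
Putting the steps together, the sketch below repeats the forward/backward/update cycle for a small 3-2-1 network with linear units and prints the error at each iteration. The inputs, target, starting parameters, and learning rate are assumed values for illustration; running it shows the error shrinking as the weights and biases adjust.

```python
import numpy as np

# Assumed data and starting parameters for a 3-2-1 network (illustration only).
x = np.array([1.0, 2.0, 3.0])
target = 4.0
W1 = np.array([[0.5, 0.1, 0.2],
               [0.1, 0.2, 0.3]])    # input-to-hidden weights (2 x 3)
b1 = np.array([0.2, 0.2])           # hidden biases
W2 = np.array([0.3, 0.3])           # hidden-to-output weights
b2 = 0.1                            # output bias
lr = 0.05                           # learning rate

for step in range(10):
    # Forward propagation (linear units, as in the worked example).
    h = W1 @ x + b1
    y = W2 @ h + b2

    # Error calculation.
    error = 0.5 * (target - y) ** 2

    # Backward propagation: chain-rule gradients for every weight and bias.
    d_y = -(target - y)
    grad_W2 = d_y * h
    grad_b2 = d_y
    d_h = d_y * W2
    grad_W1 = np.outer(d_h, x)
    grad_b1 = d_h

    # Weight and bias update.
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

    print(f"iteration {step}: error = {error:.4f}")
```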

Related concepts:

  • Gradient descent
  • Supervised learning
  • Feedforward neural network
  • Weights and biases
  • Mean squared error