Cost Function
- Quantifies the prediction error of a learning algorithm averaged over the dataset.
- Different cost functions suit different tasks (e.g., regression vs classification).
- The learning algorithm seeks parameters that minimize the cost, typically via gradient descent.
Definition
A cost function measures how well a learning algorithm predicts the correct output values for given inputs — that is, the accuracy of its predictions. The goal of any learning algorithm is to find the set of parameters that minimizes the cost function.
Explanation
- Many different cost functions exist and are chosen according to the specific problem.
- In practice the cost function is typically computed as the average of the errors for each example in the dataset, because the objective is to minimize overall error, not only the error for a single example.
- Minimizing the cost function is usually performed with an optimization algorithm such as gradient descent, which iteratively updates parameters to reduce the cost.
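The loop described above can be sketched for a one-parameter model. This is a minimal illustration, not a production implementation; the dataset, learning rate, and iteration count are arbitrary choices for demonstration.

```python
# Minimal gradient descent sketch: fit a single parameter w so that
# predictions w * x match targets y, by minimizing mean squared error.
# Learning rate (0.1) and iteration count are illustrative choices.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the true relationship y = 2x

def cost(w):
    # Average squared error over the whole dataset.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    # Derivative of the cost with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)  # step downhill along the gradient

print(round(w, 3))  # converges toward 2.0
```

Each iteration moves `w` a small step in the direction that reduces the average error, which is exactly the "iteratively updates parameters to reduce the cost" behavior described above.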
Examples
Mean squared error (MSE)
This cost function is commonly used in regression problems, where the goal is to predict a continuous value. MSE is calculated as the average of the squared differences between the predicted values and the true values. Mathematically, it is represented as:

MSE = (1/n) Σ (yi − ŷi)²,  summed over i = 1, …, n

where yi is the true value, ŷi is the predicted value for the ith example, and n is the total number of examples.
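The formula translates directly into code (a sketch; the function name and sample values are ours):

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences between true and predicted values.
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# Example with arbitrary values: errors are 0.5, 0.0, and 2.0.
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```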
Cross-entropy
This cost function is commonly used in classification problems, where the goal is to predict a class label. Cross-entropy is calculated as the average negative log-likelihood of the true labels under the predicted class probabilities. For binary classification, it is represented as:

CE = −(1/n) Σ [ yi log(ŷi) + (1 − yi) log(1 − ŷi) ],  summed over i = 1, …, n

where yi is the true class label (0 or 1), ŷi is the predicted probability of the positive class for the ith example, and n is the total number of examples.
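A direct sketch of the binary form of this formula (the function name and sample probabilities are ours; labels are 0/1 and predictions are probabilities strictly between 0 and 1):

```python
import math

def binary_cross_entropy(y_true, y_prob):
    # Average negative log-likelihood of the true labels under the
    # predicted probabilities. Assumes 0 < p < 1 for every prediction.
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / n

# Confident, mostly correct predictions give a low cost.
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.8]))
```

Note that the cost falls as the predicted probabilities move toward the true labels, which is what drives the model to output well-calibrated probabilities.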
Use cases
- Mean squared error: regression problems (predicting continuous values).
- Cross-entropy: classification problems (predicting class labels or class probabilities).
Notes or pitfalls
- MSE penalizes large errors more than small errors, so optimization will prioritize reducing large deviations.
- Cross-entropy measures the distance between true class probabilities and predicted class probabilities, encouraging predicted probabilities to match true probabilities.
- Cost functions are typically computed as averages over the dataset to reflect overall performance.
- Gradient descent is a common method used to find parameter values that minimize the cost function.
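The first pitfall can be seen numerically: because errors are squared, one error of 3 contributes as much to the cost as nine errors of 1. A small illustration (the values are arbitrary):

```python
def mse(errors):
    # Mean of squared errors.
    return sum(e ** 2 for e in errors) / len(errors)

print(mse([1, 1, 1]))  # three small errors -> 1.0
print(mse([3, 0, 0]))  # one large error, same total absolute error -> 3.0
```

Even though both lists have a total absolute error of 3, the single large deviation triples the cost, so gradient descent will work hardest on the outlier.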
Related terms
- Mean squared error (MSE)
- Cross-entropy
- Learning algorithm
- Gradient descent
- Optimization algorithm