Log Loss
- A classification metric that evaluates predicted probabilities, not just hard labels.
- Penalizes confident incorrect predictions more than uncertain ones; lower values are better.
- Commonly used in machine learning competitions because it is sensitive to probabilities near 0 and 1.
Definition
Log loss, also known as cross entropy loss, is a performance metric used in classification tasks. It measures the discrepancy between the predicted probabilities and the actual outcomes.
For a binary classification example where the actual outcome is “positive”, the log loss is:

log loss = -log(p)

where p is the predicted probability assigned to the positive class and log is the natural logarithm.
For a multi-class classification task, log loss is given as:

log loss = -Σ y_k · log(p_k)

where the sum runs over the classes k, y_k is 1 for the actual class and 0 for every other class, and p_k is the predicted probability of class k.
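A minimal Python sketch of these single-example formulas (the function names are illustrative, not taken from any particular library); in practice the per-example losses are averaged over the whole dataset:

```python
import math

def binary_log_loss(y_true, p_pred):
    # y_true is 0 or 1; p_pred is the predicted probability of the positive class.
    # Covers both the positive (y_true = 1) and negative (y_true = 0) cases.
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

def multiclass_log_loss(actual_class, probs):
    # probs maps each class label to its predicted probability; only the
    # probability assigned to the actual class contributes to the loss.
    return -math.log(probs[actual_class])
```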
Explanation
Log loss evaluates how well predicted probability distributions match the actual outcomes. A lower log loss indicates predicted probabilities are closer to the true outcome. Because the metric uses the logarithm of predicted probabilities, predictions that assign probabilities near 0 to the true class incur large penalties, while assigning high probability to the true class yields a small loss. This sensitivity to probabilities close to 0 and 1 makes log loss especially informative in settings where calibrated probability estimates matter.
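In practice, log loss is usually reported as the mean per-example loss over a dataset, for example with scikit-learn's log_loss function (the labels and probabilities below are made up for illustration):

```python
from sklearn.metrics import log_loss

# True binary labels and two hypothetical sets of predicted probabilities
# for the positive class.
y_true      = [1, 0, 1, 1]
p_confident = [0.9, 0.2, 0.8, 0.7]   # mostly correct and fairly confident
p_hedged    = [0.6, 0.5, 0.6, 0.5]   # hedged toward 0.5

print(log_loss(y_true, p_confident))  # ≈ 0.227
print(log_loss(y_true, p_hedged))     # ≈ 0.602
```

The confident, mostly correct probabilities achieve the lower (better) log loss.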
Examples
Binary classification example
If the actual outcome is “positive”, log loss is computed as:

log loss = -log(p)

where p is the predicted probability of the positive class.
If the predicted probability of the positive class is 0.9, the log loss is:

-log(0.9) ≈ 0.105
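This value can be checked directly with Python's standard library:

```python
import math

p_positive = 0.9              # predicted probability of the positive class
print(-math.log(p_positive))  # ≈ 0.105
```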
Multi-class classification example
For 3 classes (A, B, C), log loss is:

log loss = -(y_A · log(p_A) + y_B · log(p_B) + y_C · log(p_C))

where y_A, y_B, y_C are 1 for the actual class and 0 otherwise, and p_A, p_B, p_C are the predicted probabilities for each class.
If the actual outcome is “A” and the predicted probabilities are [0.1, 0.3, 0.6], then:

log loss = -log(0.1) ≈ 2.303

The loss is large because only a probability of 0.1 was assigned to the true class.
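The same computation in Python, keeping the class order A, B, C from above:

```python
import math

probs = {"A": 0.1, "B": 0.3, "C": 0.6}  # predicted probabilities per class
actual = "A"
print(-math.log(probs[actual]))  # ≈ 2.303
```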
Use cases
- Commonly used in machine learning competitions as a performance metric because it assesses the quality of probability estimates and is sensitive to extreme probabilities.
Notes or pitfalls
- Log loss penalizes confident incorrect predictions heavily (predicted probabilities close to 0 or 1 for the wrong outcome); see the sketch below.
- It rewards correct predictions made with high confidence by producing low loss values.
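A quick sketch of this asymmetry for a single example whose actual class is “positive” (the probability values are chosen only for illustration):

```python
import math

# p is the probability assigned to the true (positive) class.
for p in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4}: log loss = {-math.log(p):.3f}")

# p = 0.99: log loss = 0.010  (confident and correct -> tiny loss)
# p = 0.5 : log loss = 0.693  (maximally uncertain)
# p = 0.01: log loss = 4.605  (confident but wrong -> heavy penalty)
```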
Related terms
- Cross entropy loss