Understand the 2x2 confusion matrix, what TP, FP, FN and TN mean, and why it matters for classification evaluation.
The four parts of the matrix
For binary classification, a confusion matrix is a 2x2 table. It compares actual values with predicted values and splits results into true positives, false positives, false negatives and true negatives.
These four counts are the building blocks for metrics such as precision, recall, specificity, accuracy and F1 score; the short sketch after the list below shows how to tally them.
- True Positive (TP): the model predicts positive and the result is actually positive.
- False Positive (FP): the model predicts positive, but the result is actually negative.
- False Negative (FN): the model predicts negative, but the result is actually positive.
- True Negative (TN): the model predicts negative and the result is actually negative.
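As a minimal sketch, the four counts can be tallied in plain Python; the y_true and y_pred lists here are made-up example labels, not data from any real model:

```python
# Tallying the four cells of a 2x2 confusion matrix.
# Made-up example labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
```

In practice you rarely hand-roll this; scikit-learn's sklearn.metrics.confusion_matrix returns the same four counts as a 2x2 array.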
Why a single accuracy score is not enough
Accuracy can look good even when a model fails on the cases you care about most. In an imbalanced medical dataset where, say, only 5% of patients are positive, always predicting the majority class yields 95% accuracy while missing every positive case.
The confusion matrix helps you see whether your errors are concentrated in one direction. That is often far more useful than looking at a single summary number.
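A quick numeric sketch makes the trap concrete. The counts below assume a hypothetical dataset of 950 negatives and 50 positives and a degenerate model that always predicts negative:

```python
# Assumed imbalanced dataset: 950 negatives, 50 positives.
# A degenerate "model" that always predicts negative misses every positive.
tp, fp, fn, tn = 0, 0, 50, 950

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.95 -- looks impressive
recall = tp / (tp + fn)                     # 0.00 -- every positive missed

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```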
How the matrix helps different use cases
Different applications care about different mistakes. A spam filter may tolerate a few false negatives (spam slipping into the inbox) but not false positives (legitimate mail lost), while fraud detection often cares most about false negatives (missed fraud). The confusion matrix makes that trade-off visible.
Because it is easy to read, it is also useful in reporting. Non-technical stakeholders can understand the table faster than they can interpret abstract metrics.
- Use it to compare models, not just to evaluate one model once.
- Use it to explain threshold changes in binary classification (see the sketch after this list).
- Use it to identify whether recall or precision needs more attention.
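To illustrate the threshold point, here is a small sketch with hypothetical predicted probabilities and true labels; nothing here comes from a real model:

```python
# Hypothetical predicted probabilities and true labels for illustration.
probs  = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30, 0.20, 0.05]
y_true = [1,    1,    0,    1,    1,    0,    0,    0]

def confusion(threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are called positive."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, y_true))
    return tp, fp, fn, tn

for thr in (0.3, 0.5, 0.7):
    print(thr, confusion(thr))
# 0.3 (4, 2, 0, 2)  -- low threshold: no misses, more false alarms
# 0.5 (3, 1, 1, 3)
# 0.7 (2, 0, 2, 4)  -- high threshold: no false alarms, more misses
```

Raising the threshold converts false positives into true negatives at the cost of converting true positives into false negatives, which is exactly the trade-off the matrix lets you show stakeholders.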
From matrix counts to useful metrics
Once you have TP, FP, FN and TN, you can calculate the rest of the standard classification metrics. Precision, TP / (TP + FP), measures prediction quality among positive predictions. Recall, TP / (TP + FN), measures how many actual positives were found.
F1 score, the harmonic mean of precision and recall, balances the two, while specificity, TN / (TN + FP), measures how well the model identifies negatives. This is why a confusion matrix is often the starting point for model evaluation.
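As a minimal sketch, here is how those metrics fall out of the four counts; the values assigned to tp, fp, fn and tn are illustrative only:

```python
# Illustrative counts; in practice these come from your confusion matrix.
tp, fp, fn, tn = 80, 20, 10, 90

precision   = tp / (tp + fp)                   # quality of positive predictions
recall      = tp / (tp + fn)                   # share of actual positives found
specificity = tn / (tn + fp)                   # share of actual negatives found
accuracy    = (tp + tn) / (tp + fp + fn + tn)  # overall share correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"specificity={specificity:.2f} accuracy={accuracy:.2f} f1={f1:.2f}")
```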