## The Four Outcomes
Every prediction for a binary classifier falls into one of four cells:
| | Predicted: Benign | Predicted: Attack |
|---|---|---|
| Actual: Benign | TN — correct pass | FP — false alarm |
| Actual: Attack | FN — missed threat | TP — caught it |

| Outcome | Security cost |
|---|---|
| TP — True Positive | Low — this is what we want |
| TN — True Negative | Low — no action needed |
| FP — False Positive | Medium — analyst time wasted |
| FN — False Negative | High — system compromised |
In security, FN cost >> FP cost. A missed attack is almost always more damaging than a false alarm.
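This asymmetry can be made concrete by scoring a confusion matrix with per-error costs instead of accuracy. The sketch below is illustrative: the cost values and the two model outcomes are made-up assumptions, not measurements from any real deployment.

```python
# Sketch: comparing two hypothetical models by expected cost rather than
# accuracy. COST_FP and COST_FN are illustrative assumptions reflecting
# "a missed attack hurts far more than a false alarm".
COST_FP = 1    # analyst triage effort for one false alarm
COST_FN = 100  # assumed damage from one missed attack

def expected_cost(tp, tn, fp, fn):
    """Total security cost implied by a confusion matrix."""
    return fp * COST_FP + fn * COST_FN

# Model A: fewer false alarms, but misses more attacks.
# Model B: noisier, but misses almost nothing.
cost_a = expected_cost(tp=40, tn=9900, fp=50, fn=10)   # 50*1 + 10*100 = 1050
cost_b = expected_cost(tp=49, tn=9750, fp=200, fn=1)   # 200*1 + 1*100 = 300
print(cost_a, cost_b)  # prints: 1050 300
```

Under these assumed costs, Model B wins decisively even though it raises four times as many false alarms, which is exactly the trade-off the table above encodes.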
## Computing the Confusion Matrix
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[TN, FP],
#  [FN, TP]]

# Visualise as a heatmap
disp = ConfusionMatrixDisplay(cm, display_labels=['Benign', 'Attack'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()
```
## Deriving All Metrics from the Matrix
```python
TN, FP, FN, TP = cm.ravel()

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)   # of those flagged, how many are real?
recall = TP / (TP + FN)      # of all attacks, how many are caught?
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1:        {f1:.3f}")
```
Every evaluation metric is just a different combination of these four numbers.
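As a sanity check, the hand-computed formulas above can be verified against sklearn's built-in metric functions. The tiny label arrays below are made up for illustration.

```python
# Verify that the manual formulas agree with sklearn's implementations.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
# From these arrays: TP=4, TN=3, FP=1, FN=2

manual_precision = 4 / (4 + 1)   # 0.8
manual_recall = 4 / (4 + 2)      # 0.666...

assert precision_score(y_true, y_pred) == manual_precision
assert recall_score(y_true, y_pred) == manual_recall
print(f1_score(y_true, y_pred))  # harmonic mean of the two
```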
## Think Deeper
Try this:
Your IDS produced: TP=45, FP=300, FN=5, TN=9650. Calculate precision and recall. Is this a good system?
Precision = 45/(45+300) ≈ 13%. Recall = 45/(45+5) = 90%. It catches 90% of attacks, but only about 1 in 8 alerts is real. Whether this is 'good' depends on your SOC's capacity: 300 false alarms might be acceptable if they're auto-triaged, but unacceptable if a human must investigate each one.
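The exercise can be worked numerically, using the TP/FP/FN/TN values given above; note how accuracy stays high even with this alert flood.

```python
# Working the IDS exercise: TP=45, FP=300, FN=5, TN=9650.
TP, FP, FN, TN = 45, 300, 5, 9650

precision = TP / (TP + FP)                   # 45 / 345 ≈ 0.130
recall = TP / (TP + FN)                      # 45 / 50  = 0.90
accuracy = (TP + TN) / (TP + TN + FP + FN)   # 9695 / 10000 = 0.9695

print(f"Precision: {precision:.2%}")  # ~13% — most alerts are noise
print(f"Recall:    {recall:.2%}")     # 90% — most attacks are caught
print(f"Accuracy:  {accuracy:.2%}")   # ~97% — misleadingly reassuring
```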
Cybersecurity tie-in: the confusion matrix is the universal language of security ML evaluation. When comparing two IDS vendors, don't compare accuracy figures; compare confusion matrices at the same decision threshold, and ask which system produces fewer FNs (missed attacks) at an acceptable FP rate.