Step 4: Threshold Tuning

predict_proba() and the precision-recall tradeoff


predict_proba() vs predict()

model.predict(X) returns hard labels (0 or 1) using the default 0.5 threshold.

model.predict_proba(X) returns probabilities — you can then apply any threshold yourself:

probs = model.predict_proba(X_test_scaled)[:, 1]   # P(phishing)

# Custom threshold
threshold = 0.3
y_pred_custom = (probs >= threshold).astype(int)

print(f"Default (0.5): {(probs >= 0.5).sum()} flagged")
print(f"Lower   (0.3): {(probs >= 0.3).sum()} flagged")
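The relationship between the two methods is easy to verify end to end. Here is a minimal sketch on synthetic data (not the lesson's phishing set; `make_classification` stands in for it): `predict()` agrees with thresholding `predict_proba()` at 0.5, and a lower threshold flags at least as many samples.

```python
# Minimal sketch (synthetic data, not the phishing dataset) showing that
# predict() is just predict_proba() with a 0.5 cutoff.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# predict() == thresholding the probabilities at 0.5
# (ties at exactly 0.5 aside, which never happen in practice with floats)
assert ((probs >= 0.5).astype(int) == model.predict(X_test)).all()

# A lower threshold can only flag more samples, never fewer
print(f"Flagged at 0.5: {(probs >= 0.5).sum()}")
print(f"Flagged at 0.3: {(probs >= 0.3).sum()}")
```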

The Precision-Recall Tradeoff

| | Lower threshold (e.g. 0.3) | Higher threshold (e.g. 0.7) |
|---|---|---|
| Recall | HIGH — catch almost all phishing | LOW — miss some threats |
| Precision | LOW — many false alarms | HIGH — few false alarms |
| Use when | Missing a threat is very costly | False alarms are very costly |

This is a fundamental tradeoff — you cannot have both perfect precision and perfect recall unless your model is perfect.
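A toy example makes the tradeoff concrete. The labels and scores below are made up for illustration (they are not the phishing model's output), but the pattern is general: lowering the threshold raises recall and lowers precision.

```python
# Illustration of the precision-recall tradeoff on made-up scores:
# y_true are ground-truth labels, scores are hypothetical P(phishing).
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.2, 0.35, 0.4, 0.6, 0.75, 0.32, 0.55, 0.8, 0.9])

for threshold in (0.3, 0.7):
    y_pred = (scores >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# At 0.3 every phishing URL is caught (recall 1.0) but half the flags are
# false alarms; at 0.7 precision improves while recall drops to 0.5.
```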

Choosing the Right Threshold

The optimal threshold depends on your operational priorities:

| Scenario | Priority | Threshold |
|---|---|---|
| Email phishing gateway | Don't miss threats (high recall) | Low (0.2–0.4) |
| Auto-blocking firewall rule | Minimal false positives (high precision) | High (0.7–0.9) |
| Balanced triage queue | Equal weight to both | F1-optimal (~0.5) |
from sklearn.metrics import precision_recall_curve

# precision_recall_curve sweeps every candidate threshold (each distinct
# predicted score) and reports the precision and recall you'd get at each
# one — perfect for finding the trade-off point that matches your
# operational priorities.
precisions, recalls, thresholds = precision_recall_curve(y_test, probs)

# Find the threshold that maximises F1. precisions and recalls have one
# more element than thresholds (the final point has no threshold), so
# drop it before taking the argmax.
f1_scores = 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1] + 1e-8)
best_idx = f1_scores.argmax()
print(f"Best threshold: {thresholds[best_idx]:.2f}")
print(f"Precision: {precisions[best_idx]:.3f}")
print(f"Recall:    {recalls[best_idx]:.3f}")
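Maximising F1 is not the only policy. For the "don't miss threats" scenario in the table, a common alternative is a recall floor: pick the highest threshold whose recall still meets a target. A self-contained sketch on synthetic scores (the data and the `target_recall` value are made up for illustration):

```python
# Recall-floor threshold selection: highest threshold that still meets a
# minimum recall. Synthetic labels/scores stand in for real model output.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
# Fake scores loosely correlated with the labels
scores = np.clip(y_true * 0.4 + rng.normal(0.3, 0.2, size=1000), 0, 1)

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

target_recall = 0.95
# recalls is non-increasing as the threshold rises, so the last index that
# still meets the floor corresponds to the highest qualifying threshold.
ok = np.where(recalls[:-1] >= target_recall)[0]
chosen = thresholds[ok[-1]]
print(f"Highest threshold with recall >= {target_recall}: {chosen:.2f}")
print(f"Precision there: {precisions[ok[-1]]:.3f}")
```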

Think Deeper

An email security gateway processes 1 million URLs per day. At threshold 0.3, precision is 85%. How many false alarms per day if 1% of URLs are phishing?

10,000 phishing URLs. At 85% precision, for every 100 flagged URLs, 15 are false alarms. If recall is ~95%, we flag ~9,500 true positives. Total flagged ≈ 9,500 / 0.85 ≈ 11,176. False alarms ≈ 11,176 - 9,500 = ~1,676 per day. That's ~70 per hour — manageable if automated triage handles most of them.
Cybersecurity tie-in: Threshold tuning is the difference between a useful security tool and a noisy one. A phishing detector with 99% recall but 50% precision generates twice as many false alarms as real detections — analysts stop trusting it. The right threshold depends on your SOC's capacity, not just the model's metrics.
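The arithmetic above is worth sanity-checking in a few lines. The figures used here (1M URLs/day, 1% phishing base rate, 85% precision, ~95% recall) are the assumptions stated in the answer:

```python
# Sanity check of the false-alarm arithmetic, under the answer's stated
# assumptions: precision = TP / flagged, so flagged = TP / precision.
urls_per_day = 1_000_000
base_rate = 0.01      # 1% of URLs are phishing
precision = 0.85
recall = 0.95

phishing = urls_per_day * base_rate         # 10,000 true phishing URLs
true_positives = phishing * recall          # ~9,500 caught
total_flagged = true_positives / precision  # ~11,176 flagged in total
false_alarms = total_flagged - true_positives

print(f"Flagged:      {total_flagged:,.0f}")
print(f"False alarms: {false_alarms:,.0f} per day (~{false_alarms / 24:.0f}/hour)")
```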
