## `predict_proba()` vs `predict()`
`model.predict(X)` returns hard labels (0 or 1) using the default 0.5 threshold.
`model.predict_proba(X)` returns class probabilities — you can then apply any threshold yourself:
```python
probs = model.predict_proba(X_test_scaled)[:, 1]  # P(phishing)

# Custom threshold
threshold = 0.3
y_pred_custom = (probs >= threshold).astype(int)

print(f"Default (0.5): {(probs >= 0.5).sum()} flagged")
print(f"Lower (0.3): {(probs >= 0.3).sum()} flagged")
```
## The Precision-Recall Tradeoff
| | Lower threshold (e.g. 0.3) | Higher threshold (e.g. 0.7) |
|---|---|---|
| Recall | HIGH — catch almost all phishing | LOW — miss some threats |
| Precision | LOW — many false alarms | HIGH — few false alarms |
| Use when | Missing a threat is very costly | False alarms are very costly |
This is a fundamental tradeoff — you cannot have both perfect precision and perfect recall unless your model is perfect.
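To see the tradeoff concretely, here is a minimal sketch. It uses a synthetic dataset and a fresh `LogisticRegression` as stand-ins for the phishing features and model used elsewhere in this lesson, then reports precision and recall at three thresholds:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in for the phishing dataset (20% positive class)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# Sweep a few thresholds and record (precision, recall) at each
results = {}
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    results[threshold] = (precision_score(y_test, preds),
                          recall_score(y_test, preds))
    print(f"threshold={threshold}: precision={results[threshold][0]:.2f}, "
          f"recall={results[threshold][1]:.2f}")
```

Lowering the threshold can only grow the flagged set, so recall at 0.3 is always at least as high as at 0.7 — the cost shows up in precision.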
## Choosing the Right Threshold
The optimal threshold depends on your operational priorities:
| Scenario | Priority | Threshold |
|---|---|---|
| Email phishing gateway | Don't miss threats (high recall) | Low (0.2–0.4) |
| Auto-blocking firewall rule | Minimal false positives (high precision) | High (0.7–0.9) |
| Balanced triage queue | Equal weight to both | F1-optimal (~0.5) |
```python
from sklearn.metrics import precision_recall_curve

# precision_recall_curve sweeps every threshold and reports
# the precision and recall you'd get at each one — perfect for finding
# the trade-off point that matches your operational priorities.
precisions, recalls, thresholds = precision_recall_curve(y_test, probs)

# precisions and recalls are one element longer than thresholds (the final
# point is precision=1, recall=0), so drop the last entry before indexing.
f1_scores = 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1] + 1e-8)
best_idx = f1_scores.argmax()

print(f"Best threshold: {thresholds[best_idx]:.2f}")
print(f"Precision: {precisions[best_idx]:.3f}")
print(f"Recall: {recalls[best_idx]:.3f}")
```
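Putting the pieces together, here is a self-contained sketch (synthetic data in place of the phishing features; the variable `probs` mirrors the snippet above) that finds the F1-optimal threshold and compares it against the default 0.5:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, f1_score

# Synthetic stand-in for the phishing dataset
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Sweep thresholds and pick the one with the highest F1
precisions, recalls, thresholds = precision_recall_curve(y_test, probs)
f1 = 2 * precisions[:-1] * recalls[:-1] / (precisions[:-1] + recalls[:-1] + 1e-8)
best_threshold = thresholds[f1.argmax()]

default_f1 = f1_score(y_test, (probs >= 0.5).astype(int))
tuned_f1 = f1_score(y_test, (probs >= best_threshold).astype(int))
print(f"Default 0.5 -> F1 {default_f1:.3f}; tuned {best_threshold:.2f} -> F1 {tuned_f1:.3f}")
```

Because the curve's thresholds cover every distinct predicted probability, the tuned threshold's F1 can never be worse than the default's on this test set — though the gain on held-out data may be smaller.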
## Think Deeper
Try this:
An email security gateway processes 1 million URLs per day. At threshold 0.3, precision is 85%. How many false alarms per day if 1% of URLs are phishing?
10,000 phishing URLs per day. At 85% precision, 15 of every 100 flagged URLs are false alarms. Assuming recall is ~95%, we flag ~9,500 true positives, so total flagged ≈ 9,500 / 0.85 ≈ 11,176 and false alarms ≈ 11,176 − 9,500 ≈ 1,676 per day. That's roughly 70 per hour, manageable if automated triage handles most of them.
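The arithmetic in the worked answer can be checked in a few lines (the 95% recall figure is an assumption carried over from that answer):

```python
daily_urls = 1_000_000
phishing_rate = 0.01   # 1% of URLs are phishing
precision = 0.85       # at threshold 0.3
recall = 0.95          # assumed, as in the worked answer

phishing = daily_urls * phishing_rate        # 10,000 true phishing URLs
true_positives = phishing * recall           # 9,500 caught
total_flagged = true_positives / precision   # precision = TP / flagged
false_alarms = total_flagged - true_positives

print(f"Flagged per day: {total_flagged:.0f}")            # ~11,176
print(f"False alarms per day: {false_alarms:.0f}")        # ~1,676
print(f"False alarms per hour: {false_alarms / 24:.0f}")  # ~70
```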
Cybersecurity tie-in: Threshold tuning is the difference between a useful security tool and a noisy one. A phishing detector with 99% recall but 50% precision generates one false alarm for every real detection — analysts stop trusting it. The right threshold depends on your SOC's capacity, not just the model's metrics.