The Operational Decision
Once you've selected a model, you still have one more decision: what threshold to deploy at. This is not a technical question — it's a business question:
"What is the cost of a missed attack vs the cost of a false alarm in your environment?"
Two Scenarios
| | Scenario A: "Catch All Attacks" | Scenario B: "Trusted Alerts Only" |
|---|---|---|
| Goal | Miss as few attacks as possible | When we alert, we want to be right |
| Accept | Higher false-alarm rate | Some attacks may slip through |
| Threshold | Low (0.2–0.3) | High (0.6–0.8) |
| Key metric | Recall on attack class | Precision on attack class |
| Use case | Email security, malware sandbox | Auto-blocking firewall rules |
Building a Threshold Summary Table
```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Probability of the attack class, using the model and scaled test set from earlier steps
probs = model.predict_proba(X_test_scaled)[:, 1]

print(f"{'Threshold':>10} {'Precision':>10} {'Recall':>10} {'F1':>10} {'Flagged':>10}")
print("-" * 55)
for t in [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]:
    y_pred_t = (probs >= t).astype(int)
    p = precision_score(y_test, y_pred_t, zero_division=0)
    r = recall_score(y_test, y_pred_t, zero_division=0)
    f = f1_score(y_test, y_pred_t, zero_division=0)
    n = y_pred_t.sum()  # number of events flagged at this threshold
    print(f"{t:>10.1f} {p:>10.3f} {r:>10.3f} {f:>10.3f} {n:>10}")
```
This table is what you present to stakeholders. They decide the acceptable tradeoff — you provide the data.
Communicating to Stakeholders
Avoid jargon. Instead of "precision = 0.85 at threshold 0.3", say:
- "At this setting, we catch 95% of attacks, but roughly 1 in 7 alerts is a false alarm"
- "If we tighten the filter, false alarms drop to 1 in 20 but we miss 30% of attacks"
- "Your team handles 50 alerts/day — at threshold X, we generate 45"
The threshold is a dial, not a fixed value. Different deployments may use different thresholds.
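Translating metrics into this "1 in N" phrasing is mechanical enough to script. Here is a minimal sketch (the helper name and rounding choices are my own, not part of any library):

```python
def plain_language(precision: float, recall: float) -> str:
    """Translate precision/recall into stakeholder-friendly phrasing."""
    caught_pct = round(recall * 100)
    missed_pct = round((1 - recall) * 100)
    if precision >= 1.0:
        false_alarms = "essentially no alerts are false alarms"
    else:
        # False-discovery rate 1 - precision; 0.15 -> "about 1 in 7"
        one_in_n = round(1 / (1 - precision))
        false_alarms = f"about 1 in {one_in_n} alerts is a false alarm"
    return f"We catch {caught_pct}% of attacks and miss {missed_pct}%; {false_alarms}."

print(plain_language(precision=0.85, recall=0.95))
# We catch 95% of attacks and miss 5%; about 1 in 7 alerts is a false alarm.
```

Feeding each row of the threshold summary table through a helper like this gives stakeholders one readable sentence per candidate setting.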
Think Deeper
Try this:
Your SOC team says they can handle 50 alerts per day. Your model produces 200 at threshold 0.3 and 30 at threshold 0.6. What threshold do you recommend?
Threshold 0.6 keeps alerts under capacity (30/day), but check how many real attacks it misses. If recall drops from 95% to 40%, you're missing 60% of threats. The answer might be: deploy at 0.45 (find the threshold that produces ~50 alerts), or add auto-triage to handle volume at 0.3. The threshold decision is operational, not just mathematical.
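The "find the threshold that produces ~50 alerts" step has a direct solution: sort the predicted probabilities and read off the k-th largest score. A sketch, assuming `probs` from the summary-table code above (the function name and the synthetic data are illustrative):

```python
import numpy as np

def threshold_for_budget(probs: np.ndarray, target_alerts: int) -> float:
    """Smallest threshold whose (probs >= t) alert count fits the budget.

    Flagging the top-k scored events is equivalent to setting the
    threshold at the k-th largest predicted probability.
    """
    if target_alerts >= len(probs):
        return 0.0  # budget exceeds total traffic: flag everything
    top = np.sort(probs)[::-1]            # scores, highest first
    return float(top[target_alerts - 1])  # k-th largest score

# Synthetic example: 200 scored events/day, SOC budget of 50 alerts/day
rng = np.random.default_rng(0)
probs = rng.beta(2, 5, size=200)
t = threshold_for_budget(probs, target_alerts=50)
print(f"threshold={t:.3f}, alerts={(probs >= t).sum()}")
```

After fitting the threshold to the alert budget, re-check recall at that threshold: capacity tells you where the dial can go, but the miss rate tells you whether it should.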
Cybersecurity tie-in: This is the most operationally important step in the entire module.
A model is only as useful as its deployment threshold. The right threshold depends on your SOC's size,
alert triage automation, and the consequences of both missed threats and false alarms.
Models don't make decisions — operators do.