End-of-lesson Quiz

5 questions · Model Evaluation

1 of 5
Your malware detector reports 99.95% accuracy on a feed where 0.05% of events are malicious. What can you conclude?
When the positive class is only 0.05% of events, accuracy is dominated by the majority class: a detector that flags nothing at all would also score 99.95%. You must check recall: what fraction of actual malware was flagged? Without that number, the accuracy figure is meaningless.
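The arithmetic is easy to verify with a toy feed that uses the question's class balance and a hypothetical do-nothing baseline:

```python
# Toy feed: 10,000 events, 0.05% malicious (5 positives).
n_total = 10_000
n_pos = 5                    # actual malware events
n_neg = n_total - n_pos

# A trivial "detector" that flags nothing is wrong only on the positives.
accuracy = n_neg / n_total   # 99.95% accuracy...
recall = 0 / n_pos           # ...while catching zero malware

print(f"accuracy={accuracy:.2%}, recall={recall:.0%}")
```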
2 of 5
Your IDS produces: TP=45, FP=300, FN=5, TN=9650. What is the precision?
Precision = TP / (TP + FP) = 45 / (45 + 300) ≈ 13%, so roughly 1 alert in 8 is real. Recall = TP / (TP + FN) = 45 / 50 = 90%, so the model catches most attacks, but the poor precision means analysts will burn most of their time on false alarms.
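A quick check of both metrics from the counts given in the question:

```python
# Confusion-matrix counts from the question.
TP, FP, FN, TN = 45, 300, 5, 9650

precision = TP / (TP + FP)   # 45 / 345, about 0.13
recall    = TP / (TP + FN)   # 45 / 50, exactly 0.90

print(f"precision={precision:.1%}, recall={recall:.0%}")
```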
3 of 5
Two models for the same job: Model A has precision=0.95, recall=0.60. Model B has precision=0.70, recall=0.95. Which one would you deploy for scanning incoming email for phishing?
For email scanning, the cost of a false negative (missed phishing → credential theft → lateral movement) is much higher than the cost of a false positive (a legitimate email goes to quarantine for review). Pick the model that maximises recall: Model B.
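One way to make the recall preference quantitative is an F-beta score with beta > 1, which weights recall more heavily than precision. This is a standard metric, not something the quiz itself prescribes, but it formalises the intuition:

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f2_a = f_beta(0.95, 0.60)   # precision-heavy Model A
f2_b = f_beta(0.70, 0.95)   # recall-heavy Model B
print(f"F2(A)={f2_a:.2f}, F2(B)={f2_b:.2f}")
```

With beta=2, Model B scores clearly higher, matching the recall-first reasoning above.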
4 of 5
Model X has AUC = 0.92. Model Y has AUC = 0.95. Should you always pick Y?
Not necessarily. AUC summarises performance across every threshold, but in production you operate at one threshold, usually in the low-FPR region. Model X might dominate Y in that specific region. Always plot the ROC curve, not just the single AUC number.
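A small sketch of why a single AUC can mislead. The two piecewise-linear ROC curves below are invented (all numbers hypothetical, not the 0.92/0.95 models from the question), built so that one curve has the higher overall AUC while the other is stronger at the low-FPR operating point:

```python
# Invented ROC curves as (FPR, TPR) points. Y has the higher overall AUC,
# but X catches far more attacks at the low-FPR operating point.
roc_x = [(0.0, 0.0), (0.01, 0.80), (0.10, 0.90), (1.0, 1.0)]
roc_y = [(0.0, 0.0), (0.01, 0.40), (0.05, 0.95), (1.0, 1.0)]

def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(f"AUC X={auc(roc_x):.3f}, AUC Y={auc(roc_y):.3f}")  # Y wins overall
print(f"TPR at FPR=0.01: X={roc_x[1][1]}, Y={roc_y[1][1]}")  # X wins here
```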
5 of 5
Your SOC team can handle 50 alerts/day. At threshold 0.3 the model produces 200 alerts; at 0.6 it produces 30 but misses 60% of attacks. What's the right move?
Threshold tuning is an operational decision, not just a math problem. 0.6 hits the capacity number but loses 60% of attacks — unacceptable. Either find a balanced threshold or invest in auto-triage so analysts only see the alerts that matter most.
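The search for a balanced threshold can be sketched as a sweep over a labelled validation set. The score distributions below are hand-built toy data (all numbers hypothetical, chosen to loosely reproduce the 200-alert and 30-alert endpoints from the question):

```python
# Toy validation scores: 25 attacks and 980 benign events, hypothetical.
attack_scores = [0.20] \
    + [0.30 + 0.02 * i for i in range(14)] \
    + [0.60 + 0.04 * i for i in range(10)]
benign_scores = [0.10] * 804 + [0.40] * 156 + [0.70] * 20

def triage(threshold):
    """Alerts per day and attack recall at a given score threshold."""
    alerts = sum(s >= threshold for s in attack_scores + benign_scores)
    recall = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    return alerts, recall

for t in (0.30, 0.41, 0.60):
    alerts, recall = triage(t)
    print(f"threshold={t}: {alerts} alerts/day, recall={recall:.0%}")
```

With these toy numbers, 0.30 floods the SOC (200 alerts) and 0.60 drops recall to 40%, while a middle threshold such as 0.41 stays under the 50-alert budget at 72% recall, illustrating that a workable operating point may exist between the two extremes.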