End-of-lesson Quiz

5 questions · K-Means Clustering

1 of 5
Your team has 2 million network connection logs but zero labelled attacks. Can you still build a detection system?
Yes. This is exactly what unsupervised anomaly detection is for. K-Means learns the shape of normal traffic from the data alone, and new connections far from every centroid get flagged. No labels are needed, as long as attacks are rare enough in the training logs that the centroids describe normal behaviour. This is how baseline behavioural detection works in production SOCs.
2 of 5
What is the fundamental task of K-Means clustering?
K-Means groups unlabelled samples into K clusters. It iteratively (1) assigns each point to its nearest centroid, then (2) recomputes each centroid as the mean of its assigned points, repeating until the assignments stop changing. The result: K groups of similar samples. No labels needed.
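The two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random initialisation from data points, and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain assign-then-update loop; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    # Final assignment against the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Synthetic stand-in for real features: two blobs of 2-D points.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                    rng.normal(10.0, 0.5, size=(20, 2))])
demo_centroids, demo_labels = kmeans(X_demo, k=2)
```

Like any K-Means run, the result depends on the starting centroids, which is why library implementations usually restart several times and keep the best fit.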
3 of 5
The elbow method suggests K=3 but the silhouette score peaks at K=4. Which should you trust more?
Silhouette score measures how cohesive each cluster is and how well separated it is from its neighbours, which makes it a more direct quality measure than inertia. Inertia keeps falling as K grows, so the elbow plot always slopes downward and the 'elbow' is often ambiguous. Trust silhouette when they disagree.
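To see both measures side by side, you can compute them for a range of K values. The sketch below assumes scikit-learn is available and uses synthetic four-blob data in place of real traffic features; `compare_k` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def compare_k(X, k_values, seed=0):
    """Return {k: (inertia, silhouette)} for each candidate k."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results

# Synthetic stand-in for real features: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

results = compare_k(X, range(2, 8))
# Pick the K with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][1])
for k, (inertia, sil) in results.items():
    print(f"K={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
```

Inertia has no natural maximum to pick, whereas silhouette peaks at a specific K, which is why it is the easier signal to act on.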
4 of 5
How does an anomaly score work in K-Means-based detection?
Normal traffic clusters tightly around centroids. An anomalous connection lands far from every centroid, and its distance to the nearest centroid is its anomaly score. Set a percentile threshold (e.g. the 95th percentile of distances seen in the training data) and alert on anything above it.
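Here is a small sketch of that scoring step. The centroids are hard-coded as a stand-in for a fitted K-Means model, and the training data is synthetic; `anomaly_scores` is a hypothetical helper name.

```python
import numpy as np

def anomaly_scores(X, centroids):
    """Distance from each row of X to its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1)

# Hypothetical centroids, standing in for a fitted K-Means model.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Synthetic "normal" training traffic scattered around the centroids.
rng = np.random.default_rng(0)
train = np.vstack([c + rng.normal(scale=0.5, size=(500, 2)) for c in centroids])

# Threshold: 95th percentile of the training distances.
threshold = np.percentile(anomaly_scores(train, centroids), 95)

new = np.array([[0.2, -0.1],   # close to the first centroid
                [5.0, 5.0]])   # far from both centroids
flags = anomaly_scores(new, centroids) > threshold
```

By construction, about 5% of the training connections sit above the threshold, so the percentile you choose is also your expected alert rate on normal traffic.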
5 of 5
You set your anomaly threshold at the 95th percentile and a week later 20% of connections are being flagged. What likely happened?
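A 95th-percentile threshold should flag about 5% of connections if traffic still looks like the training data, so 20% means the traffic has shifted. Your K-Means baseline was trained on last month's normal. Networks evolve: new services, new patterns, new applications. This is concept drift. The fix: retrain the baseline on a recent window of data so the centroids reflect current normal.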
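One way to picture this is to track the flag rate and refit when it drifts well above the expected 5%. The sketch below assumes scikit-learn and uses synthetic data; the helper names, the "more than double the expected rate" trigger, and the choice of K after retraining are all illustrative assumptions, not a recommended policy.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_baseline(X, k, pct=95, seed=0):
    """Fit K-Means and set the threshold at a percentile of training distances."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    scores = model.transform(X).min(axis=1)  # distance to nearest centroid
    return model, np.percentile(scores, pct)

def flag_rate(model, threshold, X):
    """Fraction of rows in X whose anomaly score exceeds the threshold."""
    return float((model.transform(X).min(axis=1) > threshold).mean())

def blobs(rng, centers, n_per_center):
    """Synthetic 2-D points scattered around each centre."""
    return np.vstack([np.asarray(c) + rng.normal(scale=0.5, size=(n_per_center, 2))
                      for c in centers])

rng = np.random.default_rng(0)
last_month = blobs(rng, [[0, 0], [10, 10]], 500)
# This week: the same traffic plus a new (hypothetical) service, 20% of connections.
this_week = np.vstack([blobs(rng, [[0, 0], [10, 10]], 400),
                       blobs(rng, [[20, 0]], 200)])

model, threshold = fit_baseline(last_month, k=2)
rate_before = flag_rate(model, threshold, this_week)

# Hypothetical trigger: retrain when the flag rate is more than double the expected 5%.
if rate_before > 2 * 0.05:
    # Assumption: K is re-chosen for the new traffic mix (three blobs here).
    model, threshold = fit_baseline(this_week, k=3)
rate_after = flag_rate(model, threshold, this_week)
print(f"flag rate before retrain: {rate_before:.2f}, after: {rate_after:.2f}")
```

Before retraining on a real network, check that the new pattern really is benign; refitting on a window that contains an ongoing attack would teach the baseline to treat it as normal.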
