Quiz: K-Means Clustering

1 of 5

Your team has 2 million network connection logs but zero labelled attacks. Can you still build a detection system?

Yes. This is exactly what unsupervised anomaly detection is for. Provided the logs are mostly normal traffic, K-Means learns the shape of that traffic from the data alone, and new connections that land far from every centroid get flagged. No labels are needed. Baseline behavioural detection in production SOCs commonly works this way.

2 of 5

What is the fundamental task of K-Means clustering?

K-Means partitions the data into K groups of similar samples. It repeats two steps until the assignments stop changing: (1) assign each point to its nearest centroid, then (2) recompute each centroid as the mean of its assigned points. No labels needed.
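The two steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random initialisation from the data, the `iterations` cap, and the toy points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumed initialisation: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 2: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialisation.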

3 of 5

The elbow method suggests K=3 but the silhouette score peaks at K=4. Which should you trust more?

Silhouette score measures how cohesive each cluster is and how well separated it is from its neighbours, so it is a more direct quality measure than inertia. Inertia keeps falling as K grows, so the elbow plot always slopes downward and the 'elbow' is often ambiguous. When the two disagree, lean towards silhouette.
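The difference between the two measures shows up on a small example. The sketch below is an assumption-laden illustration: the helper names `inertia` and `silhouette`, the toy points, and the two hand-written labelings (rather than real K-Means output) are all invented for the purpose. It compares a natural two-cluster labeling with one that needlessly splits a cluster in two.

```python
import math


def inertia(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, m in zip(points, labels) if m == lab]
        centre = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, centre) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster (cohesion).
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the nearest other cluster (separation).
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two tight groups of four points each.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2),
]
labels_k2 = [0, 0, 0, 0, 1, 1, 1, 1]  # one cluster per group
labels_k3 = [0, 0, 2, 2, 1, 1, 1, 1]  # first group split in two

inertia_k2, inertia_k3 = inertia(points, labels_k2), inertia(points, labels_k3)
silhouette_k2, silhouette_k3 = silhouette(points, labels_k2), silhouette(points, labels_k3)
```

Here `inertia_k3` is lower than `inertia_k2` even though the extra cluster adds nothing, while `silhouette_k3` is lower than `silhouette_k2`. Inertia rewards the split and silhouette penalises it.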

4 of 5

How does an anomaly score work in K-Means-based detection?

Normal traffic clusters tightly around centroids. An anomalous connection lands far from every centroid, so its distance to the nearest centroid is its anomaly score. Set a percentile threshold (e.g. the 95th percentile of the distances seen in the baseline data) and alert on anything above it.
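A minimal sketch of that scoring rule follows. The centroids, the baseline points, and the helper names `anomaly_score`, `percentile` and `is_anomalous` are hypothetical; in practice the centroids would come from a K-Means model fitted on scaled connection features.

```python
import math


def anomaly_score(point, centroids):
    """Distance from a point to its nearest centroid."""
    return min(math.dist(point, c) for c in centroids)


def percentile(values, q):
    """q-th percentile (0 to 100), interpolating linearly between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


# Hypothetical centroids, as if already fitted on normal traffic.
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Hypothetical baseline connections, each described by two features.
baseline = [(0.1 * i, 0.0) for i in range(20)]
baseline += [(10.0, 10.0 + 0.1 * i) for i in range(20)]

# Threshold: the 95th percentile of the baseline anomaly scores.
baseline_scores = [anomaly_score(p, centroids) for p in baseline]
threshold = percentile(baseline_scores, 95)


def is_anomalous(point):
    """Alert when a point is farther from every centroid than the threshold."""
    return anomaly_score(point, centroids) > threshold
```

With these assumed values, `is_anomalous((5.0, 5.0))` is true because the point sits between the two centroids and far from both, while `is_anomalous((0.5, 0.0))` is false.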

5 of 5

You set your anomaly threshold at the 95th percentile and a week later 20% of connections are being flagged. What likely happened?

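A 95th-percentile threshold should flag about 5% of connections when traffic resembles the baseline, so a 20% flag rate means the traffic has shifted. Your K-Means baseline was trained on last month's normal, and networks evolve: new services, new patterns, new applications. This is concept drift. The fix: retrain the baseline on a recent window of data so the centroids reflect current normal, and recompute the threshold from the new distances.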
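One way to catch this early is to track the flag rate and compare it with the rate the threshold implies. The sketch below is an assumption, not a standard recipe: the names `flag_rate` and `needs_retrain`, the `tolerance` factor of 2, and the example scores are all hypothetical.

```python
def flag_rate(scores, threshold):
    """Fraction of anomaly scores that exceed the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def needs_retrain(recent_scores, threshold, expected_rate=0.05, tolerance=2.0):
    """True when the recent flag rate exceeds tolerance * expected_rate.

    expected_rate is 0.05 for a 95th-percentile threshold. The tolerance
    factor is an assumed tuning knob, not a recommended value.
    """
    return flag_rate(recent_scores, threshold) > expected_rate * tolerance


threshold = 1.0  # hypothetical threshold from the original baseline

# Hypothetical week of scores resembling the baseline: 5% above the threshold.
stable_week = [0.5] * 95 + [2.0] * 5
# Hypothetical week after the network changed: 20% above the threshold.
drifted_week = [0.5] * 80 + [2.0] * 20

stable_needs_retrain = needs_retrain(stable_week, threshold)
drifted_needs_retrain = needs_retrain(drifted_week, threshold)
```

With these values `drifted_needs_retrain` is true and `stable_needs_retrain` is false. When the check fires, the centroids and the threshold would both be refitted on a recent window of data.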
