End-of-lesson Quiz
5 questions · HuggingFace Pre-trained Models
1 of 5
What's the killer feature of zero-shot classification for security teams?
Zero-shot classification uses a pre-trained NLI model that already understands semantic similarity. You give it text and a list of candidate labels (e.g. 'phishing', 'C2 traffic', 'normal'), and it picks the best fit. No training data needed — perfect for new threat categories where you have zero labels.
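The mechanics can be sketched in a few lines: score each candidate label as an NLI hypothesis against the text, then pick the best-scoring label. With the real library this would be transformers' `pipeline("zero-shot-classification")`; the `fake_nli` scorer below is a hand-made stand-in for the NLI model, purely for illustration.

```python
from typing import Callable

def zero_shot(text: str, labels: list[str],
              nli_score: Callable[[str, str], float]) -> str:
    """Score each candidate label as an NLI hypothesis; return the best fit.

    nli_score stands in for a real NLI model (e.g. the one behind
    transformers' zero-shot-classification pipeline).
    """
    scores = {lab: nli_score(text, f"This text is about {lab}.")
              for lab in labels}
    return max(scores, key=scores.get)

# Hand-made scorer standing in for a real NLI model (assumption, not a real API):
def fake_nli(premise: str, hypothesis: str) -> float:
    return 0.9 if "invoice" in premise and "phishing" in hypothesis else 0.1

print(zero_shot("Urgent: open the attached invoice immediately",
                ["phishing", "C2 traffic", "normal"], fake_nli))
```

Note that the label list is the only "training" involved — swapping in a new threat category is just adding a string.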
2 of 5
What does a sentence embedding capture about a sentence?
A sentence embedding turns 'User logged in from VPN' and 'Employee accessed system remotely via VPN' into similar vectors despite using different words. This is the foundation of semantic search — matching by meaning rather than keywords.
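A toy version of this effect, with hand-made word vectors mean-pooled into sentence vectors (a real model such as one from sentence-transformers learns these; the numbers here are invented purely to illustrate why paraphrases land close together):

```python
import numpy as np

# Hand-made word vectors standing in for a learned embedding model.
WORD_VECS = {
    "user":     np.array([1.0, 0.2, 0.0]),
    "employee": np.array([0.9, 0.3, 0.0]),   # near "user"
    "logged":   np.array([0.1, 1.0, 0.0]),
    "accessed": np.array([0.2, 0.9, 0.1]),   # near "logged"
    "vpn":      np.array([0.0, 0.1, 1.0]),
    "remotely": np.array([0.1, 0.0, 0.8]),   # near "vpn"
}

def embed(sentence: str) -> np.ndarray:
    """Mean-pool the word vectors we know; unknown words are skipped."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = embed("User logged in from VPN")
b = embed("Employee accessed system remotely via VPN")
print(cosine(a, b))  # high, despite the sentences sharing almost no words
```

The two sentences share only the word "VPN", yet their pooled vectors end up close because their words map to nearby points in the space.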
3 of 5
Why does semantic search beat traditional keyword search for a SOC knowledge base?
Keyword search misses paraphrases. Semantic search converts both query and documents to vectors, then matches by meaning — a SOC analyst typing 'lateral movement detection' will find chunks about pass-the-hash, RDP brute-force, and SMB enumeration even if none use the words 'lateral movement'.
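The retrieval step reduces to ranking document vectors by cosine similarity to the query vector. A minimal sketch, with hand-made embeddings standing in for a real model's output (titles and numbers are invented for illustration):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vec: np.ndarray, doc_vecs: dict) -> list[str]:
    """Return document keys ranked by cosine similarity to the query."""
    return sorted(doc_vecs,
                  key=lambda d: cosine(query_vec, doc_vecs[d]),
                  reverse=True)

# Hand-made embeddings standing in for a real embedding model's output:
docs = {
    "pass-the-hash playbook": np.array([0.9, 0.1, 0.0]),
    "RDP brute-force alerts": np.array([0.8, 0.2, 0.1]),
    "phishing triage guide":  np.array([0.0, 0.1, 0.9]),
}
query = np.array([1.0, 0.0, 0.0])   # pretend: embedding of "lateral movement detection"

print(semantic_search(query, docs))  # lateral-movement docs first, phishing last
```

Neither top result contains the words "lateral movement"; they rank first because their vectors point in roughly the query's direction.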
4 of 5
An attacker who knows your zero-shot labels could use that knowledge to evade detection. How?
Label design controls model output. If your candidate labels are public, an attacker can phrase malicious activity in language that your NLI classifier scores as 'normal'. Defense: keep label sets internal, add ensemble checks, and combine with other detection layers.
5 of 5
Cosine similarity between 'User logged in from VPN' and 'Employee accessed system remotely via VPN' is around 0.85. What does this number mean?
Cosine similarity measures the angle between two vectors. 1.0 means they point the same way (same meaning), 0 means perpendicular (unrelated), -1 means opposite. ~0.85 between two paraphrases is exactly what you want from a good embedding model.
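The formula is just the dot product of the two vectors divided by the product of their norms, which you can verify on the three anchor cases above:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(np.array([1, 0]), np.array([1, 0])))   # 1.0  same direction
print(cosine(np.array([1, 0]), np.array([0, 1])))   # 0.0  perpendicular
print(cosine(np.array([1, 0]), np.array([-1, 0])))  # -1.0 opposite
```

Because it measures angle rather than length, cosine similarity ignores how "long" the embedding vectors are, which is why it is the standard choice for comparing embeddings.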