Step 2: How Guardrails Work

Inbound + outbound scanning, detection methods


Two Checkpoints

AI Guardrails operates as an inline scanning layer with two checkpoints:

| Checkpoint | What It Scans | What It Catches |
|---|---|---|
| Inbound (user → LLM) | User's prompt before it reaches the LLM | Prompt injection, jailbreak attempts, prohibited topics |
| Outbound (LLM → user) | LLM's response before the user sees it | Hallucinated content, data leakage, toxic output |
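The two-checkpoint flow can be sketched in a few lines. This is a minimal illustration, not a real product's API: `scan_inbound`, `scan_outbound`, and the `call_llm` stub are hypothetical names, and the checks themselves are toy placeholders for the detection methods covered below.

```python
# Toy blocklist standing in for real inbound detection logic.
BLOCKED_PHRASES = ["ignore previous instructions", "disregard your system prompt"]

def scan_inbound(prompt: str) -> bool:
    """Checkpoint 1: return True if the user prompt is safe to forward."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def scan_outbound(response: str) -> bool:
    """Checkpoint 2: return True if the LLM response is safe to show."""
    # Toy check: block responses that echo an obvious secret pattern.
    return "api_key=" not in response.lower()

def guarded_call(prompt: str, call_llm) -> str:
    """Run both checkpoints around a single LLM call."""
    if not scan_inbound(prompt):
        return "[blocked: prompt failed inbound scan]"
    response = call_llm(prompt)
    if not scan_outbound(response):
        return "[blocked: response failed outbound scan]"
    return response
```

Note that the outbound checkpoint runs even when the inbound scan passes: a benign-looking prompt can still elicit data leakage or toxic output, which is why both checkpoints exist.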

Detection Methods

| Method | How It Works | Latency |
|---|---|---|
| Pattern matching | Known attack signatures and templates | 1–5 ms |
| NLP classification | ML models trained on attack datasets | 5–20 ms |
| Semantic analysis | Embedding-based comparison to known attacks | 10–30 ms |
| Contextual analysis | Full conversation context, catches multi-turn attacks | 15–40 ms |
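Pattern matching, the fastest method in the table, is essentially signature scanning. A rough sketch, with a toy signature list (real guardrails ship much larger, curated and regularly updated sets):

```python
import re

# Toy signatures for two well-known attack templates; illustrative only.
SIGNATURES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]

def pattern_match(prompt: str) -> bool:
    """Return True if the prompt matches any known attack signature."""
    return any(sig.search(prompt) for sig in SIGNATURES)
```

Signature checks are cheap because they are pure string/regex operations, but they only catch attacks someone has already seen, which is why the slower ML-based methods sit behind them.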

Total end-to-end: ~20–50 ms — invisible compared to LLM generation time (500ms–3s).

Connection to What You Know

  • Stage 1 (Classification) — attack detection is a classification problem: safe, injection, jailbreak, toxic
  • Stage 4 (Embeddings) — semantic analysis uses the same embedding + cosine similarity approach from RAG
  • Stage 1 (Evaluation) — the same precision/recall tradeoff applies: too aggressive = blocks legitimate prompts, too permissive = misses attacks
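To make the Stage 4 connection concrete, here is a sketch of embedding-based semantic analysis: compare the prompt's embedding to embeddings of known attacks via cosine similarity. The `0.85` threshold is an illustrative assumption, and the vectors would come from whatever embedding model the stack already uses; this is not any specific product's implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, exactly as used for retrieval in RAG."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_flag(prompt_vec: np.ndarray,
                  attack_vecs: list[np.ndarray],
                  threshold: float = 0.85) -> bool:
    """Flag the prompt if it is too similar to any known attack embedding."""
    return any(cosine_similarity(prompt_vec, v) >= threshold
               for v in attack_vecs)
```

The threshold is where the precision/recall tradeoff from Stage 1 shows up directly: lowering it catches more paraphrased attacks (higher recall) but flags more legitimate prompts (lower precision).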

Think Deeper

Your guardrails add 30ms latency. A customer says that's too slow. How do you respond?

30ms is invisible to users — LLM generation itself takes 500ms–3s. The guardrails latency is less than 5% of total response time. Compare: a WAF adds 1–5ms but can't detect prompt injection at all. The tradeoff is 30ms of latency for catching attacks that no other security layer can.
Key insight: Guardrails use the same ML techniques you've learned throughout this program. Classification, embeddings, semantic similarity — you can explain how they work, not just that they work. That technical depth is your competitive advantage.
