Lab: Lakera-Demo
A web-based testing platform with 24+ documented attack vectors across 9 categories. You'll attack a real LLM through Lakera's input/output scanners and watch which payloads slip through.
Lab Guide (PDF)
The Playground guide is your map for this entire step. Keep it open in a second tab while you work through the exercises.
One-shot setup
Run these from your terminal:
git clone https://github.com/alshawwaf/Lakera-Demo.git
cd Lakera-Demo
python -m venv venv
# Windows: venv\Scripts\activate
# macOS/Linux: source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env -- add your LAKERA_API_KEY and LLM provider key
python app.py
Open http://127.0.0.1:9000 in your browser. The Playground guide PDF (above) walks you through the UI panel by panel.
Exercise 1 — Test the attack library (20 min)
Open the Playground and run at least five built-in attacks from different categories:
- Jailbreak (role-play bypass)
- Prompt injection (instruction override)
- Data extraction (system-prompt leak)
- PII extraction
- Toxic content generation
For each attack, record three things in your notes: caught? (y/n), category Lakera assigned, and confidence score. You'll need this table for Exercise 3.
Exercise 2 — Craft a novel attack (15 min)
Write your own attack prompt that is not in the built-in library. Pick at least one technique:
- Encode the payload in a non-English language
- Split the malicious instruction across multiple turns
- Embed the attack inside a "summarise this document" request (indirect injection)
- Rewrite a known jailbreak using synonyms / paraphrase
Did the guardrails catch it? Why or why not? If it slipped through, the guide PDF's Detection Methods section explains which scanner type would have caught it.
Exercise 3 — Benchmark and stack (10 min)
Open the Benchmark view in the Playground (Lakera vs Azure Content Safety vs LLM Guard). Answer:
- Which attack category has the highest detection rate across all three? The lowest?
- If you could only deploy two of the three vendors, which combination maximises coverage? Why?
- What would you scan on the output side that input scanners can't catch?
Think Deeper
You successfully jailbroke the LLM using a role-play scenario. The guardrails didn't catch it. What does this tell you about defense strategy?