Step 3: Structured JSON Output

Machine-readable output for pipeline integration


Why Structured Output?

When LLM output feeds into another system (SIEM, ticketing, firewall rules), you need machine-readable data -- not prose. Compare:

| Output type | Example | Parseable? |
| --- | --- | --- |
| Prose | "The log entry indicates a brute-force attack. Severity is high because there were 198 failed attempts..." | No -- requires NLP to extract fields |
| JSON | `{"threat_type": "brute_force", "severity": "High", "technique_id": "T1110"}` | Yes -- `json.loads()` and done |

The JSON version can be consumed directly by a SIEM rule engine, a ticketing API, or an automated response playbook.
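As a minimal sketch of that direct consumption (the JSON string below is the example row from the table, not real model output; the escalation rule is a placeholder):

```python
import json

# Example classifier output -- the JSON row from the table above
raw = '{"threat_type": "brute_force", "severity": "High", "technique_id": "T1110"}'

event = json.loads(raw)

# Fields feed a SIEM rule or ticketing API directly, no NLP required
if event["severity"] in {"Critical", "High"}:
    print(f"Escalate: {event['threat_type']} ({event['technique_id']})")  # → Escalate: brute_force (T1110)
```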

Instructing the Model to Return JSON

```python
system = """You are a security log classifier.
Respond with valid JSON only. No prose, no markdown fences, no explanation.

Output schema:
{
  "threat_type": "string (e.g. brute_force, lateral_movement, exfiltration)",
  "technique_id": "string (MITRE ATT&CK ID, e.g. T1110)",
  "severity": "Critical | High | Medium | Low",
  "confidence": "float 0.0-1.0",
  "recommended_actions": ["list of strings"]
}"""

response = client.chat(
    system=system,
    messages=[{"role": "user", "content": "198 failed SSH logins from 45.33.32.156 in 60 seconds"}],
    max_tokens=300,
)
```

Parsing and Validating the Response

```python
import json

def parse_classification(response_text):
    """Parse and validate LLM JSON output."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return {"error": "Model returned invalid JSON", "raw": response_text}

    # Validate required fields
    required = ["threat_type", "severity", "confidence"]
    missing = [f for f in required if f not in data]
    if missing:
        return {"error": f"Missing fields: {missing}", "raw": response_text}

    # Validate severity value
    valid_severities = {"Critical", "High", "Medium", "Low"}
    if data["severity"] not in valid_severities:
        data["severity"] = "Unknown"

    # Validate confidence range (coerce first -- the model may return a string like "0.8")
    try:
        confidence = float(data["confidence"])
    except (TypeError, ValueError):
        confidence = -1.0
    data["confidence"] = confidence if 0.0 <= confidence <= 1.0 else 0.0

    return data

result = parse_classification(response)  # assumes client.chat() returned the raw response text
print(json.dumps(result, indent=2))
```

Never trust raw LLM output. Always validate structure, field presence, and value ranges before passing to downstream systems.
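Range validation can go further than rejecting bad values. As a hedged sketch (the substring-matching heuristic here is an assumption, not part of the lesson's code), an out-of-vocabulary severity can be mapped onto the nearest allowed value:

```python
VALID_SEVERITIES = ["Critical", "High", "Medium", "Low"]

def normalize_severity(value):
    """Map a model-supplied severity onto the allowed vocabulary."""
    if value in VALID_SEVERITIES:
        return value
    # Heuristic: tolerate case differences and qualifiers ("Very High" -> "High")
    for sev in VALID_SEVERITIES:
        if sev.lower() in str(value).lower():
            return sev
    return "Unknown"

print(normalize_severity("Very High"))  # → High
print(normalize_severity("critical"))   # → Critical
print(normalize_severity("banana"))     # → Unknown
```

Whether to coerce or reject is a design choice: coercion keeps the pipeline flowing, but rejection ("Unknown") is safer when the value gates an automated action.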

Common Failure Modes

| Failure | What happens | Defence |
| --- | --- | --- |
| Markdown wrapping | Model returns a fenced `` ```json ... ``` `` block instead of raw JSON | Strip markdown fences before parsing; add "no markdown" to system prompt |
| Extra prose | Model adds "Here is the JSON:" before the output | "No prose, no explanation" in system prompt; extract JSON with regex fallback |
| Hallucinated fields | Model invents fields not in your schema | Only extract the fields you expect; ignore extras |
| Invalid values | Severity = "Very High" instead of "High" | Validate against allowed values; map to nearest valid value |
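The first two defences (fence stripping and regex fallback) can be combined into one extraction helper. A minimal sketch, assuming the model's JSON is the first `{...}` span in its reply:

```python
import json
import re

def extract_json(text):
    """Defensively extract JSON from model output that may include fences or prose."""
    # Unwrap a ```json ... ``` fence if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fallback: grab the first {...} span, skipping any surrounding prose
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        text = brace.group(0)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

print(extract_json('Here is the JSON:\n```json\n{"severity": "High"}\n```'))  # → {'severity': 'High'}
print(extract_json("no json here"))  # → None
```

This is a recovery layer, not a replacement for validation: whatever it extracts should still go through the field and range checks above.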

Think Deeper

Send a log entry and ask for JSON output. Then send a malformed log entry (random garbage text). Does the model still return valid JSON? What are the implications for an automated pipeline?

On garbage input, the model may return JSON with low-confidence values, or it may hallucinate plausible-looking classifications. This is dangerous in a security pipeline: the model always returns something, even for nonsensical input. Production systems must validate both the JSON structure (json.loads()) and that field values fall within expected ranges.
Cybersecurity tie-in: Structured JSON output is how you integrate LLMs into automated security pipelines -- SOAR playbooks, SIEM enrichment, automated ticketing. But a malformed log entry can still produce valid-looking JSON with hallucinated classifications, so always validate output before it triggers automated actions like IP blocking or alert escalation.
