Stateless API, Stateful Conversation
The LLM API is stateless -- every call is independent. The model has zero memory of previous calls. To create a conversation, you must send the full history every time:
| Turn | Messages you send | What the model sees |
|---|---|---|
| Turn 1 | [user: "What is lateral movement?"] | Just the question |
| Turn 2 | [user: "What is lateral movement?", assistant: "Lateral movement is...", user: "How do I detect it?"] | The entire conversation so far |
| Turn 3 | [user, assistant, user, assistant, user: "What tools?"] | All 5 messages -- growing every turn |
The conversation list grows with every exchange. You are responsible for maintaining this state.
Building a Multi-Turn Conversation
```python
system = "You are a threat intelligence analyst. Be concise and technical."
messages = []

def chat(user_input):
    """Send a message and get a response, maintaining history."""
    messages.append({"role": "user", "content": user_input})
    response = client.chat(
        system=system,
        messages=messages,
        max_tokens=300,
    )
    messages.append({"role": "assistant", "content": response})
    return response

# Turn 1
print(chat("What is lateral movement?"))

# Turn 2 -- model remembers the context from turn 1
print(chat("How do I detect it in a Windows environment?"))

# Turn 3 -- model has full conversation history
print(chat("What Sigma rules should I deploy?"))
```
Context Window and Token Limits
As conversations grow, so does the token count. Every model has a maximum context window:
| Model | Context window | Approximate word limit |
|---|---|---|
| Claude Sonnet | 200K tokens | ~150,000 words |
| GPT-4o | 128K tokens | ~96,000 words |
| Gemini Flash | 1M tokens | ~750,000 words |
| Ollama (small models) | 4K-32K tokens | ~3,000-24,000 words |
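You can estimate how close a conversation is to the limit without calling the API. A common rule of thumb for English text is roughly 4 characters (or ~0.75 words) per token; the helpers below (`estimate_tokens` and `conversation_tokens` are hypothetical names, not part of any client library) sketch that heuristic. Real counts depend on the model's tokenizer, so treat this as a budget check, not an exact figure:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English text.

    A heuristic only; exact counts require the model's own tokenizer.
    """
    return max(1, len(text) // 4)

def conversation_tokens(system, messages):
    """Approximate total tokens for a request payload."""
    total = estimate_tokens(system)
    for m in messages:
        total += estimate_tokens(m["content"])
    return total
```

Checking `conversation_tokens(system, messages)` before each call lets you trigger truncation well before the model rejects the request.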
When the conversation approaches the context window, you must shrink the history before the next call. Common strategies: drop the oldest turns, summarise the conversation so far, or keep only the system prompt and the last N turns.
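The last strategy, a sliding window over recent turns, can be sketched in a few lines (`truncate_history` is a hypothetical helper, not part of any client library; the system prompt is passed to the API separately, so it always survives truncation):

```python
def truncate_history(messages, max_turns=6):
    """Keep only the most recent `max_turns` user/assistant messages.

    A minimal sliding-window sketch: older turns are dropped entirely,
    which loses their content -- summarisation preserves more context
    at the cost of an extra LLM call.
    """
    if len(messages) <= max_turns:
        return messages
    trimmed = messages[-max_turns:]
    # Never let the history open with an assistant message: its
    # matching user question has been dropped, so drop it too.
    if trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed
```

Calling `messages = truncate_history(messages)` just before each API request keeps the payload bounded while the full list keeps growing locally.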
Building an Interactive Security Assistant
```python
system = """You are a security incident response assistant.
When given an incident description, help the analyst through triage:
1. Classify the incident type
2. Suggest containment actions
3. Identify evidence to preserve
4. Recommend escalation criteria
Maintain context across the conversation."""

messages = []

print("Security Assistant (type 'quit' to exit)")
print("-" * 45)

while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() == "quit":
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat(system=system, messages=messages, max_tokens=400)
    messages.append({"role": "assistant", "content": response})
    print(f"\nAssistant: {response}")
```
This is the skeleton of every AI-powered security tool: a system prompt that defines the role, a message loop that maintains context, and an LLM that generates responses.
Think Deeper
Start a conversation about a suspicious IP. After 3 turns, ask 'what IP were we discussing?' Does the model remember? Now start a new conversation and ask the same question without history. What happens?