Step 1: Your First API Call

Understand the request/response structure


The Chat API Structure

Every modern LLM API -- Claude, OpenAI, Gemini, Ollama -- uses the same three-parameter structure. Once you learn it, you can use any provider:

| Parameter | Type | Purpose | Example |
|---|---|---|---|
| `system` | String | Defines the model's role and behaviour (invisible to the end user) | `"You are a SOC analyst. Be concise."` |
| `messages` | List of dicts | The conversation history: alternating user/assistant turns | `[{"role": "user", "content": "Analyse this log..."}]` |
| `max_tokens` | Integer | Hard cap on response length (100 tokens ~ 75 words) | `200` |

The system prompt shapes every response. The messages list carries context. The max_tokens cap controls cost and latency.
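To make the `messages` structure concrete, here is a multi-turn conversation as the API expects it: turns alternate user/assistant, oldest first, and the model re-reads the full history on every call.

```python
# A multi-turn conversation: the messages list alternates user/assistant
# turns, oldest first. The model sees the entire history on each request.
messages = [
    {"role": "user", "content": "What port does RDP use?"},
    {"role": "assistant", "content": "RDP uses TCP port 3389 by default."},
    {"role": "user", "content": "How would I detect brute-force attempts against it?"},
]

# Roles must alternate, and the final message is always from the user.
assert [m["role"] for m in messages] == ["user", "assistant", "user"]
```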

Your First API Call

from llm_client import get_client

provider, client = get_client()   # auto-detects your API key
print(f"Using provider: {provider}")

response = client.chat(
    system="You are a cybersecurity analyst. Be concise and technical.",
    messages=[
        {"role": "user", "content": "What is a reverse shell?"}
    ],
    max_tokens=200,
)

print(response)

The llm_client.py helper wraps Claude, OpenAI, Gemini, and Ollama behind a common interface. Set whichever API key you have and the code works identically.
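One way such auto-detection can work is by checking the environment variables each SDK conventionally reads. This is a sketch for illustration, not the actual `llm_client.py` implementation:

```python
import os

# Provider auto-detection sketch. The variable names below are the ones
# the major SDKs read by default; this is an assumption for illustration,
# not the real llm_client.py code.
PROVIDER_KEYS = [
    ("claude", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
]

def detect_provider(env=os.environ):
    """Return the first provider whose API key is set, else fall back to Ollama."""
    for provider, key_name in PROVIDER_KEYS:
        if env.get(key_name):
            return provider
    return "ollama"  # no cloud key found: use the local runtime
```

Passing `env` as a parameter keeps the function testable without touching your real environment.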

Understanding Tokens and Cost

| Provider | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free tier? |
|---|---|---|---|
| Claude (Sonnet) | $3 | $15 | No |
| OpenAI (GPT-4o) | $2.50 | $10 | No |
| Gemini (Flash) | Free tier | Free tier | Yes |
| Ollama (local) | Free | Free | N/A -- runs locally |

A 1,000-word security report analysis costs a fraction of a cent on any paid tier. For development and learning, the cost is negligible. Ollama runs entirely on your machine with no internet required after the initial model download.
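You can estimate costs from the table above with a few lines of arithmetic. The word-to-token ratio (~0.75 words per token) is the rough rule of thumb from earlier; the default rates below are the Claude Sonnet prices:

```python
# Rough cost estimate from per-million-token prices.
# Defaults are the Claude Sonnet rates from the table above.
def estimate_cost(input_tokens, output_tokens,
                  input_per_m=3.00, output_per_m=15.00):
    """Return the dollar cost of one request."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# A 1,000-word report is roughly 1,333 input tokens (~0.75 words/token);
# assume a 200-token response.
cost = estimate_cost(1333, 200)
print(f"${cost:.4f}")  # well under a cent per request
```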

Request/Response Flow

Understanding the data flow helps you debug issues and optimise performance:

| Step | What happens | Where |
|---|---|---|
| 1. Build request | Assemble system + messages + max_tokens | Your Python code |
| 2. Send HTTPS | Request travels to the provider API (or localhost for Ollama) | Network |
| 3. Tokenise | Provider converts text to token IDs | Provider server |
| 4. Generate | Model predicts tokens one at a time | Provider GPU |
| 5. Return | Response string sent back | Network |

Typical latency: 1-5 seconds for moderate-length responses. Most of the time is spent in step 4 (generation).
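The request assembled in step 1 is ultimately just a JSON payload. Here is roughly what an OpenAI-compatible endpoint receives (Claude and Gemini use different field names, but the same three ingredients):

```python
import json

# Roughly what step 1 assembles before step 2 sends it over HTTPS.
# This mirrors the OpenAI-compatible wire format, where the system
# prompt travels as the first entry in the messages list.
payload = {
    "model": "gpt-4o",
    "max_tokens": 200,
    "messages": [
        {"role": "system", "content": "You are a cybersecurity analyst. Be concise and technical."},
        {"role": "user", "content": "What is a reverse shell?"},
    ],
}

body = json.dumps(payload)  # this string becomes the HTTPS request body
```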


Think Deeper

Make an API call with max_tokens=10, then the same prompt with max_tokens=500. How does the response change? What happens to the cost?

With max_tokens=10, the response is cut off mid-sentence -- the model stops generating even if its answer is incomplete. With max_tokens=500, you get a complete answer but pay for more output tokens. In a security pipeline processing thousands of alerts per hour, this difference matters: token limits are a cost and latency control. Set them as low as possible while still getting usable output.
Cybersecurity tie-in: API keys are credentials. Treat them like passwords: store in environment variables, never commit to git, rotate periodically. A leaked LLM API key lets an attacker run up your bill or, worse, read your prompts if the provider logs them. For sensitive security analysis (malware samples, IR reports), consider Ollama -- data never leaves your machine.
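The environment-variable habit is worth baking into a small helper that fails loudly when a key is missing. A minimal sketch (`require_key` is a hypothetical name, not part of `llm_client.py`):

```python
import os

def require_key(name, env=os.environ):
    """Fetch a credential from the environment, failing loudly if absent."""
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell "
            f"(e.g. export {name}=...) -- never hard-code it or commit it to git."
        )
    return value

# Usage: api_key = require_key("ANTHROPIC_API_KEY")
```

Failing at startup with a clear message beats a cryptic 401 halfway through a pipeline run.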
