The Chat API Structure
Every modern LLM chat API -- Claude, OpenAI, Gemini, Ollama -- is built around the same three core parameters. Once you learn them, you can use any provider:
| Parameter | Type | Purpose | Example |
|---|---|---|---|
| `system` | String | Defines the model's role and behaviour (invisible to the end user) | `"You are a SOC analyst. Be concise."` |
| `messages` | List of dicts | The conversation history: alternating user/assistant turns | `[{"role": "user", "content": "Analyse this log..."}]` |
| `max_tokens` | Integer | Hard cap on response length (100 tokens ≈ 75 words) | `200` |
The system prompt shapes every response. The messages list carries context. The max_tokens cap controls cost and latency.
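Before any HTTP machinery gets involved, those three parameters are just plain data. A sketch of a request as a Python dict -- the exact wire format differs slightly per provider, but these three fields always appear in some form:

```python
# Sketch of a chat request as plain data. The log line in the user
# message is an invented example, not from any real dataset.
request = {
    "system": "You are a SOC analyst. Be concise.",   # role and behaviour
    "messages": [                                     # conversation history
        {"role": "user", "content": "Analyse this log: 50 failed root logins"}
    ],
    "max_tokens": 200,                                # response-length cap
}

# Every turn in messages carries exactly a role and content.
for turn in request["messages"]:
    assert turn["role"] in {"user", "assistant"}
```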
Your First API Call
```python
from llm_client import get_client

provider, client = get_client()  # auto-detects your API key
print(f"Using provider: {provider}")

response = client.chat(
    system="You are a cybersecurity analyst. Be concise and technical.",
    messages=[
        {"role": "user", "content": "What is a reverse shell?"}
    ],
    max_tokens=200,
)
print(response)
```
The `llm_client.py` helper wraps Claude, OpenAI, Gemini, and Ollama behind a common interface. Set whichever API key you have and the code works identically.
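Because `messages` carries the full history, a multi-turn conversation is just list maintenance: append the assistant's reply, then the next user turn, and resend the whole list. A minimal sketch -- the real `client.chat` call from above needs an API key, so a stand-in reply is used here:

```python
messages = [{"role": "user", "content": "What is a reverse shell?"}]

# reply = client.chat(system=..., messages=messages, max_tokens=200)
reply = "A reverse shell is an outbound connection from the victim host..."  # stand-in

# Append the assistant turn, then the follow-up user turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How would I detect one in netflow data?"})

# Most providers require roles to alternate user/assistant.
roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'user']
```

The model has no memory between calls: if a turn is missing from the list, the model never saw it.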
Understanding Tokens and Cost
| Provider | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free tier? |
|---|---|---|---|
| Claude (Sonnet) | $3 | $15 | No |
| OpenAI (GPT-4o) | $2.50 | $10 | No |
| Gemini (Flash) | Free tier | Free tier | Yes |
| Ollama (local) | Free | Free | N/A -- runs locally |
A 1,000-word security report analysis costs on the order of a cent on the paid providers. For development and learning, the cost is negligible. Ollama runs entirely on your machine with no internet required after the initial model download.
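The per-million-token rates in the table translate into real cost with simple arithmetic. A quick estimator, with rates hard-coded from the table and token counts derived from the rough 100-tokens-per-75-words ratio:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are dollars per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 1,000-word report is roughly 1,333 input tokens (100 tokens ~ 75 words);
# assume a ~300-token answer. Claude Sonnet rates: $3 in, $15 out.
cost = estimate_cost(1_333, 300, 3.00, 15.00)
print(f"${cost:.4f}")  # under a cent
```

Note that output tokens dominate the bill at these rates, which is another reason to keep `max_tokens` tight.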
Request/Response Flow
Understanding the data flow helps you debug issues and optimise performance:
| Step | What happens | Where |
|---|---|---|
| 1. Build request | Assemble system + messages + max_tokens | Your Python code |
| 2. Send HTTPS | Request travels to provider API (or localhost for Ollama) | Network |
| 3. Tokenise | Provider converts text to token IDs | Provider server |
| 4. Generate | Model predicts tokens one at a time | Provider GPU |
| 5. Return | Response string sent back | Network |
Typical latency: 1-5 seconds for moderate-length responses. Most of the time is spent in step 4 (generation).
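Since generation dominates wall-clock time, timing the call end to end is usually enough to spot latency problems. A minimal sketch using only the standard library; `fake_chat` is a stand-in for the real `client.chat` call:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in for client.chat(...) -- swap in the real call once a key is set.
def fake_chat():
    time.sleep(0.05)  # simulate generation latency
    return "response text"

reply, elapsed = timed(fake_chat)
print(f"{elapsed:.2f}s")
```

If a response feels slow, check whether a large `max_tokens` is letting the model generate far longer answers than you need.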
Think Deeper
Make an API call with `max_tokens=10`, then the same prompt with `max_tokens=500`. How does the response change? What happens to the cost?