# From Chunks to a Vector Index
After chunking, you encode each chunk into a vector using a sentence embedding model. All chunk vectors together form a vector index -- an embedding matrix you search against at query time:
| Step | Input | Output | When |
|---|---|---|---|
| 1. Chunk | Raw documents | List of N text chunks | Once (offline) |
| 2. Encode | N text chunks | Embedding matrix (N x 384) | Once (offline) |
| 3. Query | User question | Query vector (1 x 384) | Every search |
| 4. Rank | Query vector vs embedding matrix | Cosine similarity scores (N,) | Every search |
| 5. Return | Top-k scores | k most relevant chunks | Every search |
Steps 1-2 are done once. Steps 3-5 run in milliseconds because the expensive encoding is already done.
## Building the Vector Index

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Assume chunks is the list of strings produced by step 1
chunks = [
    "Mimikatz can extract plaintext passwords from LSASS process memory...",
    "To detect LSASS dumping, monitor for process access events (Sysmon ID 10)...",
    "SSH brute force attacks generate many failed authentication events...",
    "DNS tunnelling encodes exfiltrated data in DNS query subdomains...",
    "Ransomware typically encrypts files using AES-256 and stores the key...",
]

# Encode all chunks (done once, offline)
chunk_embeddings = model.encode(chunks)
print(f"Index shape: {chunk_embeddings.shape}")  # (5, 384)
```
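One build-time optimisation worth knowing: if you L2-normalise the embeddings once when building the index, cosine similarity reduces to a plain dot product, so query-time ranking becomes a single matrix multiply. (The `sentence-transformers` `encode` method can do this for you via `normalize_embeddings=True`.) A minimal sketch of the equivalence, using random vectors as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
index = rng.normal(size=(5, 384))   # stand-in for chunk_embeddings
query = rng.normal(size=(1, 384))   # stand-in for a query vector

# L2-normalise rows to unit length once, at build time
index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query, axis=1, keepdims=True)

# Cosine similarity computed directly from its definition
cos = (query @ index.T)[0] / (
    np.linalg.norm(query) * np.linalg.norm(index, axis=1)
)

# With unit vectors, a plain dot product yields identical scores
dot = (query_n @ index_n.T)[0]
assert np.allclose(cos, dot)
```

This matters once your index grows: the dot product form maps straight onto optimised BLAS routines and onto approximate-nearest-neighbour libraries that expect normalised vectors.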
## Retrieving Relevant Chunks

```python
def retrieve(query, chunk_embeddings, chunks, k=3):
    """Retrieve the top-k most relevant chunks for a query."""
    query_vec = model.encode([query])
    scores = cosine_similarity(query_vec, chunk_embeddings)[0]
    top_k_indices = np.argsort(scores)[::-1][:k]
    results = []
    for idx in top_k_indices:
        results.append({
            "chunk": chunks[idx],
            "score": float(scores[idx]),
            "index": int(idx),
        })
    return results

# Test it
results = retrieve("how to detect credential dumping", chunk_embeddings, chunks)
for r in results:
    print(f"[{r['score']:.3f}] {r['chunk'][:80]}...")
```
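A note on scale: `np.argsort` sorts all N scores, which is O(N log N) and fine for a handful of chunks, but for large indexes `np.argpartition` finds the top-k candidates in O(N) and only those k need sorting. A sketch on synthetic scores, assuming the same descending top-k selection used above:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.random(100_000)  # synthetic similarity scores
k = 3

# Full sort: O(N log N)
top_sorted = np.argsort(scores)[::-1][:k]

# Partition: O(N) to isolate the k largest (unordered), then sort only those k
candidates = np.argpartition(scores, -k)[-k:]
top_partitioned = candidates[np.argsort(scores[candidates])[::-1]]

assert np.array_equal(top_sorted, top_partitioned)
```

For indexes beyond a few hundred thousand vectors, even this linear scan becomes the bottleneck, which is the point where approximate-nearest-neighbour libraries replace exact brute-force search.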
## Evaluating Retrieval Quality
Before connecting retrieval to an LLM, verify that the right chunks come back for your expected queries:
| Test query | Expected top chunk | Pass? |
|---|---|---|
| "how to detect credential dumping" | LSASS dumping detection (Sysmon ID 10) | Check top-1 |
| "what is DNS tunnelling" | DNS tunnelling encodes data in subdomains | Check top-1 |
| "ransomware encryption method" | Ransomware typically encrypts with AES-256 | Check top-1 |
If the correct chunk does not appear in the top-3 results, the problem is in your chunking strategy or embedding model -- not in the LLM. Fix retrieval before adding generation.
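That spot check is easy to automate so it can run after every change to chunking or embedding. A minimal harness sketch: it assumes a `retrieve`-style callable whose results carry an `"index"` key, as in the function above; here it is exercised with a stubbed retriever so the example is self-contained:

```python
def evaluate_retrieval(test_cases, retrieve_fn, k=3):
    """Check that each expected chunk index appears in the top-k results.

    test_cases: list of (query, expected_chunk_index) pairs.
    retrieve_fn: callable returning ranked result dicts with an "index" key.
    Returns the list of failing cases (empty means retrieval is healthy).
    """
    failures = []
    for query, expected_idx in test_cases:
        top_indices = [r["index"] for r in retrieve_fn(query, k=k)]
        if expected_idx not in top_indices:
            failures.append((query, expected_idx, top_indices))
    return failures

# Stub standing in for the real embedding-based retrieve()
def fake_retrieve(query, k=3):
    ranking = {"how to detect credential dumping": [1, 0, 2]}
    return [{"index": i} for i in ranking.get(query, [0, 1, 2])[:k]]

failures = evaluate_retrieval(
    [("how to detect credential dumping", 1)], fake_retrieve
)
print(f"{len(failures)} failing queries")
```

Swapping `fake_retrieve` for the real `retrieve` (with `chunk_embeddings` and `chunks` bound in) turns this into a regression test for the whole retrieval layer.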
## Think Deeper

Your security knowledge base has a chunk about "SSH brute force detection" and another about "SSH key rotation best practices". A user queries "how to secure SSH". Which chunk ranks higher? Is this the right behaviour for an incident responder vs a compliance auditor?