RAG — Retrieval-Augmented Generation
The Open-Book Exam
Imagine a student who isn't expected to memorise everything — they have access to the library during the exam.
Instead of relying only on memory, the student searches the shelves for each question, reads the most relevant passage, and writes a well-informed answer. RAG gives AI models the same superpower: search relevant documents first, then answer using what was just retrieved. The model doesn't need to have memorised the answer; it can look it up.
In Plain English
RAG is a technique where an AI first searches a knowledge base for relevant information, then uses that retrieved information to generate a more accurate, up-to-date answer — rather than relying purely on its training memory.
The Technical Picture
RAG combines a retrieval component (dense vector search using embeddings over a document corpus) with a generative LLM. Retrieved chunks are injected into the prompt context, enabling the model to ground its output in specific, verifiable source documents.
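The retrieve-then-generate loop can be sketched in a few lines. This is a toy illustration, not a production system: the corpus, queries, and helper names are all hypothetical, and bag-of-words term frequencies stand in for a real dense embedding model, with cosine similarity playing the same role it would over learned vectors. The final step builds the grounded prompt that would be handed to an LLM.

```python
import math
from collections import Counter

# Toy knowledge base standing in for a chunked document corpus (hypothetical content).
CORPUS = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "Python was created by Guido van Rossum and released in 1991.",
    "RAG combines a retriever with a generative language model.",
]

def embed(text):
    """Bag-of-words term frequencies -- a crude stand-in for a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=1):
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Inject the retrieved chunks into the prompt context, grounding the answer."""
    context = "\n".join(retrieve(query, corpus, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How tall is the Eiffel Tower?", CORPUS))
```

In a real system the embedding function would be a trained model, the corpus would be pre-embedded and indexed for fast nearest-neighbour search, and the returned prompt would be sent to the generative LLM.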
Real-World Examples
- Perplexity searches the web before generating every answer
- Enterprise chatbots using RAG over internal company documents
- Google's NotebookLM answers questions grounded in your own uploaded PDFs
RAG = search first, then answer. It gives AI real-time access to external knowledge beyond its training data.