RAG — Retrieval-Augmented Generation
The Open-Book Exam
Imagine a student who isn't expected to memorise everything — they have access to the library during the exam.
Instead of relying only on memory, the student searches the shelves for each question, reads the most relevant passage, and writes a well-informed answer. RAG gives AI models the same superpower: search relevant documents first, then answer using what was just retrieved. The model doesn't need to have memorised the answer; it can look it up.
In Plain English
RAG is a technique where an AI first searches a knowledge base for relevant information, then uses that retrieved information to generate a more accurate, up-to-date answer — rather than relying purely on its training memory.
The Technical Picture
RAG combines a retrieval component (dense vector search using embeddings over a document corpus) with a generative LLM. Retrieved chunks are injected into the prompt context, enabling the model to ground its output in specific, verifiable source documents.
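The retrieve-then-generate loop can be sketched in a few lines. This is a toy illustration, not a production system: the corpus, queries, and helper names are all hypothetical, and bag-of-words term frequencies stand in for a real dense embedding model, with cosine similarity playing the same role it would over learned vectors. The final step builds the grounded prompt that would be handed to an LLM.

```python
import math
from collections import Counter

# Toy knowledge base standing in for a chunked document corpus (hypothetical content).
CORPUS = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "Python was created by Guido van Rossum and released in 1991.",
    "RAG combines a retriever with a generative language model.",
]

def embed(text):
    """Bag-of-words term frequencies -- a crude stand-in for a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=1):
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Inject the retrieved chunks into the prompt context, grounding the answer."""
    context = "\n".join(retrieve(query, corpus, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How tall is the Eiffel Tower?", CORPUS))
```

In a real system the embedding function would be a trained model, the corpus would be pre-embedded and indexed for fast nearest-neighbour search, and the returned prompt would be sent to the generative LLM.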
Real-World Examples
- Perplexity searches the web before generating every answer
- Enterprise chatbots using RAG over internal company documents
- Google's NotebookLM answers questions grounded in your own uploaded PDFs
RAG = search first, then answer. It gives AI real-time access to external knowledge beyond its training data.