Retrieval-Augmented Generation (RAG)

A technique that fetches relevant documents and feeds them to the model so its answer is grounded in your data, not just its training data.

Retrieval-augmented generation, or RAG, addresses a basic limitation: a model knows only what it was trained on, and that knowledge is frozen at training time and generic rather than specific to your data. RAG works around this by retrieving relevant text from your own documents at query time and including it in the prompt, so the answer is grounded in that material.

A RAG pipeline has a few stages: split documents into chunks, turn each chunk into an embedding, store the embeddings in a vector index, and at query time find the chunks whose embeddings are closest to the question's and pass them to the model along with the question. Answer quality depends heavily on retrieving the right chunks, which is why, in many RAG systems, improving retrieval does more for the answers than switching to a stronger model.

This is the home turf of LlamaIndex, which began as a retrieval library. Agent frameworks treat RAG as one tool among many: an agent can decide when to retrieve, what to search for, and how to use the results. Combining RAG with an agent loop lets the system reason about what it needs to look up rather than retrieving blindly on every turn.
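The pipeline stages above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not how LlamaIndex or any other library implements RAG. `embed` is a hypothetical stand-in that counts word tokens instead of calling a learned embedding model. `VectorIndex` is a hypothetical in-memory list ranked by cosine similarity rather than a real vector database. `chunk`, `build_prompt`, and the sample documents are illustrative names and data. The sketch stops at the assembled prompt, which a real system would send to a model.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 20) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Hypothetical in-memory index: stores (chunk, vector) pairs and ranks by similarity."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunk_text: str) -> None:
        self.entries.append((chunk_text, embed(chunk_text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Embed the question the same way as the chunks, then return the k closest chunks.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt so the model answers from them."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Illustrative documents (made up for this example).
docs = [
    "Refunds are issued within 14 days of a returned item arriving at our warehouse.",
    "Our office is open Monday to Friday, nine to five, and closed on public holidays.",
    "Shipping to Canada takes five to eight business days with the standard courier.",
]

# Indexing: chunk each document, embed each chunk, store it.
index = VectorIndex()
for doc in docs:
    for piece in chunk(doc):
        index.add(piece)

# Query time: retrieve the closest chunk and assemble the prompt.
question = "How long do refunds take?"
prompt = build_prompt(question, index.search(question, k=1))
print(prompt)
```

Replacing `embed` with a real embedding model and `VectorIndex` with a proper vector store changes the components but not the shape of the pipeline. Chunk size and `k` are two of the parameters that most directly control which text reaches the model.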