RAG Pipeline

The end-to-end architecture for retrieval-augmented generation, from query through retrieval to final LLM response.

A RAG pipeline is the complete flow that turns a user question into a grounded answer. It involves chunking source documents, embedding them into vectors, storing them in a vector database, retrieving relevant chunks at query time, and feeding them to an LLM for generation.

A typical pipeline has five stages: 1) ingest and chunk documents, 2) generate embeddings, 3) store in a vector DB, 4) retrieve top-k chunks for a query, 5) prompt an LLM with the retrieved context.
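The five stages above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: a bag-of-words count stands in for a real embedding model, a Python list stands in for a vector database, and the final step only builds the prompt rather than calling an LLM. All function names (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=50, overlap=10):
    """Stage 1: split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text):
    """Stage 2 (toy): bag-of-words counts stand in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=50, overlap=10):
    """Stage 3 (toy): a list of (chunk, vector) pairs stands in for a vector DB."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size, overlap)]


def retrieve(index, query, k=3):
    """Stage 4: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Stage 5: format the retrieved context and the question for an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "Chunk size controls how much text each embedding covers.",
    "Rerankers reorder retrieved chunks by relevance to the query.",
    "Bananas are rich in potassium.",
]
index = build_index(docs)
question = "What do rerankers do to retrieved chunks?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

In a real system, `embed` would call an embedding model and `build_index` would write to a vector store, but the order of the stages stays the same.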

Key tuning points: chunk size, embedding model, retrieval strategy (dense, sparse, or hybrid), reranking, and prompt formatting. Each of these affects output quality.
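As one illustration of a hybrid retrieval strategy, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion (RRF), a common fusion method that scores each document by summing 1 / (k + rank) over the lists it appears in. The entry itself does not prescribe a fusion method, so RRF is an assumption here; the function name, the document IDs, and the constant `k=60` (a frequently used default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Because RRF uses only rank positions, it does not require the dense and sparse scores to be on comparable scales.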

Building a good RAG pipeline is more art than science: the best settings usually have to be found by experimenting on your own data. Many production systems add reranking, metadata filtering, query rewriting, and hybrid search to improve retrieval quality.
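Two of these additions, metadata filtering and reranking, can be sketched as a narrowing step before retrieval scoring and a reordering step after it. This is a simplified illustration: the `Chunk` class, the function names, and the sample data are hypothetical, and the word-overlap scorer is a stand-in for whatever reranking model a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(chunks, **required):
    """Keep only chunks whose metadata matches every required key/value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]


def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, score_fn=overlap_score, top_n=3):
    """Reorder candidates by score_fn (highest first) and keep the top_n."""
    ordered = sorted(candidates, key=lambda c: score_fn(query, c.text), reverse=True)
    return ordered[:top_n]


chunks = [
    Chunk("reset your password in settings", {"product": "app"}),
    Chunk("password rules for the admin console", {"product": "console"}),
    Chunk("billing runs monthly", {"product": "app"}),
]
candidates = filter_by_metadata(chunks, product="app")
for c in rerank("how do I reset my password", candidates, top_n=2):
    print(c.text)
```

Filtering first keeps the reranker, typically the more expensive step, working on a smaller candidate set.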
