RAG Pipeline
The end-to-end architecture for retrieval-augmented generation, from query through retrieval to final LLM response.
A RAG pipeline is the complete flow that turns a user question into a grounded answer. It involves chunking source documents, embedding them into vectors, storing them in a vector database, retrieving relevant chunks at query time, and feeding them to an LLM for generation.
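A typical pipeline has five stages. The first three run ahead of time, when documents are indexed: 1) ingest and chunk documents, 2) generate an embedding for each chunk, 3) store the embeddings in a vector DB. The last two run for every query: 4) embed the query and retrieve the top-k most similar chunks, 5) prompt an LLM with the retrieved context.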
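The five stages can be sketched end to end in a few lines of Python. This is a minimal illustration under loud assumptions, not a reference implementation: the bag-of-words `embed` function stands in for a real embedding model, `InMemoryStore` stands in for a vector database, and the final LLM call is left out, so the sketch stops at the assembled prompt. All function and class names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Stage 1: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Stage 2: toy bag-of-words vector (assumption: replaces a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stage 3: a list of (vector, chunk) pairs standing in for a vector DB."""

    def __init__(self):
        self.rows = []

    def add(self, chunks):
        for c in chunks:
            self.rows.append((embed(c), c))

    def search(self, query, k=2):
        """Stage 4: return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [c for _, c in ranked[:k]]


def build_prompt(query, contexts):
    """Stage 5: assemble the prompt; sending it to an LLM is omitted here."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "Python is a programming language.",
]
store = InMemoryStore()
for d in docs:
    store.add(chunk(d))

query = "What is the capital of France?"
top = store.search(query, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real system the exhaustive scan in `search` would normally be replaced by an approximate nearest-neighbor index, and `embed` by a learned model, but the order of the stages stays the same.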
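Building a good RAG pipeline is largely empirical: choices such as chunk size, the embedding model, and the value of k are usually tuned against the target data rather than derived from first principles. Many production systems also add reranking, metadata filtering, query rewriting, and hybrid search to improve retrieval quality.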
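As an illustration of hybrid search, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) over the rankings it appears in. The sketch below assumes the two result lists already exist; the document ids are made up, and k = 60 is a conventional default rather than anything this entry prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id receives 1 / (k + rank) from every ranking it appears in,
    with rank starting at 1. Ties are broken by id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top, which is why `doc_b` and `doc_a` outrank the documents found by only one retriever. RRF uses only rank positions, so the raw scores of the two retrievers do not need to be on a comparable scale.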