Retrieval-Augmented Generation (RAG) is an AI framework that enhances the performance of large language models (LLMs) by integrating them with external information retrieval systems. This approach allows LLMs to generate more accurate, up-to-date, and contextually relevant responses by referencing authoritative data sources beyond their original training data.
RAG operates through a four-step process:

1. Indexing: External data, such as documents, databases, or web pages, is converted into embeddings (numerical vector representations) and stored in a vector database for efficient retrieval.
2. Retrieval: When a user submits a query, a retrieval mechanism searches the indexed data for the most relevant documents or information snippets.
3. Augmentation: The retrieved information is combined with the user's query and supplied as additional context to the LLM.
4. Generation: The LLM draws on both its internal knowledge and the retrieved data to produce a response that is more accurate and grounded in up-to-date or domain-specific information.
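The four steps above can be sketched in a minimal, self-contained example. This is a toy illustration, not a production pipeline: the `embed` function here is a simple bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the final LLM call is left as a stub. All function and variable names are illustrative, not part of any specific library.

```python
import math
import re
from collections import Counter

# Toy embedding: a bag-of-words term-count vector. A real RAG system would
# use a learned embedding model here; this stand-in only captures word overlap.
def embed(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# (1) Indexing: embed each document and store the vectors.
# An in-memory list stands in for a vector database.
documents = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings for fast similarity search.",
    "LLMs are trained on large text corpora.",
]
index = [(doc, embed(doc)) for doc in documents]

# (2) Retrieval: embed the query and rank documents by similarity.
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# (3) Augmentation: prepend the retrieved snippets to the user's query.
def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# (4) Generation: the augmented prompt would now be sent to an LLM;
# that call is stubbed out here.
prompt = build_prompt("How do embeddings help retrieval?")
print(prompt)
```

In a real deployment, the toy pieces above are swapped for their production counterparts (an embedding model, a vector store, and an LLM API call), but the control flow of index, retrieve, augment, and generate stays the same.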