RAG: Definition and Examples

RAG (Retrieval-Augmented Generation) is a technique that enriches a language model's responses by supplying it with information retrieved from external sources before it generates an answer.

Full definition

RAG, or Retrieval-Augmented Generation, is an architecture that combines two key steps: retrieving relevant information from a knowledge base, then having an LLM generate a response grounded in that information. This approach mitigates two fundamental limitations of language models: their knowledge cutoff date and their tendency to hallucinate.

Concretely, when a user asks a question, the RAG system first converts the question into a vector (an embedding), then searches a vector database for the passages whose embeddings are most similar. These passages are injected into the prompt sent to the LLM, which can then base its response on up-to-date data from the knowledge base rather than on its training data alone.
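
To make the retrieve-then-generate flow concrete, here is a minimal, stdlib-only Python sketch. It is an illustration under loudly stated assumptions: the "embedding" is a hypothetical bag-of-words term count (a real system would call a trained embedding model), the InMemoryVectorStore class is a hypothetical stand-in for a vector database, and the final LLM call is omitted. The prompt wording mirrors the enterprise document assistant template on this page.

```python
import math
import re
from collections import Counter


# Hypothetical toy "embedding": a bag-of-words term-count vector.
# A real RAG system would call a trained embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (passage, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, passage: str) -> None:
        self.items.append((passage, embed(passage)))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        # Embed the question, score every stored passage, return the k most similar.
        q = embed(query)
        scored = [(cosine(q, vec), passage) for passage, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    # Inject the retrieved passages into the prompt ahead of the question.
    context = "\n\n".join(passages)
    return (
        "Based solely on the documents provided below, answer the user's question.\n\n"
        f"[Retrieved Documents]\n{context}\n\n"
        f"Question: {question}"
    )


store = InMemoryVectorStore()
store.add("Refunds are issued within 14 days of a return request.")
store.add("Our offices are closed on public holidays.")
store.add("Shipping to Canada takes 5 to 7 business days.")

question = "How many days until refunds are issued?"
top = store.search(question, k=1)
prompt = build_prompt(question, [passage for _, passage in top])
print(prompt)
# The prompt string would then be sent to an LLM (call omitted here).
```

Note that the model itself is never modified: updating what the system "knows" only requires adding passages to the store.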

RAG has become one of the most popular architectures in applied AI because it offers a good balance between performance and cost. Rather than fine-tuning an entire model on specific data (which is expensive and rigid), RAG lets you update the knowledge base independently of the model. It is a common choice for enterprise chatbots, document assistants, and question-answering systems over specialized corpora.

The quality of a RAG system depends heavily on the relevance of the retrieval phase. Poor document chunking, embeddings ill-suited to the domain, or overly simplistic search strategies can significantly degrade results. That is why advanced techniques like re-ranking, semantic chunking, or multi-step RAG have emerged to improve the accuracy of these systems.
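
To illustrate why chunking matters, the following minimal Python sketch contrasts a naive fixed-size splitter with one that packs whole paragraphs into chunks. Both function names are hypothetical, and treating blank-line paragraph boundaries as "coherent segments" is only a cheap proxy chosen for illustration; fuller semantic chunking approaches may, for example, split where the embedding similarity between consecutive sentences drops.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    # Naive baseline: cut every `size` characters, even in the middle of a sentence.
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, max_chars: int) -> list[str]:
    # Pack whole paragraphs (separated by blank lines) into chunks of at most max_chars.
    # A single paragraph longer than max_chars is kept intact as its own chunk.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Refund policy. Refunds are issued within 14 days.\n\n"
    "Shipping policy. Orders to Canada take 5 to 7 business days.\n\n"
    "Holidays. Offices close on public holidays."
)
print(fixed_size_chunks(doc, 40))  # first chunk stops mid-sentence, after "within"
print(paragraph_chunks(doc, 120))  # two chunks, each made of complete paragraphs
```

The fixed-size splitter separates "Refunds are issued within" from "14 days", so a retriever could return a chunk that no longer contains the answer; the paragraph-based version keeps each fact whole.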

Etymology

The acronym RAG was introduced by Patrick Lewis et al. in their 2020 research paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," from Facebook AI Research (FAIR). The term combines "Retrieval" (information retrieval, a long-established discipline in computer science) and "Augmented Generation," reflecting the fusion of classic document search with neural text generation.

Concrete examples

Enterprise document assistant

Based solely on the documents provided below, answer the user's question. If the information is not present in the documents, state that clearly.

[Retrieved Documents]
{context}

Question: {question}

Customer support chatbot with knowledge base

You are a support agent for [COMPANY]. Use the following help articles to respond to the customer. Cite relevant article numbers in your response.

Relevant articles:
{retrieved_articles}

Customer request: {query}

Augmented legal research

Analyze the following legal excerpts and synthesize the legal principles applicable to the described situation. Cite each source in square brackets.

Excerpts:
{legal_passages}

Situation: {case_description}

Practical usage

In prompt engineering, RAG translates into designing prompts that dynamically incorporate retrieved document context. The main challenge is structuring the prompt so the model relies on the provided documents rather than its internal knowledge, using explicit instructions such as "base your answer solely on the following documents." It is also crucial to handle cases where no relevant document is found, by instructing the model to indicate the absence of information rather than inventing an answer.
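
The two safeguards described above (an explicit grounding instruction and a defined behavior when nothing relevant is retrieved) can be sketched in Python as follows. The function name, the min_score threshold of 0.2, and the retrieval scores in the usage lines are hypothetical values chosen for illustration; in practice the threshold depends on the embedding model and should be tuned. Passages are numbered so the model can cite them in square brackets, as in the legal research template.

```python
NO_CONTEXT_NOTE = "(No relevant documents were found for this question.)"


def grounded_prompt(
    question: str,
    scored_passages: list[tuple[float, str]],
    min_score: float = 0.2,  # hypothetical relevance threshold; tune per embedding model
) -> str:
    # Keep only passages whose retrieval score clears the threshold.
    kept = [text for score, text in scored_passages if score >= min_score]
    if kept:
        # Number each passage so the model can cite it as [1], [2], ...
        context = "\n".join(f"[{i}] {text}" for i, text in enumerate(kept, start=1))
    else:
        # Say explicitly that nothing was found instead of sending an empty context.
        context = NO_CONTEXT_NOTE
    return (
        "Base your answer solely on the following documents. "
        "Cite each source you use in square brackets, e.g. [1]. "
        "If the documents do not contain the answer, reply exactly: "
        "\"The provided documents do not contain this information.\"\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )


answerable = grounded_prompt(
    "When are refunds issued?",
    [(0.48, "Refunds are issued within 14 days."), (0.05, "Offices close on public holidays.")],
)
unanswerable = grounded_prompt(
    "Who founded the company?",
    [(0.03, "Offices close on public holidays.")],
)
print(answerable)
print(unanswerable)
```

Asking for a fixed refusal sentence makes "no answer" cases easy to detect downstream, and dropping low-scoring passages keeps weakly related text from tempting the model into an answer.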

Related concepts

Embeddings, Vector database, Chunking, Fine-tuning

FAQ

What is the difference between RAG and fine-tuning?

Fine-tuning modifies the model's weights by training it on specific data, which is costly and requires retraining whenever the data changes. RAG injects relevant information at query time without modifying the model. RAG is preferable when data changes frequently or when answers must be traceable to their sources. Fine-tuning is better suited to changing the model's style or underlying behavior.

How can I improve the quality of a RAG system?

Several levers exist: chunk documents into semantically coherent segments rather than fixed-size pieces, choose an embedding model suited to your domain and language, add a re-ranking step to reorder results by relevance, enrich documents with metadata to enable filtering, and combine vector and keyword search (hybrid search). Testing and iterating on these parameters against reference question-answer sets is essential.
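
As a rough sketch of three of these levers (hybrid search, metadata filtering, and evaluation against a reference set), the Python below fuses a vector ranking with a keyword ranking using reciprocal rank fusion (RRF). RRF is one common fusion method, swapped in here for illustration and not necessarily what any given system uses; k=60 is a commonly used default. The keyword scorer is a crude stand-in for BM25, and the corpus, metadata, and vector-search output are hypothetical data. All function names are hypothetical.

```python
def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    # Crude keyword scoring (a stand-in for BM25): count distinct query terms in each doc.
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.lower().split())) for doc_id, text in docs.items()}
    # sorted() is stable even with reverse=True, so ties keep the docs' original order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking adds 1 / (k + rank) to a doc's fused score; ranks start at 1.
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


def filter_by_metadata(doc_ids: list[str], metadata: dict[str, dict[str, str]], **required: str) -> list[str]:
    # Keep only docs whose metadata matches every required key/value pair.
    return [d for d in doc_ids if all(metadata[d].get(key) == value for key, value in required.items())]


def hit_rate(results: dict[str, list[str]], expected: dict[str, str], top_k: int) -> float:
    # Share of reference questions whose expected doc id appears in the top_k results.
    hits = sum(1 for question, doc_id in expected.items() if doc_id in results[question][:top_k])
    return hits / len(expected)


# Hypothetical corpus, metadata, and vector-search output, for illustration only.
docs = {
    "a": "refund policy refunds issued within 14 days",
    "b": "shipping to canada takes 5 to 7 days",
    "c": "refund policy for enterprise contracts",
}
metadata = {"a": {"doc_type": "policy"}, "b": {"doc_type": "faq"}, "c": {"doc_type": "policy"}}
vector_ranking = ["c", "a", "b"]  # assumed result of an embedding search

query = "refund within 14 days"
keyword_ranking = keyword_rank(query, docs)
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
filtered = filter_by_metadata(fused, metadata, doc_type="policy")
print(keyword_ranking, fused, filtered)
print(hit_rate({query: filtered}, {query: "a"}, top_k=1))
```

Because RRF uses only ranks, the vector and keyword scores never need to be normalized to a common scale. In this toy run, the keyword scorer ties "b" and "c", and the fusion breaks the tie in favor of "c" because the vector search ranked it first; the metadata filter then drops the FAQ entry.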

Can RAG completely eliminate hallucinations?

No. RAG significantly reduces hallucinations but does not eliminate them. The model may still misinterpret a retrieved passage, incorrectly merge information from multiple sources, or extrapolate beyond what the sources support. To minimize this risk, explicitly ask the model to cite its sources and to answer only if the information is present in the provided context.
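
Requiring citations also makes answers checkable after generation. The minimal Python sketch below assumes the sources were numbered [1], [2], ... in the prompt (an assumption for illustration); check_citations is a hypothetical helper that flags answers citing no source or citing a number that was never supplied. It only catches structural problems and cannot confirm that a cited passage actually supports the claim.

```python
import re


def check_citations(answer: str, num_sources: int) -> list[str]:
    """Return structural citation problems; an empty list means the check passed.

    This does not verify that a cited passage actually supports the claim.
    """
    # Collect every bracketed number such as [1] or [12] in the answer.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    problems: list[str] = []
    if not cited:
        problems.append("answer cites no source")
    out_of_range = sorted(n for n in cited if not 1 <= n <= num_sources)
    if out_of_range:
        problems.append(f"cites unknown source(s): {out_of_range}")
    return problems


print(check_citations("Refunds are issued within 14 days [1].", num_sources=2))  # → []
print(check_citations("Refunds take 30 days [3].", num_sources=2))  # → ['cites unknown source(s): [3]']
print(check_citations("Refunds take 30 days.", num_sources=2))  # → ['answer cites no source']
```

A model that correctly declines because the information is missing will also be flagged as citing no source, so such abstentions need to be recognized separately.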

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
