RAG: Definition and Examples

RAG (Retrieval-Augmented Generation) is a technique that enriches a language model's responses by supplying the model with information retrieved from external sources before it generates an answer.

Full definition

RAG, or Retrieval-Augmented Generation, is an architecture that combines two key steps: retrieving relevant information from a knowledge base, then having an LLM generate a response from that retrieved information. This approach mitigates fundamental limitations of language models, especially their knowledge cutoff date and their tendency to hallucinate.

Concretely, when a user asks a question, the RAG system first converts the question into a vector (embedding) and searches a vector database for the most similar passages. These passages are injected into the prompt sent to the LLM, which can then formulate a response grounded in the retrieved data rather than only in what it learned during training.
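The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, a plain list stands in for the vector database, and all names and sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank every passage by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine_similarity(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    # Inject the retrieved passages into the prompt sent to the LLM.
    context = "\n\n".join(passages)
    return (
        "Answer using only the documents below.\n\n"
        f"[Retrieved Documents]\n{context}\n\nQuestion: {question}"
    )


# Hypothetical knowledge base.
PASSAGES = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Return requests must include the original order number.",
]

question = "How long do refunds take after a return request?"
top_passages = retrieve(question, PASSAGES, k=2)
prompt = build_prompt(question, top_passages)
```

In a real system, `embed` would call an embedding model and `retrieve` would query a vector index; the shape of the pipeline stays the same.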

RAG has become one of the most popular architectures in applied AI because it offers a good compromise between performance and cost. Fine-tuning a model on specific data is expensive and rigid, whereas RAG lets you update the knowledge base independently of the model. It is the preferred approach for enterprise chatbots, document assistants, and question-answering systems on specialized corpora.

The quality of a RAG system depends heavily on the relevance of the retrieval phase. Poor document chunking, embeddings ill-suited to the domain, or overly simplistic search strategies can significantly degrade results. That is why advanced techniques like re-ranking, semantic chunking, or multi-step RAG have emerged to improve the accuracy of these systems.
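As an illustration of why chunking matters, here is a small sketch that packs whole sentences into chunks instead of cutting at a fixed character offset. It is a boundary-respecting chunker, a simpler cousin of semantic chunking (which usually relies on embeddings to find topic shifts); the function name and the `max_chars` parameter are assumptions made for this example.

```python
import re


def chunk_by_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "RAG retrieves passages before generating. "
    "Chunks that cut sentences in half hurt retrieval. "
    "Keeping sentences whole is a cheap first improvement."
)
chunks = chunk_by_sentences(text, max_chars=100)
```

With these inputs the first two sentences share a chunk and the third gets its own, so no sentence is split across chunks.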

Etymology

The acronym RAG was introduced by Patrick Lewis et al. in the research paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," published in 2020 by researchers from Facebook AI Research (FAIR), University College London, and New York University. The term combines "Retrieval" (information retrieval, a long-established discipline in computer science) and "Augmented Generation," reflecting the fusion of classic document search with neural text generation.

Concrete examples

Enterprise document assistant

Based solely on the documents provided below, answer the user's question. If the information is not present in the documents, state that clearly.

[Retrieved Documents]
{context}

Question: {question}
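One possible way to fill this template in code, assuming Python's built-in `str.format` and a hypothetical `render_prompt` helper. The sample documents are invented for illustration, and the `---` separator is a design choice, not part of the template.

```python
RAG_TEMPLATE = (
    "Based solely on the documents provided below, answer the user's question. "
    "If the information is not present in the documents, state that clearly.\n\n"
    "[Retrieved Documents]\n{context}\n\nQuestion: {question}"
)


def render_prompt(documents: list[str], question: str) -> str:
    # Separate documents visibly so the model can tell where each one ends.
    context = "\n---\n".join(doc.strip() for doc in documents)
    return RAG_TEMPLATE.format(context=context, question=question.strip())


prompt = render_prompt(
    [
        "Employees accrue 2 vacation days per month.",
        "Unused days expire on March 31.",
    ],
    "When do unused vacation days expire?",
)
```

Because `str.format` only interprets braces in the template itself, curly braces inside the retrieved documents are inserted as-is.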

Customer support chatbot with knowledge base

You are a support agent for [COMPANY]. Use the following help articles to respond to the customer. Cite relevant article numbers in your response.

Relevant articles:
{retrieved_articles}

Customer request: {query}
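For the model to cite article numbers, each retrieved article has to carry its number in the prompt. The sketch below shows one way to do that; the helper names, the company name, and the sample articles are all hypothetical.

```python
SUPPORT_TEMPLATE = (
    "You are a support agent for [COMPANY]. Use the following help articles to "
    "respond to the customer. Cite relevant article numbers in your response.\n\n"
    "Relevant articles:\n{retrieved_articles}\n\nCustomer request: {query}"
)


def format_articles(articles: dict[int, str]) -> str:
    # Prefix each article with its number so the model has something to cite.
    return "\n".join(
        f"Article {number}: {text}" for number, text in sorted(articles.items())
    )


def render_support_prompt(company: str, articles: dict[int, str], query: str) -> str:
    filled = SUPPORT_TEMPLATE.format(
        retrieved_articles=format_articles(articles), query=query
    )
    # Replace only the first [COMPANY], which is the one in the template header.
    return filled.replace("[COMPANY]", company, 1)


prompt = render_support_prompt(
    "Acme",
    {
        204: "To reset your password, open Settings > Security.",
        117: "Invoices are emailed on the first of each month.",
    },
    "I forgot my password.",
)
```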

Augmented legal research

Analyze the following legal excerpts and synthesize the legal principles applicable to the described situation. Cite each source in square brackets.

Excerpts:
{legal_passages}

Situation: {case_description}
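Since this prompt asks for citations in square brackets, the model's answer can be checked afterwards for citations that do not match any supplied excerpt. The following is a minimal sketch of such a check; the source labels and the sample answer are invented, and it assumes excerpts were labeled with identifiers like "Source 1" when they were inserted into the prompt.

```python
import re


def cited_sources(answer: str) -> set[str]:
    # Collect every non-nested [ ... ] span in the answer.
    return set(re.findall(r"\[([^\[\]]+)\]", answer))


def unknown_citations(answer: str, known_ids: set[str]) -> set[str]:
    # Citations that do not correspond to any excerpt we actually provided.
    return cited_sources(answer) - known_ids


known_ids = {"Source 1", "Source 2"}
answer = (
    "The notice period is thirty days [Source 1], "
    "unless waived in writing [Source 3]."
)
missing = unknown_citations(answer, known_ids)
```

Here `missing` contains "Source 3", which signals a citation the model could not have taken from the provided excerpts and that deserves review.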

Practical usage

In prompt engineering, RAG translates into designing prompts that dynamically incorporate retrieved document context. The main challenge is structuring the prompt so the model relies on the provided documents rather than its internal knowledge, using explicit instructions such as "base your answer solely on the following documents." It is also crucial to handle cases where no relevant document is found, by instructing the model to indicate the absence of information rather than inventing an answer.
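The "no relevant document" case can also be handled before the model is called at all. The sketch below filters retrieved passages by a similarity threshold and falls back to a fixed reply when nothing passes. The `ScoredPassage` type, the `prepare_request` helper, and the 0.3 threshold are assumptions for illustration; a suitable threshold depends on the embedding model and should be tuned on real data.

```python
from typing import NamedTuple


class ScoredPassage(NamedTuple):
    text: str
    score: float  # similarity reported by the retriever; higher means closer


# Hypothetical fixed reply used when retrieval finds nothing usable.
NO_CONTEXT_REPLY = "I could not find this information in the available documents."


def prepare_request(
    question: str, results: list[ScoredPassage], min_score: float = 0.3
) -> tuple[bool, str]:
    """Return (call_llm, text): a prompt to send, or a canned reply to show."""
    relevant = [r for r in results if r.score >= min_score]
    if not relevant:
        return False, NO_CONTEXT_REPLY
    context = "\n\n".join(r.text for r in relevant)
    prompt = (
        "Base your answer solely on the following documents. If they do not "
        "contain the answer, say so instead of guessing.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return True, prompt


weak = [ScoredPassage("Parking is available on level 2.", 0.12)]
strong = [ScoredPassage("Badges are renewed every 12 months.", 0.81)] + weak

skip_llm = prepare_request("How often are badges renewed?", weak)
call_llm, prompt = prepare_request("How often are badges renewed?", strong)
```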

Related concepts

Embeddings, Vector database, Chunking, Fine-tuning

FAQ

What is the difference between RAG and fine-tuning?
Fine-tuning modifies model weights by training on specific data, which is costly and requires retraining each time the data is updated. RAG injects relevant information at query time without modifying the model. RAG is preferable when data changes frequently or when source traceability is needed. Fine-tuning is better suited to changing the model's style or deep behavior.

How can I improve the quality of a RAG system?
Several levers exist: optimize document chunking by using semantically coherent segments rather than fixed chunk sizes, choose an embedding model suited to your domain and language, add a re-ranking step to reorder results by relevance, enrich document metadata to enable filtering, and combine vector and keyword search (hybrid search). Testing and iterating on these parameters with reference question-answer sets is essential.

Can RAG completely eliminate hallucinations?
No. RAG significantly reduces hallucinations but does not eliminate them entirely. The model may still misinterpret a retrieved passage, incorrectly merge information from multiple sources, or generate unfounded extrapolations. To minimize this risk, explicitly ask the model to cite its sources and to answer only if the information is present in the provided context.
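As a complement to the answers above, testing with a reference question-answer set can start with something as simple as a retrieval hit rate: the fraction of questions for which the expected document appears among the top k results. The sketch below is one possible setup; the document ids, the evaluation questions, and the keyword-overlap retriever are all hypothetical stand-ins for a real retriever.

```python
import re
from typing import Callable


def hit_rate_at_k(
    eval_set: list[tuple[str, str]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Fraction of questions whose expected document id is in the top k."""
    if not eval_set:
        return 0.0
    hits = sum(1 for question, expected in eval_set if expected in retrieve(question, k))
    return hits / len(eval_set)


# Hypothetical corpus, keyed by document id.
DOCS = {
    "refunds": "Refunds are issued within 14 days.",
    "hours": "Support is open from 9am to 5pm.",
    "shipping": "Orders ship within two business days.",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(question: str, k: int) -> list[str]:
    # Toy retriever: rank documents by the number of words shared with the question.
    q = tokens(question)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & tokens(DOCS[doc_id])), reverse=True)
    return ranked[:k]


EVAL_SET = [
    ("How fast are refunds issued?", "refunds"),
    ("When is support open?", "hours"),
    ("Which days are you closed?", "hours"),
]

rate_at_1 = hit_rate_at_k(EVAL_SET, keyword_retrieve, k=1)
rate_at_3 = hit_rate_at_k(EVAL_SET, keyword_retrieve, k=3)
```

The third question shares no words with the "hours" document, so the toy retriever misses it at k=1. Tracking this number while changing chunking, embeddings, or re-ranking shows whether a change actually helps retrieval.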
