RAG: Definition and Examples
RAG (Retrieval-Augmented Generation) is a technique that enriches a language model's responses by supplying it with information retrieved from external sources before it generates its answer.
Full definition
RAG, or Retrieval-Augmented Generation, is an architecture that combines two key steps: retrieving relevant information from a knowledge base, then having an LLM generate a response from that retrieved information. This approach mitigates two fundamental limitations of language models: their knowledge cutoff date and their tendency to hallucinate.
Concretely, when a user asks a question, the RAG system first converts the question into a vector (an embedding), then searches a vector database for the passages most similar to it. These passages are then injected into the prompt sent to the LLM, which can formulate a response grounded in the retrieved, up-to-date data.
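The two steps above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: the hypothetical `embed` function uses plain word counts as a stand-in for a real embedding model, and the hypothetical `VectorStore` class keeps everything in memory instead of using a vector database. Only the overall flow (embed the question, score every passage by cosine similarity, keep the top k) reflects the description.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    # Real systems use dense vectors produced by a neural model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory index standing in for a vector database."""

    def __init__(self) -> None:
        self.passages: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, passage: str) -> None:
        # Adding knowledge only touches the index, never the LLM itself.
        self.passages.append(passage)
        self.vectors.append(embed(passage))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        # Embed the question, score every passage, return the k best matches.
        q = embed(query)
        scored = [(cosine(q, v), p) for v, p in zip(self.vectors, self.passages)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = VectorStore()
store.add("Employees receive 25 days of paid vacation per year.")
store.add("The office is closed on public holidays.")
store.add("Expense reports must be submitted within 30 days.")

for score, passage in store.search("How many vacation days do employees get?", k=1):
    print(f"{score:.2f}  {passage}")
```

Swapping in a neural embedding model and a real vector database would change `embed` and `VectorStore`, not the shape of `search`.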
RAG has become one of the most popular architectures in applied AI because it offers a good balance between answer quality and cost. Rather than fine-tuning an entire model on specific data (expensive and rigid), RAG lets you update the knowledge base independently of the model. It is a common choice for enterprise chatbots, document assistants, and question-answering systems over specialized corpora.
The quality of a RAG system depends heavily on the relevance of the retrieval phase. Poor document chunking, embeddings ill-suited to the domain, or overly simplistic search strategies can significantly degrade results. That is why advanced techniques like re-ranking, semantic chunking, or multi-step RAG have emerged to improve the accuracy of these systems.
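Since chunking quality matters so much, here is a minimal sketch of the simplest baseline, fixed-size chunking with overlap. It is not semantic chunking (which splits on meaning boundaries such as sections or topic shifts); it only shows the mechanics. The hypothetical `chunk_words` function counts whitespace-separated words, an assumption made for readability; many pipelines count model tokens instead, and the default sizes are arbitrary.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    # Split text into windows of `size` words; consecutive windows share `overlap` words.
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, so no trailing fragment is duplicated.
        if start + size >= len(words):
            break
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, size=4, overlap=1):
    print(chunk)
# prints:
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

The overlap gives a sentence that straddles a boundary a better chance of appearing intact in at least one chunk, at the cost of some duplicated text in the index.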
Etymology
The acronym RAG was introduced by Patrick Lewis et al., researchers at Facebook AI Research (FAIR), in their 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." The term combines "Retrieval" (information retrieval, a long-established field of computer science) and "Augmented Generation," reflecting the fusion of classic document search with neural text generation.
Concrete examples
Enterprise document assistant
Based solely on the documents provided below, answer the user's question. If the information is not present in the documents, state that clearly.
[Retrieved Documents]
{context}
Question: {question}
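Filling the template above is plain string substitution. The sketch below assumes the retrieved passages arrive as a list of strings; the hypothetical `build_prompt` function and its "Document N:" labels are illustrative choices, not part of the original template.

```python
TEMPLATE = """Based solely on the documents provided below, answer the user's question. If the information is not present in the documents, state that clearly.
[Retrieved Documents]
{context}
Question: {question}"""


def build_prompt(passages: list[str], question: str) -> str:
    # Label each passage so the documents stay distinguishable inside the prompt.
    context = "\n\n".join(f"Document {i}:\n{p}" for i, p in enumerate(passages, start=1))
    # str.format does not re-parse substituted values, so braces inside passages are safe.
    return TEMPLATE.format(context=context, question=question)


print(build_prompt(
    ["Employees receive 25 days of paid vacation per year."],
    "How many vacation days do employees get?",
))
```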
Customer support chatbot with knowledge base
You are a support agent for [COMPANY]. Use the following help articles to respond to the customer. Cite relevant article numbers in your response.
Relevant articles:
{retrieved_articles}
Customer request: {query}
Augmented legal research
Analyze the following legal excerpts and synthesize the legal principles applicable to the described situation. Cite each source in square brackets.
Excerpts:
{legal_passages}
Situation: {case_description}
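The last two templates ask the model to cite its sources. Because a model can still cite a source it was never given, it can help to label each passage with an identifier and check the answer afterwards. The sketch below is one way to do that, under assumptions: the `S1`, `S2`... identifiers and the helper names are hypothetical, and any bracketed text in the answer is treated as a citation, so a placeholder such as [COMPANY] left in an answer would be flagged as well.

```python
import re


def label_passages(passages: list[str]) -> tuple[str, set[str]]:
    # Prefix each passage with a bracketed ID for the {legal_passages} or {retrieved_articles} slot.
    ids = [f"S{i}" for i in range(1, len(passages) + 1)]
    text = "\n".join(f"[{sid}] {p}" for sid, p in zip(ids, passages))
    return text, set(ids)


def extract_citations(answer: str) -> set[str]:
    # Collect every bracketed ID; "[S1, S2]" yields both S1 and S2.
    cited = set()
    for group in re.findall(r"\[([^\[\]]+)\]", answer):
        cited.update(part.strip() for part in group.split(",") if part.strip())
    return cited


def check_citations(answer: str, source_ids: set[str]) -> tuple[set[str], set[str]]:
    # Return (citations matching a provided source, citations that match nothing).
    cited = extract_citations(answer)
    return cited & source_ids, cited - source_ids


context, ids = label_passages(["Excerpt on contract formation.", "Excerpt on liability."])
answer = "The contract is valid [S1]. Liability is limited [S3]."
valid, unknown = check_citations(answer, ids)
print(sorted(valid), sorted(unknown))
```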
Practical usage
In prompt engineering, RAG translates into designing prompts that dynamically incorporate retrieved document context. The main challenge is structuring the prompt so the model relies on the provided documents rather than its internal knowledge, using explicit instructions such as "base your answer solely on the following documents." It is also crucial to handle cases where no relevant document is found, by instructing the model to indicate the absence of information rather than inventing an answer.
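The no-result case can be handled before the prompt is built. A minimal sketch, assuming retrieval returns (score, passage) pairs and that a similarity threshold separates relevant from irrelevant passages; the 0.3 default is an arbitrary assumption that would need tuning for a given embedding model, and the helper names are hypothetical.

```python
NO_CONTEXT_NOTE = "No relevant documents were found."


def select_context(scored: list[tuple[float, str]], threshold: float = 0.3) -> str:
    # Keep only passages whose similarity score clears the (assumed) threshold.
    kept = [passage for score, passage in scored if score >= threshold]
    # Make the absence of context explicit instead of sending an empty section.
    if not kept:
        return NO_CONTEXT_NOTE
    return "\n\n".join(kept)


def build_grounded_prompt(scored: list[tuple[float, str]], question: str, threshold: float = 0.3) -> str:
    context = select_context(scored, threshold)
    return (
        "Base your answer solely on the following documents. "
        "If they do not contain the answer, say that the information is not available "
        "instead of inventing one.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_grounded_prompt([(0.12, "Office opening hours are 9 to 5.")], "What is the refund policy?"))
```

An alternative design is to skip the LLM call entirely and return a fixed "no information found" message when nothing clears the threshold; passing an explicit note, as here, follows the instruction-based approach described above.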
About Prompt Guide
Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.