Long Context Model: Definition and Examples

A Long Context Model is a language model capable of processing and reasoning over very large amounts of text in a single interaction, with a context window reaching hundreds of thousands, or even millions, of tokens.

Full definition

A Long Context Model is a generative AI model whose context window (the maximum amount of text it can "see" at once) is significantly larger than that of earlier models. Early LLMs were limited to a few thousand tokens (about 2,000 for the original GPT-3 and about 4,000 for GPT-3.5), whereas current long context models can handle 128,000, 200,000, or even more than a million tokens in a single request.
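To make these orders of magnitude concrete, here is a minimal sketch that estimates whether a text fits in a given context window. The 4-characters-per-token ratio and the output reserve are assumptions for illustration; real tokenizers differ by model and language, so an exact count requires the tokenizer of the model in question.

```python
import math

# Assumption: roughly 4 characters per token for English text. Real tokenizers
# vary by model and language, so treat this as a rough estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate derived from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int, reserved_for_output: int = 4_000) -> bool:
    """True if the estimated input plus an output reserve fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


book = "word " * 120_000  # hypothetical 120,000-word text (600,000 characters)
print(estimate_tokens(book))           # 150000
print(fits_in_context(book, 4_000))    # False
print(fits_in_context(book, 200_000))  # True
```

Under this estimate, a book-length text overflows a window of a few thousand tokens many times over but fits comfortably in a 200,000-token window.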

This capability radically transforms possible use cases. A user can submit an entire book, a complete codebase, hours of transcription, or hundreds of documents for the model to analyze, summarize, or answer specific questions based on the entire content. As long as the material fits in the window, the input does not have to be split into fragments, and no external retrieval system is needed to reach the relevant data.

Technical advances that make this possible include optimized attention architectures (such as sparse attention or sliding window attention), positional encoding schemes based on relative position (RoPE, ALiBi), and hardware optimizations. Models such as Claude (up to 200K tokens), Gemini (up to 2M tokens), and GPT-4o (128K tokens) illustrate this trend.
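As an illustration of one of these techniques, the sketch below builds a causal sliding window attention mask in plain Python. It only shows the general idea (each token attends to a fixed number of recent tokens, so the number of attended pairs grows linearly with sequence length instead of quadratically). It is not a description of how any of the models named above is implemented, and the function names are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal sliding window: each token sees itself and the previous
    window - 1 tokens, and never a later token.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def allowed_pairs(mask: list[list[bool]]) -> int:
    """Number of (query, key) pairs the mask lets through."""
    return sum(sum(row) for row in mask)


full = sliding_window_mask(8, window=8)   # window spans everything: plain causal attention
local = sliding_window_mask(8, window=3)  # each token sees at most 3 positions

for row in local:
    print("".join("x" if ok else "." for ok in row))

print(allowed_pairs(full), allowed_pairs(local))  # 36 21
```

With 8 tokens the saving is small (36 pairs versus 21), but the gap widens quickly: full causal attention grows with the square of the length, while the windowed version grows roughly as length times window.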

For prompt engineering, long context opens up novel strategies: providing large sets of few-shot examples, including all reference documentation directly in the prompt, or requesting cross-analysis of multiple sources without an external retrieval pipeline. However, a longer context does not guarantee that every part of it receives equal attention: strategic placement of key information remains crucial for obtaining accurate responses.
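One way to check how much placement matters for a given model is a test in the style of the Needle in a Haystack Test listed under related concepts: a known fact is hidden at different depths of a long filler text and the model is asked to retrieve it. The sketch below only builds the test prompts; the filler text, the needle, and the function name are made up for illustration, and the call to the model is deliberately left out.

```python
def build_needle_prompt(filler_sentences: list[str], needle: str,
                        depth: float, question: str) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences) + "\n\nQuestion: " + question


# Hypothetical test material.
filler = [f"Filler sentence number {i}." for i in range(1000)]
needle = "The access code for the archive room is 7421."
question = "What is the access code for the archive room?"

# One prompt per depth. Each would be sent to the model under test, and the
# answers compared against the known needle to see where retrieval degrades.
prompts = {
    depth: build_needle_prompt(filler, needle, depth, question)
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
}
```

Comparing answers across depths gives a rough picture of where in the context a particular model retrieves information least reliably.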

Etymology

The term combines "long context", which refers to the size of the context window measured in tokens, and "model", denoting a language model. The expression became widespread in 2023-2024, when AI vendors began marketing their models by highlighting the size of the context window as a major competitive advantage.

Concrete examples

Analysis of a large legal document

Here is the full 80-page contract between parties A and B. Identify all clauses that mention financial penalties, summarize each one, and flag any inconsistencies between these clauses.

Code review of an entire project

I provide you with the complete source code of my application (45 files). Analyze the overall architecture, identify potential security issues, and propose improvements while respecting the patterns already used in the project.

Multi-source synthesis for research

Here are 12 scientific articles on the impact of sleep on memory. Compare their methodologies, identify consensus and contradictions, then write a structured synthesis with appropriate references.

Practical usage

In prompt engineering, a long context model allows you to include all necessary documentation, examples, and data directly in the prompt, reducing the need for external RAG systems. To maximize response quality, place the most important information at the beginning and end of the prompt (primacy and recency effects), and use explicit instructions to guide the model toward the relevant sections of the provided context.
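The placement advice above can be sketched as a small prompt-assembly helper: the task is stated first, the documents sit in the middle under explicit labels, and the task is repeated at the end. The delimiter format and the function name are assumptions for illustration, not a convention required by any particular model.

```python
def build_long_context_prompt(task: str, documents: dict[str, str]) -> str:
    """State the task first, put labeled documents in the middle, repeat the task last."""
    parts = [f"Task: {task}", ""]
    for title, body in documents.items():
        # Explicit labels let the instructions point at specific sections.
        parts.append(f"=== Document: {title} ===")
        parts.append(body)
        parts.append("")
    parts.append(f"Reminder of the task: {task}")
    parts.append("Cite the document title for each point you make.")
    return "\n".join(parts)


# Hypothetical inputs.
prompt = build_long_context_prompt(
    task="List every clause that mentions financial penalties.",
    documents={
        "Contract, part 1": "Text of part 1 goes here.",
        "Contract, part 2": "Text of part 2 goes here.",
    },
)
print(prompt)
```

Repeating the task at the end costs a few extra tokens but keeps the instruction close to where the model starts generating, even when the documents in between are very long.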

Related concepts

Context Window, Token, Retrieval-Augmented Generation (RAG), Needle in a Haystack Test

FAQ

Does a longer context mean the model understands better?
Not necessarily. A long context model can technically access more information, but its attention is not uniform across the entire text. Studies show that information located in the middle of a very long context is sometimes used less reliably than information at the beginning or the end (a phenomenon called "lost in the middle"). Prompt quality and strategic placement of information remain decisive.

What is the difference between a Long Context Model and RAG?
RAG (Retrieval-Augmented Generation) dynamically retrieves relevant fragments from an external database and injects them into the prompt. A Long Context Model lets you load a large amount of data directly, without a retrieval step. The two approaches are complementary: RAG remains relevant for corpora that exceed the model's context window, while long context simplifies cases where all the data fits in a single request.

Does using all available context cost more?
Yes. In most commercial APIs, the cost scales with the number of tokens processed (input and output), so sending 200,000 tokens costs significantly more than sending 2,000. It is therefore worth assessing whether including all the content is truly necessary, or whether prior filtering (via RAG or manual selection) could achieve an equivalent result at a lower cost.
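
The cost difference can be worked out with simple arithmetic. The sketch below assumes per-token pricing with separate input and output rates; the prices used are hypothetical values chosen only for illustration, so check the provider's current price list before relying on any figure.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request under simple per-token pricing."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices in dollars per million tokens, for illustration only.
IN_PRICE, OUT_PRICE = 3.0, 15.0

full_context = request_cost(200_000, 1_000, IN_PRICE, OUT_PRICE)
filtered = request_cost(2_000, 1_000, IN_PRICE, OUT_PRICE)

print(f"{full_context:.3f} {filtered:.3f}")  # 0.615 0.021
```

At these assumed prices, the full-context request costs roughly 29 times more than the filtered one for the same output length, which is why filtering first is worth considering when the same context is queried many times.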
