Chunking: Definition and Examples

Chunking is a technique that consists of breaking down a text, task, or data into smaller, coherent segments to facilitate their processing by an AI model or improve response quality.

Full definition

Chunking, or segmentation, is a fundamental strategy in prompt engineering and AI information processing. It consists of dividing large content or a complex task into smaller, more digestible, and easier-to-process pieces ("chunks"). This approach is directly inspired by cognitive science, where chunking refers to the human brain's ability to group information into meaningful units to better memorize and manipulate it.

In the context of large language models (LLMs), chunking applies at two distinct levels. First, at the data level: when feeding a RAG (Retrieval-Augmented Generation) system, source documents are split into segments small enough to be indexed and retrieved efficiently. The size and overlap of these segments directly influence the relevance of retrieved results. Second, at the prompt level: a complex task is decomposed into sequential subtasks that the model processes one by one.

Chunking at the prompt level helps work around several LLM limitations: the limited size of the context window, the tendency to lose precision on long tasks, and the difficulty of maintaining coherence across multiple instructions. Processing each segment independently or sequentially tends to yield more accurate and better-structured responses.

The quality of chunking depends on several factors: the chosen granularity (too fine, context is lost; too broad, precision is lost), the semantic coherence of each segment, and the overlap strategy between adjacent segments. Mastering chunking is essential for anyone working with LLMs on non-trivial tasks.
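One way to keep each segment semantically coherent is to never cut inside a paragraph. The sketch below is a minimal illustration under stated assumptions: paragraphs are separated by blank lines, size is measured in whitespace-separated words rather than model tokens, and the function name `pack_paragraphs` and its `max_words` parameter are hypothetical.

```python
from typing import List


def pack_paragraphs(text: str, max_words: int = 300) -> List[str]:
    """Group whole paragraphs into chunks of at most max_words words.

    Assumption: paragraphs are separated by blank lines, and words
    (whitespace-separated) stand in for model tokens.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        # Close the current chunk if adding this paragraph would exceed the limit.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph longer than `max_words` is kept whole as its own oversized chunk, which trades a strict size limit for coherence.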

Etymology

The term "chunking" comes from the English word "chunk" (piece, block). It was popularized in cognitive psychology by George A. Miller in his 1956 article "The Magical Number Seven, Plus or Minus Two," where he describes the capacity of human short-term memory to hold about seven units of information at a time, and observes that grouping items into larger chunks lets people retain more. The concept was later adopted in computer science and then artificial intelligence to refer to any form of structured segmentation of information.

Concrete examples

Breaking down a complex analysis task into sequential steps

Analyze this contract in 3 distinct steps. Step 1: identify the stakeholders and their obligations. Step 2: list the termination clauses. Step 3: assess potential legal risks.
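The same decomposition can be driven from code by sending one prompt per step and passing earlier results forward. This is a rough sketch, not a reference implementation: `ask_model` is a hypothetical callable standing in for whatever client sends a prompt to an LLM and returns its text, and `run_steps` is an illustrative name.

```python
from typing import Callable, List


def run_steps(document: str, steps: List[str],
              ask_model: Callable[[str], str]) -> List[str]:
    """Run each step as its own prompt, carrying earlier results forward.

    ask_model is a hypothetical placeholder for a real LLM call.
    """
    results: List[str] = []
    for number, step in enumerate(steps, start=1):
        earlier = "\n".join(
            f"Step {i} result: {r}" for i, r in enumerate(results, start=1)
        )
        prompt = f"Document:\n{document}\n\n"
        if earlier:
            prompt += f"{earlier}\n\n"
        prompt += f"Step {number}: {step}"
        results.append(ask_model(prompt))
    return results
```

Called with the three contract steps above, this issues three prompts; the second and third include the results of the steps before them.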

Preparing documents for a RAG system

Split this 50-page document into segments of 500 tokens with an overlap of 50 tokens. Each segment must start with a coherent section title or paragraph beginning.
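For a deterministic split, the sizes in this prompt can also be applied in code. The following is a minimal sketch of a sliding window with overlap, assuming the text has already been tokenized into a list; `chunk_tokens` is a hypothetical helper name. It does not align chunks to section titles or paragraph beginnings, which would need extra logic.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], size: int = 500,
                 overlap: int = 50) -> List[Sequence[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks: List[Sequence[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


# Assumption: whitespace splitting stands in for a real model tokenizer.
sample_tokens = "some long document text".split()
sample_chunks = chunk_tokens(sample_tokens, size=3, overlap=1)
```

With `size=500` and `overlap=50`, each window starts 450 tokens after the previous one, so the last 50 tokens of a chunk are repeated at the start of the next.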

Summarizing a long text by processing each section independently

I will provide you with an article in 5 parts. For each part, generate a summary of maximum 3 sentences. Once all 5 summaries are produced, synthesize them into a coherent overall summary.
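When the parts are processed programmatically, this pattern becomes one call per part followed by a final synthesis call. The sketch below assumes a hypothetical `ask_model` callable that takes a prompt and returns the model's text; the prompt wording and the name `summarize_in_parts` are illustrative only.

```python
from typing import Callable, List


def summarize_in_parts(parts: List[str],
                       ask_model: Callable[[str], str]) -> str:
    """Summarize each part on its own, then synthesize the partial summaries.

    ask_model is a hypothetical placeholder for a real LLM call.
    """
    partial: List[str] = []
    for i, part in enumerate(parts, start=1):
        prompt = (
            f"Summarize part {i} of {len(parts)} "
            f"in at most 3 sentences:\n\n{part}"
        )
        partial.append(ask_model(prompt))
    joined = "\n".join(
        f"Part {i}: {summary}" for i, summary in enumerate(partial, start=1)
    )
    final_prompt = (
        "Synthesize these partial summaries into one coherent "
        "overall summary:\n\n" + joined
    )
    return ask_model(final_prompt)
```

For an article in 5 parts this makes 6 model calls in total: 5 partial summaries and 1 synthesis.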

Practical usage

In practice, use chunking whenever a task exceeds a few paragraphs or involves multiple logical steps. Decompose complex prompts into numbered instructions and process each segment separately before requesting a synthesis. For RAG systems, experiment with chunk sizes between 200 and 1000 tokens and adjust the overlap (10-20% of the chunk size) to preserve context between segments.
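To compare settings before indexing anything, it can help to enumerate a few size and overlap combinations and see how many chunks each would produce. This is a small sketch under assumptions: the window advances by size minus overlap, the function names are hypothetical, and the 25,000-token document length is an arbitrary example value.

```python
from typing import List, Tuple


def candidate_configs(sizes: List[int],
                      overlap_percents: List[int]) -> List[Tuple[int, int]]:
    """Return (chunk_size, overlap_in_tokens) pairs for each combination."""
    return [(size, size * pct // 100)
            for size in sizes for pct in overlap_percents]


def count_chunks(total_tokens: int, size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover total_tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= size:
        return 1
    step = size - overlap
    # Ceiling division on the tokens left after the first window.
    return -(-(total_tokens - size) // step) + 1


# Hypothetical document length, chosen only for illustration.
for size, overlap in candidate_configs([200, 500, 1000], [10, 20]):
    print(size, overlap, count_chunks(25_000, size, overlap))
```

Chunk count is only a cost indicator; retrieval quality still has to be measured on real queries for each configuration.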

Related concepts

RAG (Retrieval-Augmented Generation), Chain of Thought, Context Window, Embedding

FAQ

What is the ideal chunk size for a RAG system?

There is no universal size. In general, chunks of 300 to 800 tokens offer a good balance between precision and context. Chunks that are too small lose semantic context, while chunks that are too large dilute relevant information. The optimal size depends on the type of content and use case: short FAQs require smaller chunks, while complex technical documents benefit from larger segments.

What is the difference between chunking and Chain of Thought?

Chunking consists of splitting data or a task into smaller segments that are processed independently or in sequence, while Chain of Thought guides the model to reason step by step on the same problem. The two techniques are complementary: you can chunk a task into sub-problems, then apply Chain of Thought to each sub-problem for deeper reasoning.

Is chunking useful even with large context window models?

Yes. Even models with context windows of 100,000 tokens or more can lose precision on passages in the middle of long texts (the so-called "lost in the middle" phenomenon). Chunking helps maintain response quality by focusing the model's attention on targeted segments, regardless of the nominal capacity of the context window.
