Prompt Compression: Definition and Examples
Technique for reducing the length of a prompt while preserving its meaning and effectiveness, to optimize token usage and improve language model performance.
Full definition
Prompt compression refers to the set of methods for shortening a prompt sent to a language model without significantly degrading the quality of the responses obtained. Because LLMs have limited context windows and cost grows with the number of tokens processed, this technique has become a central concern for prompt engineering practitioners.
There are several compression approaches. Manual compression involves rewording instructions more concisely: removing redundancies, using abbreviations the model understands, or restructuring information as lists or tables rather than long paragraphs. Algorithmic compression, on the other hand, relies on specialized tools: LLMLingua automatically identifies and removes the least informative tokens in a prompt, while AutoCompressors condense long contexts into compact summary vectors.
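To make the idea of dropping the least informative tokens concrete, here is a minimal toy sketch. It is not how LLMLingua works internally: it assumes a hypothetical `reference_counts` table of word frequencies and treats frequent words as low-information, whereas real tools score tokens with a language model. The function name `prune_tokens` and the `keep_ratio` parameter are illustrative only.

```python
from collections import Counter


def prune_tokens(prompt: str, reference_counts: Counter, keep_ratio: float = 0.6) -> str:
    """Keep the rarest words (assumed most informative), preserving their order."""
    words = prompt.split()
    n_keep = max(1, int(len(words) * keep_ratio))
    # Rank word positions by frequency in the reference table: rare words first.
    # Words missing from the table count as 0, i.e. maximally rare.
    ranked = sorted(
        range(len(words)),
        key=lambda i: (reference_counts[words[i].lower()], i),
    )
    kept_positions = sorted(ranked[:n_keep])
    return " ".join(words[i] for i in kept_positions)


# Hypothetical frequency table; a real one would come from a large corpus.
reference_counts = Counter({"the": 100, "of": 80, "to": 70, "please": 50})
compressed = prune_tokens(
    "Please summarize the report of the committee",
    reference_counts,
    keep_ratio=0.5,
)
print(compressed)  # summarize report committee
```

Splitting on whitespace leaves punctuation attached to words, so this heuristic is only suitable for illustrating the principle.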
The benefit of prompt compression goes beyond simple cost reduction. A shorter prompt can paradoxically improve response quality by reducing informational noise and allowing the model to focus on essential elements. This is especially true for RAG (Retrieval-Augmented Generation) tasks where long retrieved documents can dilute the main instructions.
However, compression carries risks: overly aggressive compression may remove important nuances, key examples, or subtle constraints that guide the model. The art of prompt compression therefore lies in balancing conciseness against preservation of meaning, a trade-off that depends heavily on the model used and the complexity of the task.
Etymology
The term combines 'prompt' (instruction given to an AI model) and 'compression' (from Latin compressio, the action of pressing together). The concept gained traction in 2023 as LLMs with limited context windows became widely used, notably through the LLMLingua research (Microsoft Research), which formalized the algorithmic approach to prompt compression.
Concrete examples
Manual compression of a verbose prompt for a classification task
Before: "I would like you to analyze the following text and tell me which category it falls into among the following categories: positive, negative or neutral. Here is the text to analyze: {TEXT}"
After: "Classify this text (positive/negative/neutral): {TEXT}"
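The saving in this example can be estimated with a rough sketch. It assumes a crude whitespace word count as a stand-in for tokens; actual token counts depend on the model's tokenizer and will differ. The helper names `approx_tokens` and `reduction` are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def reduction(before: str, after: str) -> float:
    """Fraction of (approximate) tokens saved by the compressed prompt."""
    return 1 - approx_tokens(after) / approx_tokens(before)


before = (
    "I would like you to analyze the following text and tell me which "
    "category it falls into among the following categories: positive, "
    "negative or neutral. Here is the text to analyze: {TEXT}"
)
after = "Classify this text (positive/negative/neutral): {TEXT}"

print(approx_tokens(before), approx_tokens(after))
print(f"{reduction(before, after):.0%} fewer words")
```

By this crude measure the prompt shrinks from 32 words to 5, although the gain measured in real tokens may be smaller because the compact form packs several tokens into one whitespace-delimited chunk.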
RAG context compression by removing irrelevant passages before injection into the prompt
Instead of injecting 10 full retrieved documents, extract only the 2-3 most relevant passages and insert them in condensed form: "Context:
- [Source 1]: key point summary
- [Source 2]: key point summary
Question: {QUESTION}"
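The selection step can be sketched as follows. This is a simplified illustration that assumes plain keyword overlap as the relevance score; production RAG systems typically use embeddings or a reranker, and this sketch only selects passages rather than summarizing them. The function names are hypothetical, and the source numbers reflect rank after selection, not the original document order.

```python
import re


def _words(text: str) -> set:
    """Lowercased alphanumeric words of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_passages(question: str, passages: list, k: int = 3) -> list:
    """Return the k passages sharing the most words with the question."""
    q = _words(question)
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(passages, key=lambda p: len(q & _words(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list, k: int = 3) -> str:
    """Assemble a condensed prompt in the Context / Question layout."""
    lines = ["Context:"]
    for i, passage in enumerate(select_passages(question, passages, k), start=1):
        lines.append(f"- [Source {i}]: {passage}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


passages = [
    "Cats sleep most of each day.",
    "The refund policy allows returns within 30 days.",
    "Refund requests need a receipt.",
]
prompt = build_prompt("What is the refund policy?", passages, k=2)
print(prompt)
```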
Using structured formats to compress complex instructions
Role: SEO writer
Task: 800-word article
Subject: {SUBJECT}
Constraints: expert tone | H2/H3 structure | 3 examples | final CTA
Format: markdown
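A structured prompt like the one above can be generated from a dictionary, which keeps the compact layout consistent across prompts. This is a minimal sketch; the helper name `structured_prompt` and the convention of joining list values with " | " are assumptions modeled on the example.

```python
def structured_prompt(fields: dict) -> str:
    """Render key/value pairs as compact 'Key: value' lines."""
    lines = []
    for key, value in fields.items():
        # Lists of constraints are joined on one line, as in the example above.
        if isinstance(value, (list, tuple)):
            value = " | ".join(str(v) for v in value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


prompt = structured_prompt({
    "Role": "SEO writer",
    "Task": "800-word article",
    "Subject": "{SUBJECT}",
    "Constraints": ["expert tone", "H2/H3 structure", "3 examples", "final CTA"],
    "Format": "markdown",
})
print(prompt)
```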
Practical usage
To apply prompt compression in daily use, start by eliminating polite filler and redundancies from your prompts. Prefer structured formats (lists, tables, shorthand notation) over long sentences. For production use cases with large context volumes, consider automatic compression tools like LLMLingua, which are reported to reduce prompts by 50 to 80% with minimal performance loss.
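The first step, removing polite filler, can be partly automated. The sketch below assumes a small, hypothetical list of filler patterns (`FILLERS`); it is deliberately naive, may leave stray punctuation or a lowercase first letter, and should be reviewed by hand before use on real prompts.

```python
import re

# Hypothetical filler phrases; extend or trim this list for your own prompts.
FILLERS = [
    r"\bi would like you to\b",
    r"\bcould you\b",
    r"\bthank you( very much)?\b",
    r"\bplease\b",
    r"\bkindly\b",
]
_FILLER_RE = re.compile("|".join(FILLERS), re.IGNORECASE)


def strip_fillers(prompt: str) -> str:
    """Remove filler phrases and collapse the whitespace they leave behind."""
    cleaned = _FILLER_RE.sub("", prompt)
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip()


print(strip_fillers("Could you please summarize this report?"))
# summarize this report?
```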