
Prompt Compression: Definition and Examples

Technique for reducing the length of a prompt while preserving its meaning and effectiveness, to optimize token usage and improve language model performance.

Full definition

Prompt compression refers to the set of methods for shortening a prompt sent to a language model without significantly degrading the quality of its responses. Because LLMs have limited context windows and the cost of a request grows with the number of tokens processed, compression has become an important concern for prompt engineering practitioners.

There are several compression approaches. Manual compression means rewording instructions more concisely: removing redundancies, using abbreviations the model can understand, or restructuring information as lists or tables rather than long paragraphs. Algorithmic compression relies on specialized tools instead. LLMLingua, for example, uses a small language model to identify and drop the least informative tokens in a prompt, while AutoCompressors take a different route and learn to condense long context into compact summary vectors.
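To make the token-dropping idea concrete, here is a minimal Python sketch of self-information pruning, the principle behind methods such as Selective Context. It is a toy under stated assumptions, not LLMLingua's implementation: a unigram frequency model estimated from a small reference text stands in for the small language model real tools use, tokens are crude words rather than subwords, and the names `tokenize`, `self_information` and `prune_tokens` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude word/punctuation split; real LLM tokenizers use subword units.
    return re.findall(r"\w+|[^\w\s]", text)


def self_information(tokens: list[str], reference: list[str]) -> list[float]:
    """Score each token by -log p(token) under an add-one-smoothed unigram
    model estimated from `reference` (a stand-in for a small language model)."""
    counts = Counter(t.lower() for t in reference)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return [-math.log((counts[t.lower()] + 1) / (total + vocab)) for t in tokens]


def prune_tokens(text: str, reference: str, keep_ratio: float = 0.5) -> str:
    """Keep the `keep_ratio` share of tokens with the highest self-information,
    in their original order; rare (surprising) tokens count as informative."""
    tokens = tokenize(text)
    scores = self_information(tokens, tokenize(reference))
    n_keep = max(1, round(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # restore reading order
    return " ".join(tokens[i] for i in keep)


print(prune_tokens("the cat is on the mat", "the the the is is on a a"))  # cat on mat
```

Frequent words such as "the" score as least informative and are dropped first. Joining the surviving tokens with single spaces also discards the original spacing, which real tools handle more carefully.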

The benefit of prompt compression goes beyond simple cost reduction. A shorter prompt can paradoxically improve response quality by reducing informational noise and allowing the model to focus on essential elements. This is especially true for RAG (Retrieval-Augmented Generation) tasks where long retrieved documents can dilute the main instructions.

However, compression carries risks: overly aggressive compression may remove important nuances, key examples, or subtle constraints that guided the model. The art of prompt compression therefore lies in balancing conciseness against preservation of meaning, a trade-off that depends heavily on the model used and the complexity of the task.
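One cheap safeguard against losing constraints is to list the terms a compressed prompt must still contain (placeholders, output labels, constraint keywords) and check for them after compressing. The Python sketch below assumes a plain case-insensitive substring match is good enough; `missing_requirements` is a hypothetical helper, not part of any compression tool.

```python
def missing_requirements(compressed: str, required: list[str]) -> list[str]:
    """Return the required terms (placeholders, labels, constraint keywords)
    that no longer appear in the compressed prompt (case-insensitive)."""
    lowered = compressed.lower()
    return [term for term in required if term.lower() not in lowered]


required = ["{TEXT}", "positive", "negative", "neutral"]
print(missing_requirements("Classify this text (positive/negative/neutral): {TEXT}", required))  # []
print(missing_requirements("Classify this text: {TEXT}", required))  # ['positive', 'negative', 'neutral']
```

A non-empty result flags a compression step that went too far; it cannot detect subtler losses such as a softened tone or a dropped nuance.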

Etymology

The term combines 'prompt' (an instruction given to an AI model) and 'compression' (from Latin compressio, the act of pressing together). The concept gained prominence in 2023, as LLMs with limited context windows came into widespread use, notably through research on LLMLingua (Microsoft Research), which formalized an algorithmic approach to prompt compression.

Concrete examples

Manual compression of a verbose prompt for a classification task

Before: "I would like you to analyze the following text and tell me which category it falls into among the following categories: positive, negative or neutral. Here is the text to analyze: {TEXT}"
After: "Classify this text (positive/negative/neutral): {TEXT}"
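To quantify a manual rewrite like this one, you can compare prompt lengths before and after. The Python sketch below counts whitespace-separated words as a rough stand-in for tokens (an assumption: a real tokenizer will give different counts) and expresses the saving as the share of the original length removed; `word_count` and `reduction` are hypothetical helpers.

```python
def word_count(prompt: str) -> int:
    # Whitespace-separated words: only a rough proxy for model tokens.
    return len(prompt.split())


def reduction(before: str, after: str) -> float:
    """Share of the original length removed (0.0 = nothing, 1.0 = everything)."""
    return 1 - word_count(after) / word_count(before)


before = (
    "I would like you to analyze the following text and tell me which category "
    "it falls into among the following categories: positive, negative or neutral. "
    "Here is the text to analyze: {TEXT}"
)
after = "Classify this text (positive/negative/neutral): {TEXT}"

print(word_count(before), word_count(after))  # 32 5
print(f"{reduction(before, after):.0%}")  # 84%
```

By this word count the rewrite removes about 84% of the original while keeping the three labels and the {TEXT} placeholder the task depends on.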

RAG context compression by removing irrelevant passages before injection into the prompt

Instead of injecting 10 full retrieved documents, extract only the 2-3 most relevant passages and insert them in condensed form:

  Context:
  • [Source 1]: key point summary
  • [Source 2]: key point summary

  Question: {QUESTION}
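The passage-selection step can be sketched in Python as follows. It is a minimal illustration under two assumptions: relevance is scored by plain word overlap with the question (production pipelines typically use embedding similarity or a reranker), and each passage is already short, since writing the key-point summaries would itself require a model. `words` and `build_rag_prompt` are hypothetical names.

```python
import re


def words(text: str) -> set[str]:
    # Lowercased word set used for a crude relevance score.
    return set(re.findall(r"\w+", text.lower()))


def build_rag_prompt(question: str, passages: dict[str, str], k: int = 2) -> str:
    """Keep the k passages sharing the most words with the question and lay
    them out in the condensed Context/Question format."""
    q = words(question)
    ranked = sorted(passages.items(), key=lambda item: len(q & words(item[1])), reverse=True)
    lines = ["Context:"]
    lines += [f"  • [{source}]: {text}" for source, text in ranked[:k]]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


passages = {
    "Doc A": "LLMLingua compresses prompts with a small language model.",
    "Doc B": "Paris is the capital of France.",
    "Doc C": "Prompt compression reduces token usage.",
}
print(build_rag_prompt("How does prompt compression reduce token usage?", passages, k=1))
```

Only Doc C shares words with the question, so it is the single passage kept; the other two never reach the prompt.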

Using structured formats to compress complex instructions

Role: SEO writer
Task: 800-word article
Subject: {SUBJECT}
Constraints: expert tone | H2/H3 structure | 3 examples | final CTA
Format: markdown
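If you build such prompts programmatically, a small helper can render the same compact layout from a dictionary. The Python sketch below is one possible approach; the `structured_prompt` name and the " | " separator for list values simply mirror the example above and are not a standard.

```python
from typing import Union


def structured_prompt(fields: dict[str, Union[str, list[str]]]) -> str:
    """Render one 'Key: value' line per field, joining list values with ' | '."""
    lines = []
    for key, value in fields.items():
        if isinstance(value, list):
            value = " | ".join(value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


prompt = structured_prompt({
    "Role": "SEO writer",
    "Task": "800-word article",
    "Subject": "{SUBJECT}",
    "Constraints": ["expert tone", "H2/H3 structure", "3 examples", "final CTA"],
    "Format": "markdown",
})
print(prompt)
```

Printing `prompt` reproduces the five-line template above, and keeping the fields in a dictionary makes it easy to swap constraints without rewording any prose.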

Practical usage

To apply prompt compression day to day, start by removing polite phrases and redundancies from your prompts. Prefer structured formats (lists, tables, shorthand notation) over long sentences. For production use cases with large context volumes, consider automatic compression tools such as LLMLingua, which can reduce prompts by 50 to 80% with minimal performance loss on many tasks; actual savings depend on the task and the target compression ratio.
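The first step, removing polite phrases, can be partly automated with a few regular expressions. The Python sketch below is deliberately naive: the `FILLER_PATTERNS` list and the `strip_filler` helper are hypothetical, and a pattern like "can you" will also delete legitimate text such as "what can you infer", so review the output rather than applying it blindly.

```python
import re

# Hypothetical, deliberately short list of filler phrases; adapt it to your prompts.
FILLER_PATTERNS = [
    r"\bI would like you to\b",
    r"\bcould you\b",
    r"\bcan you\b",
    r"\bplease\b",
    r"\bkindly\b",
]


def strip_filler(prompt: str) -> str:
    """Delete filler phrases (case-insensitive), then tidy the leftover spacing."""
    for pattern in FILLER_PATTERNS:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    prompt = re.sub(r"\s+", " ", prompt).strip()  # collapse runs of whitespace
    prompt = re.sub(r"\s+([.,:;!?])", r"\1", prompt)  # no space before punctuation
    return prompt[:1].upper() + prompt[1:]  # re-capitalize the first letter


print(strip_filler("I would like you to summarize this article in 3 bullet points."))
```

The example prints "Summarize this article in 3 bullet points.", the same instruction with the preamble removed.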

Related concepts

Token, Context Window, RAG (Retrieval-Augmented Generation), Few-shot prompting

FAQ

Does prompt compression degrade response quality?
Not necessarily. Moderate compression (roughly 20-50%) often preserves quality and can even improve results by reducing noise. Reported results suggest that beyond about 60-70% compression, quality may start to degrade, depending on task complexity. What matters most is preserving the core instructions, critical examples, and important constraints.
What tools can automatically compress prompts?
Several tools exist. LLMLingua and LongLLMLingua (Microsoft Research) use a small language model to identify removable tokens. AutoCompressors train models to summarize context into compact vectors. Methods such as Selective Context and RECOMP offer further, more targeted compression approaches that are often applied to RAG contexts.
What is the difference between prompt compression and prompt optimization?
Prompt compression specifically focuses on reducing prompt length (fewer tokens). Prompt optimization is a broader concept aimed at improving overall prompt effectiveness, which may include compression but also rewording, reorganization, adding relevant examples, or changing strategy (chain-of-thought, few-shot, etc.).


About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
