Attention: Definition and Examples
Core mechanism of modern language models that lets the model weight the importance of each token relative to the others in a sequence, so that it can capture context and semantic relationships.
Full definition
Attention is the central building block of the Transformer architecture, introduced in the paper 'Attention Is All You Need' (Vaswani et al., 2017) by a team working at Google. The mechanism itself predates that paper (see Etymology), but the Transformer was the first architecture to rely on attention alone, without recurrence or convolutions, and it underlies current large language models such as GPT, Claude, and Gemini. The principle: rather than processing the words of a sentence one after another, the model 'looks at' all words simultaneously and learns to assign each a different weight according to its relevance to the token being processed.
Concretely, the attention mechanism works with three vectors (Query, Key, and Value) computed for each token in the sequence. The attention score between two tokens is the dot product of one token's Query vector with the other's Key vector, scaled by the square root of the key dimension. A softmax then normalizes each token's scores so that they sum to 1. The resulting weights determine how much of each token's Value vector flows into the output, that is, how much one word 'pays attention' to another. For example, in the sentence 'The cat sleeps on the couch', the word 'sleeps' would typically assign a high weight to 'cat', because 'cat' is the subject of the action.
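The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the Query, Key, and Value vectors below are hand-picked toy values (in a real model they come from learned projections of the token embeddings), and the function names are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's Query against every Key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # weights for this token sum to 1
        # The output is the weighted average of the Value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy vectors chosen by hand so that "sleeps" attends mostly to "cat".
# They are assumptions for illustration only, not learned values.
tokens = ["cat", "sleeps", "couch"]
queries = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = attention(queries, keys, values)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how one token distributes its attention over the three tokens. With these toy vectors, the row for 'sleeps' puts its largest weight on 'cat'.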
The most widely used variant is 'self-attention', where each token computes its attention scores with respect to the tokens of the same sequence, itself included. In decoder-only models such as GPT, a causal mask restricts this to the current token and those before it. Transformers also use 'multi-head attention', which runs several attention mechanisms in parallel, each with its own learned projections, allowing the model to capture different types of relationships (for example syntactic, semantic, or logical) simultaneously.
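As a rough sketch of the multi-head idea, the example below splits each token vector into slices, runs self-attention independently on each slice (one 'head' per slice), and concatenates the results. It deliberately simplifies: Query, Key, and Value are all set to the input itself, whereas a real Transformer applies separate learned projections per head plus a final output projection. All names are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention, returning one output vector per query.
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

def split_heads(vectors, num_heads):
    # Give each head a contiguous slice of every token vector.
    head_dim = len(vectors[0]) // num_heads
    return [[vec[h * head_dim:(h + 1) * head_dim] for vec in vectors]
            for h in range(num_heads)]

def multi_head_self_attention(x, num_heads):
    # Simplifying assumption: Q = K = V = x (no learned projections).
    heads = split_heads(x, num_heads)
    per_head = [attend(h, h, h) for h in heads]
    # Concatenate the head outputs back into one vector per token.
    return [[value for head in per_head for value in head[t]]
            for t in range(len(x))]

# Three tokens with 4-dimensional toy embeddings (illustrative values).
x = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(x, num_heads=2)
```

Because each head only sees its own slice, the two heads can produce different attention patterns over the same three tokens, which is the property the paragraph above describes.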
In prompt engineering, understanding attention is useful because it helps explain why the position and wording of instructions in a prompt influence the quality of responses. In practice, information placed at the beginning or end of a long prompt tends to be used more reliably than information buried in the middle, and clear, structured instructions make it easier for the model to identify what is relevant.
Etymology
The term 'attention' is borrowed from cognitive science, where it refers to the human brain's ability to focus selectively on certain information while ignoring the rest. In artificial intelligence, this metaphor was formalized mathematically by Bahdanau et al. (2014) in the context of machine translation, before being generalized by Vaswani et al. (2017) in the Transformer architecture.
Concrete examples
Structuring a long prompt to maximize model attention
Here is a 3-page document. Your main task (IMPORTANT): extract only the Q3 2025 revenue figures. Ignore everything else. Document: [...]
Exploiting attention by placing key instructions at strategic positions
ABSOLUTE RULE: respond only in French.
[Prompt content...]
Reminder: your response must be entirely in French.
Understanding why a model loses track on very long contexts
Summarize the key points of each section separately, then provide an overall synthesis. This will help me verify that you haven't missed anything in the document.
Practical usage
In prompt engineering, the attention mechanism is one reason to place critical instructions at the beginning or end of the prompt, and it helps explain why structural clarity (lists, headings, separators) improves results. When working with long contexts, break your requests into smaller parts and repeat important instructions to compensate for the dilution of attention over long sequences.
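The advice above can be turned into a small helper. The sketch below is one possible approach under simple assumptions: it splits a long document into chunks by word count (a crude stand-in for a real token count), then wraps each chunk with the key instruction at the start and a reminder at the end. The function names, the separator, and the 300-word default are all hypothetical choices.

```python
def chunk_text(text, max_words):
    # Split on whitespace and regroup into chunks of at most max_words words.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def build_prompt(instruction, content):
    # Key instruction first, content between separators, reminder last.
    return f"{instruction}\n\n---\n{content}\n---\n\nReminder: {instruction}"

def build_chunked_prompts(instruction, document, max_words=300):
    # One prompt per chunk, each repeating the instruction at both ends.
    return [build_prompt(instruction, chunk)
            for chunk in chunk_text(document, max_words)]

# Usage with a placeholder document of 700 words.
document = "word " * 700
prompts = build_chunked_prompts("Respond only in French.", document, max_words=300)
print(len(prompts))
```

Each resulting prompt can be sent separately, and the partial answers combined in a final synthesis request.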
FAQ
What is the difference between attention and self-attention?
Why is the attention mechanism so important for LLMs?
How can knowledge of attention be used to write better prompts?