Transformer: Definition and Examples
Neural network architecture introduced by Google researchers in 2017 and built on the attention mechanism. It underlies nearly all modern large language models, including GPT, Claude, and Gemini.
Full definition
The Transformer is a deep neural network architecture introduced in the seminal paper "Attention Is All You Need" (Vaswani et al., 2017). Unlike recurrent architectures (RNN, LSTM), which process a sequence one token at a time, the Transformer processes an entire sequence in parallel thanks to a mechanism called "self-attention." This innovation enabled large gains in training speed and in the ability to capture relationships between distant words in a text.
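To make self-attention less abstract, here is a minimal sketch in plain Python. It assumes a simplified setting in which queries, keys, and values are the raw token vectors themselves (a real Transformer first applies learned projections), and the toy vectors are made-up numbers. The function names are illustrative and do not come from any library.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplifying assumption: queries, keys, and values are all the raw
    input vectors; a real Transformer applies learned projections first.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for query in x:
        # Score this token against every token in the sequence, near or far.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, x)) for i in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy 2-dimensional "token" vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each pass through the loop depends only on the input, not on the result for the previous token. That independence is why real implementations can compute all positions at once as matrix multiplications, instead of stepping through the sequence as an RNN does.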
The core of the Transformer rests on three key components: embeddings (vector representations of tokens), the multi-head attention mechanism (which allows the model to "look" simultaneously at different parts of the input sequence), and feed-forward layers. Because attention on its own ignores word order, positional information is also added to the embeddings. The original architecture has an encoder (which builds a representation of the input) and a decoder (which generates the output), but many variants use only one of the two: GPT-style generative models, a family that reportedly includes Claude, use only the decoder, while BERT uses only the encoder.
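One practical difference between encoder-style and decoder-style attention is masking: a decoder may only attend to the current and earlier positions, so that it cannot peek at tokens it has not generated yet. The sketch below illustrates that idea in plain Python. The function name and the score matrix are hypothetical, chosen only for illustration.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over a square matrix of attention scores.

    With causal=True, position i may only attend to positions 0..i,
    which is the masking used in decoder-style models.
    """
    n = len(scores)
    result = []
    for i in range(n):
        visible = range(i + 1) if causal else range(n)
        peak = max(scores[i][j] for j in visible)  # for numerical stability
        # Hidden positions get weight 0; visible ones get exp(score).
        exps = [math.exp(scores[i][j] - peak) if j in visible else 0.0
                for j in range(n)]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

# Made-up raw attention scores for a 3-token sequence.
scores = [[0.5, 1.0, 0.2],
          [0.1, 0.3, 0.9],
          [0.4, 0.4, 0.4]]

encoder_style = attention_weights(scores, causal=False)  # every token sees all tokens
decoder_style = attention_weights(scores, causal=True)   # tokens see only the past

for row in decoder_style:
    print([round(w, 3) for w in row])
```

In the causal version the first token can only attend to itself, and every weight above the diagonal is zero. The encoder-style version spreads weight across the whole sequence.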
What makes the Transformer revolutionary is how well it scales. As the number of parameters, the amount of training data, and the available compute increase, performance improves in a fairly predictable way, a regularity described by "scaling laws." This property has driven the race toward ever larger models, from GPT-2 (1.5 billion parameters) to GPT-3 (175 billion) and on to models such as GPT-4 and Claude, whose parameter counts have not been officially disclosed.
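Scaling laws are often written as power laws, where loss falls by a constant factor each time model size is multiplied by a constant. The sketch below shows only that shape. The constants n_c and alpha are placeholder values picked for illustration, not fitted values from any published study, and the function name is hypothetical.

```python
def power_law_loss(n_params, n_c=1.0e14, alpha=0.08):
    """Hypothetical loss curve of the form (n_c / n_params) ** alpha.

    n_c and alpha are placeholder constants, not measured values.
    """
    return (n_c / n_params) ** alpha

# Three model sizes, each 10x larger than the previous one.
sizes = [1.5e9, 1.5e10, 1.5e11]
losses = [power_law_loss(n) for n in sizes]

# Under a power law, each 10x step multiplies the loss by the same factor.
ratios = [losses[i + 1] / losses[i] for i in range(len(losses) - 1)]

for n, loss in zip(sizes, losses):
    print(f"{n:.1e} parameters -> illustrative loss {loss:.3f}")
print("loss ratio per 10x step:", [round(r, 4) for r in ratios])
```

The constant ratio between steps is what "predictable" means here: under this assumed form, the gain from the next increase in size can be estimated before the larger model is trained.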
Today, the Transformer is no longer limited to text. The architecture has been successfully adapted to vision (Vision Transformer / ViT), audio, video, robotics, and even molecular biology (AlphaFold, which relies heavily on attention). It has become the dominant foundation of modern generative artificial intelligence.
Etymology
The name "Transformer" is usually explained by the architecture's ability to transform an input sequence into an output sequence via the attention mechanism. The term was introduced by a team of researchers, most of them at Google Brain and Google Research, in their 2017 paper. Its provocative title, "Attention Is All You Need," emphasized that attention alone was sufficient, without recurrence or convolution.
Concrete examples
Understanding the internal workings of a model
Explain the attention mechanism in a Transformer to me as if I were a web developer with no machine learning background.
Comparing architectures for a technical choice
What are the differences between an encoder-only Transformer (like BERT), decoder-only (like GPT), and encoder-decoder (like T5)? For each type, give an ideal use case.
Explaining for an article or presentation
Write a simple analogy to explain how self-attention allows a Transformer to understand the context of a word in a sentence. Use an everyday metaphor.
Practical usage
Understanding the Transformer architecture helps you write better prompts: knowing that the model processes all tokens in parallel through an attention mechanism helps explain why the position and structure of your prompt matter. Placing important instructions at the beginning or end of the prompt, clearly structuring sections, and providing explicit context are practices consistent with how attention distributes its "focus" across your text.
FAQ
What is the difference between a Transformer and an LLM?
Why did the Transformer replace RNNs and LSTMs?
Is it necessary to understand Transformers to prompt well?