
Encoder-Decoder: Definition and Examples

Neural network architecture composed of two complementary modules: an encoder that compresses the input into an intermediate representation, and a decoder that generates the output from this representation.

Full definition

The encoder-decoder architecture is a fundamental paradigm in artificial intelligence, particularly in natural language processing and computer vision. It relies on a simple yet powerful principle: decomposing a complex task into two distinct steps. The encoder reads and analyzes the input (text, image, audio signal) and condenses it into a dense vector representation, often called a latent vector or context vector.

The decoder then takes this compressed representation and progressively generates the desired output, element by element. In machine translation, for instance, the encoder reads the source sentence in French and produces a vector capturing its meaning; the decoder then generates the English translation word by word from that vector.
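
The two-step flow described above can be sketched in a few lines of Python. This is a hand-built toy, not a trained model: the vocabularies, the functions `encode`, `next_token_scores`, and `decode`, and the position-based scoring rule are all illustrative assumptions. They are chosen only to show a fixed-size context vector being produced once and then consumed one output token at a time.

```python
# Toy vocabularies (illustrative assumption): each source word owns one slot
# of the context vector and has a hand-written target-language counterpart.
SOURCE_VOCAB = ["le", "chat", "dort"]
TARGET_VOCAB = ["the", "cat", "sleeps"]
EOS = "<eos>"


def encode(source_tokens):
    """Compress the whole input into one fixed-size vector.

    Slot i holds the 1-based position of SOURCE_VOCAB[i] in the input
    (0.0 if the word is absent). A trained encoder would learn its
    representation; this hand-built one only works without repeated words.
    """
    context = [0.0] * len(SOURCE_VOCAB)
    for position, token in enumerate(source_tokens, start=1):
        context[SOURCE_VOCAB.index(token)] = float(position)
    return context


def next_token_scores(context, generated):
    """Score every candidate for the next output position."""
    step = len(generated) + 1
    scores = {
        TARGET_VOCAB[i]: -abs(value - step) for i, value in enumerate(context)
    }
    scores[EOS] = -0.5  # wins as soon as no slot matches the current step
    return scores


def decode(context, max_len=10):
    """Generate the output one token at a time (greedy decoding)."""
    generated = []
    for _ in range(max_len):
        scores = next_token_scores(context, generated)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        generated.append(best)
    return generated


context = encode(["le", "chat", "dort"])  # the encoder runs once
translation = decode(context)             # the decoder runs step by step
print(context, translation)
```

Note that the decoder never sees the source sentence itself, only the context vector and what it has generated so far. That bottleneck is what the attention mechanism later relaxed.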

This architecture underwent a major evolution with the introduction of the attention mechanism by Bahdanau et al. in 2014, and then the Transformer by Vaswani et al. in 2017. The Transformer replaced recurrent networks (RNN/LSTM) with self-attention mechanisms, enabling parallel processing and better capture of long-range dependencies. Models like T5, BART, and mBART use this full encoder-decoder architecture.
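
As a rough illustration of what attention computes, here is a minimal sketch of scaled dot-product attention, the form used in the Transformer (Bahdanau et al. used an additive scoring function instead). It handles a single query in plain Python. The function names and the example vectors are assumptions made for illustration; real implementations work on batched matrices with learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the weighted average of `values` and the weights themselves.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical example: the query points in the same direction as the first key.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
output, weights = attention([4.0, 0.0], keys, values)
print(weights)  # the largest weight falls on the first key
print(output)
```

Because each query's weights depend only on the keys and not on a previous time step, all positions can be computed at once, which is where the parallelism comes from.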

It is important to distinguish this architecture from models that use only one of its components. Models like BERT use only the encoder (well suited to understanding tasks), while GPT and Claude use only the decoder (well suited to generation). The full encoder-decoder architecture excels at transduction tasks, where one sequence must be transformed into another that may differ in length, language, or modality.
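
One concrete way to see the difference, assuming standard Transformer implementations: encoder self-attention usually lets every position see every other position, while decoder self-attention is restricted by a causal mask so that each position sees only itself and earlier positions. The sketch below builds both masks; the helper names are hypothetical.

```python
def bidirectional_mask(length):
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * length for _ in range(length)]


def causal_mask(length):
    """Decoder-style mask: position `row` may attend only to columns <= row."""
    return [[col <= row for col in range(length)] for row in range(length)]


def show(mask):
    """Render a mask as text, with 'x' for allowed and '.' for blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


print(show(bidirectional_mask(3)))
print()
print(show(causal_mask(3)))
```

In a full encoder-decoder model, the decoder additionally attends to the encoder's outputs without a causal restriction, since the whole input is already known.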

Etymology

The term combines 'encoder' (from the prefix en- and 'code', itself from Latin codex) and 'decoder' (one that decodes, that is, extracts information from a code). In computer science and signal processing, these terms historically refer to converting data into a coded form and back, as in data compression and decompression. Their adoption in deep learning dates back to the work of Cho et al. (2014) and Sutskever et al. (2014) on neural sequence-to-sequence translation.

Concrete examples

Machine translation: transforming text from one language to another

Translate this technical text from French to English while preserving specialized terminology: [TEXT]. Ensure that meaning and nuances are preserved.

Text summarization: condensing a long document into a concise summary

Summarize this research article into 5 key points. Capture the main contributions, methodology, and results without losing essential information.

Code generation from a natural language description

Generate a Python function that takes a list of dictionaries and returns a pandas DataFrame sorted by date in descending order. Add error handling and docstrings.

Practical usage

In prompt engineering, understanding the encoder-decoder architecture helps choose the right model for each task. For transformation tasks (translation, summarization, paraphrasing), encoder-decoder models like T5 are often more effective. When using a decoder-only model like Claude, structure your prompts to explicitly provide the context that the encoder would normally capture.
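
As a minimal sketch of that advice, the helper below assembles a prompt that states the task, lists constraints, and then supplies the full source text inside clear delimiters. The function name `build_prompt`, the section labels, and the delimiter choice are assumptions made for illustration, not a required format for any particular model.

```python
def build_prompt(task, source_text, constraints=()):
    """Assemble a transformation prompt for a decoder-only model.

    The task and constraints come first, the complete source text is
    delimited so it cannot be confused with instructions, and a final
    cue marks where the model's output should start.
    """
    lines = [f"Task: {task}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {item}" for item in constraints)
    lines.append("Source text:")
    lines.append('"""')
    lines.append(source_text.strip())
    lines.append('"""')
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_prompt(
    task="Translate the source text from French to English.",
    source_text="Le chat dort sur le canapé.",
    constraints=["Preserve specialized terminology.", "Keep the original tone."],
)
print(prompt)
```

Everything the model needs is stated in the prompt itself, since a decoder-only model has no separate encoding pass over the input.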

Related concepts

Transformer, Attention (mechanism), Sequence-to-Sequence, Latent vector

FAQ

What is the difference between an encoder-only, decoder-only, and encoder-decoder model?
An encoder-only model (like BERT) excels at understanding: classification, sentiment analysis, semantic search. A decoder-only model (like GPT or Claude) is optimized for text generation. An encoder-decoder model (like T5 or BART) combines both and excels at transformation tasks where input and output are of different natures, such as translation or summarization.
Why do the latest models like GPT-4 and Claude only use the decoder?
Decoder-only models have proven remarkably versatile thanks to scaling (increased size and data). By formulating any task as text generation conditioned on a prompt, they manage to match or even surpass encoder-decoder models on most tasks, while being simpler to train and deploy. The prompt then acts as an implicit encoder.
Is the encoder-decoder architecture still relevant today?
Yes, it remains very relevant in several fields: specialized machine translation, speech recognition (OpenAI's Whisper uses an encoder-decoder), computer vision, and industrial applications requiring compact and efficient models. It is also preferred when the task involves a structured transformation between two well-defined formats.
