Encoder-Decoder: Definition and Examples
Neural network architecture composed of two complementary modules: an encoder that compresses the input into an intermediate representation, and a decoder that generates the output from this representation.
Full definition
The Encoder-Decoder architecture is a fundamental paradigm in artificial intelligence, particularly in natural language processing and computer vision. It relies on a simple yet powerful principle: decomposing a complex task into two distinct steps. The encoder reads and analyzes the input (text, image, audio signal) and condenses it into a dense vector representation, often called a latent vector or context vector.
The decoder then takes this compressed representation and progressively generates the desired output, element by element. In machine translation, for instance, the encoder reads the source sentence in French and produces a vector capturing its meaning, then the decoder generates the English translation word by word from that vector.
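The encode-then-decode control flow described above can be sketched in a few lines of Python. This is a toy illustration, not a neural network: the three-word vocabulary, the `TRANSLATION` lookup table, and the function names are all hypothetical stand-ins for what a trained model would learn.

```python
from typing import List

EOS = "<eos>"  # end-of-sequence marker that stops generation

# Hypothetical toy vocabulary and lookup table, standing in for learned weights.
VOCAB = ["le", "chat", "dort"]
TRANSLATION = {"le": "the", "chat": "cat", "dort": "sleeps"}


def encode(tokens: List[str]) -> List[float]:
    """Compress the input into a fixed-size context vector (one count per vocabulary word)."""
    return [float(tokens.count(word)) for word in VOCAB]


def decoder_step(context: List[float], generated: List[str]) -> str:
    """Choose the next output token from the context and the tokens generated so far."""
    for i, word in enumerate(VOCAB):
        target = TRANSLATION[word]
        if generated.count(target) < context[i]:
            return target
    return EOS


def decode(context: List[float], max_len: int = 20) -> List[str]:
    """Generate the output element by element until EOS or max_len is reached."""
    generated: List[str] = []
    while len(generated) < max_len:
        token = decoder_step(context, generated)
        if token == EOS:
            break
        generated.append(token)
    return generated


def translate(sentence: str) -> List[str]:
    """Run the full pipeline: tokenize, encode, then decode."""
    return decode(encode(sentence.lower().split()))


print(translate("Le chat dort"))  # ['the', 'cat', 'sleeps']
```

Because this toy context vector only stores word counts, the original word order is lost: `translate("dort le chat")` returns the same output. That is a deliberately exaggerated version of the information bottleneck a single fixed-size vector creates.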
This architecture underwent a major evolution with the introduction of the attention mechanism by Bahdanau et al. in 2014, which lets the decoder consult all encoder states rather than a single fixed vector, and then the Transformer by Vaswani et al. in 2017. The Transformer replaced recurrent networks (RNN/LSTM) with self-attention mechanisms, enabling parallel processing of sequence positions during training and better capture of long-range dependencies. Models like T5, BART, and mBART use this full encoder-decoder architecture.
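To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is simplified on purpose: it assumes queries, keys, and values are all the raw input vectors, whereas a real Transformer applies learned projection matrices and several attention heads. The function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    dim = len(x[0])
    output: Matrix = []
    for query in x:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in x
        ]
        weights = softmax(scores)
        # Each output row is a weighted average of all value vectors.
        output.append(
            [sum(w * value[j] for w, value in zip(weights, x)) for j in range(dim)]
        )
    return output


if __name__ == "__main__":
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print([round(v, 3) for v in row])
```

Each output row depends only on the input matrix, not on the previous output row, so all rows could be computed at the same time. That independence is what allows parallel processing, in contrast to a recurrent network that must step through the sequence in order.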
It is important to distinguish this architecture from models that use only one of its components. Models like BERT use only the encoder (well suited to understanding tasks such as classification), while GPT and Claude use only the decoder (well suited to generation). The full encoder-decoder architecture excels in transduction tasks, where one sequence must be transformed into another, often of a different length or in a different language or modality.
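One practical way to see the difference between the two components is the attention mask each one uses. The sketch below, with illustrative function names, builds both masks as plain Python lists: an encoder lets every position see every other position, while a decoder applies a causal mask so that each position sees only itself and earlier positions.

```python
from typing import List

Mask = List[List[bool]]


def bidirectional_mask(length: int) -> Mask:
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * length for _ in range(length)]


def causal_mask(length: int) -> Mask:
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(length)] for i in range(length)]


def render(mask: Mask) -> str:
    """Draw a mask as text: 'x' means attention allowed, '.' means blocked."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("Encoder (bidirectional):")
    print(render(bidirectional_mask(4)))
    print("Decoder (causal):")
    print(render(causal_mask(4)))
```

In a full encoder-decoder Transformer, the decoder combines its causal self-attention with cross-attention over the encoder outputs, which is how the generated sequence stays conditioned on the input.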
Etymology
The term combines 'encoder' (to put into code; 'code' derives from Latin codex) and 'decoder' (to decode, extract information from code). In computer science and signal processing, these terms historically refer to converting data to and from a coded form, as in data compression and decompression. Their adoption in deep learning dates back to the work of Cho et al. (2014) and Sutskever et al. (2014) on neural sequence-to-sequence translation.
Concrete examples
Machine translation: transforming text from one language to another
Translate this technical text from French to English while preserving specialized terminology: [TEXT]. Ensure that meaning and nuances are preserved.
Text summarization: condensing a long document into a concise summary
Summarize this research article into 5 key points. Capture the main contributions, methodology, and results without losing essential information.
Code generation from a natural language description
Generate a Python function that takes a list of dictionaries and returns a pandas DataFrame sorted by date in descending order. Add error handling and docstrings.
Practical usage
In prompt engineering, understanding the encoder-decoder architecture helps you choose the right model for each task. For transformation tasks (translation, summarization, paraphrasing), encoder-decoder models like T5 can be effective, especially when fine-tuned for the task. When using a decoder-only model like Claude, structure your prompts to explicitly provide the context that the encoder would normally capture: state the task, include the full source text, and list any constraints.
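As one possible way to apply this advice, the sketch below assembles a prompt for a decoder-only model so that the task, the source text, and the constraints are each clearly separated. The helper name `build_transformation_prompt` and the `<source>` delimiters are assumptions made for illustration, not a required format.

```python
from typing import List


def build_transformation_prompt(task: str, source_text: str, constraints: List[str]) -> str:
    """Assemble a prompt that states the task, delimits the input, and lists constraints."""
    lines = [
        f"Task: {task}",
        "",
        "Source text:",
        "<source>",  # hypothetical delimiter marking where the input starts
        source_text.strip(),
        "</source>",
    ]
    if constraints:
        lines += ["", "Constraints:"]
        lines += [f"- {constraint}" for constraint in constraints]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_transformation_prompt(
        task="Translate this technical text from French to English.",
        source_text="Le chat dort.",
        constraints=["Preserve specialized terminology.", "Keep meaning and nuances."],
    )
    print(prompt)
```

Delimiting the source text makes it unambiguous which part of the prompt is the input to transform and which part is the instruction.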
FAQ
What is the difference between an encoder-only, decoder-only, and encoder-decoder model?
Why do the latest models like GPT-4 and Claude only use the decoder?
Is the encoder-decoder architecture still relevant today?