Dropout: Definition and Examples
Dropout is a regularization technique used during neural network training that randomly deactivates a fraction of neurons at each iteration to prevent overfitting.
Full definition
Dropout is one of the most influential regularization techniques in deep learning, introduced by Geoffrey Hinton and his team in 2012. Its principle is elegantly simple: during each training step, each neuron in the network has a probability p (typically 0.5 for hidden layers and 0.2 for the input layer) of being temporarily 'turned off', meaning its output is set to zero. This forces the network not to rely excessively on any single neuron or small group of neurons.
The intuition behind dropout is that it simulates training an ensemble of different sub-networks at each iteration. Since each neuron can be deactivated at any time, the network learns more robust and distributed representations. Dropout can also be seen as a form of 'structural noise' that prevents the model from memorizing training data instead of extracting generalizable patterns.
In practice, dropout is applied only during the training phase. During inference (when the model makes predictions), all neurons are active. In the original formulation, the weights are multiplied by (1 - p) at inference, because more neurons are active than during training and the activations would otherwise be larger than the network saw while learning. The modern variant, called 'inverted dropout', performs this compensation during training instead: the activations of the surviving neurons are divided by (1 - p), so nothing needs to be adjusted at inference.
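The two compensation schemes described above can be sketched in plain Python. This is a minimal illustration, not a framework implementation: the function names `classic_dropout` and `inverted_dropout` are hypothetical, activations are plain lists of floats, and scaling the activations at inference stands in for scaling the outgoing weights, which is equivalent for a linear layer.

```python
import random


def classic_dropout(activations, p, training, rng=None):
    """Original scheme: mask during training, scale by (1 - p) at inference."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        # All units are active, so outputs are scaled down by the keep probability.
        return [a * (1.0 - p) for a in activations]
    rng = rng or random.Random()
    # Each unit is zeroed with probability p; survivors pass through unchanged.
    return [0.0 if rng.random() < p else a for a in activations]


def inverted_dropout(activations, p, training, rng=None):
    """Modern scheme: mask and rescale during training, do nothing at inference."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        # Inference: every unit is active and no rescaling is needed.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    # Survivors are divided by (1 - p) so the expected output equals the input.
    return [0.0 if rng.random() < p else a / keep_prob for a in activations]
```

With p = 0.5, a surviving activation of 1.0 becomes 2.0 under the inverted scheme, so the average over many units stays close to 1.0 and the inference path can return the activations untouched.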
Although dropout was initially designed for fully connected neural networks, variants exist for other architectures: spatial dropout for convolutional networks (CNNs), recurrent dropout for recurrent networks (RNNs/LSTMs), and DropConnect which deactivates connections rather than neurons. In modern Transformer architectures like GPT or BERT, dropout is still used on attention layers and feed-forward layers.
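To make the difference between standard and spatial dropout concrete, here is a rough sketch assuming a feature map stored as a list of channels, each channel being a 2D list of floats. The function name `spatial_dropout` and the data layout are assumptions for illustration; real frameworks operate on tensors and offer their own layers for this.

```python
import random


def spatial_dropout(feature_maps, p, rng=None):
    """Zero whole channels with probability p; scale kept channels by 1 / (1 - p).

    feature_maps: list of channels, each channel a list of rows (a 2D list).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    out = []
    for channel in feature_maps:
        # One random draw per channel: all positions in a channel share the same fate.
        scale = 0.0 if rng.random() < p else 1.0 / keep_prob
        out.append([[v * scale for v in row] for row in channel])
    return out
```

Standard dropout would instead draw one random number per value. Dropping entire channels is the usual motivation for convolutional layers, where neighboring positions in a channel are strongly correlated and masking them individually removes little information.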
Etymology
The term 'dropout' comes from English and literally means 'abandonment' or 'dropping out'. In the context of neural networks, it refers to the fact that some neurons temporarily 'drop out' of the network during training, as if they were absent. The term was popularized by the foundational paper by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov published in 2014 in the Journal of Machine Learning Research.
Concrete examples
Understanding a language model's architecture
Explain the architecture of a Transformer, detailing the role of dropout in attention layers and feed-forward layers. What dropout rate is typically used in GPT and BERT?
Diagnosing overfitting during model training
My image classification model achieves 99% accuracy on training data but only 72% on the test set. Suggest a regularization strategy including dropout, specifying rates to test and which layers to apply it to.
Comparing regularization techniques for an NLP project
Compare the advantages and disadvantages of dropout, weight decay, and data augmentation for a French text classification model. In what order should I implement them?
Practical usage
In prompt engineering, understanding dropout helps you formulate more precise queries about network architecture and training. Because dropout is disabled at inference, the variability in a deployed language model's answers comes from sampling settings such as temperature, not from dropout. When discussing fine-tuning or model training with an AI, stating the desired dropout rate helps you obtain configurations better suited to your use case. It is also a key concept for communicating effectively with data scientists and for understanding technical model documentation.
FAQ
Why is dropout not applied during inference?
What dropout rate should I choose for my model?
Is dropout still used in modern models like GPT-4 or Claude?