Gradient Descent: Definition and Examples
Gradient Descent is an iterative optimization algorithm used to minimize a cost function by gradually adjusting the parameters of a model in the direction opposite to the gradient.
Full definition
Gradient Descent is the fundamental algorithm by which most machine learning models, and neural networks in particular, learn. Its principle is remarkably intuitive: imagine you are lost in a thick fog at the top of a mountain and you are trying to descend into the valley. At each step, you feel the ground around you and move in the direction where the slope descends most steeply. This is what Gradient Descent does with a model's parameters: the height of the terrain plays the role of the error, and your position plays the role of the parameters.
Specifically, the algorithm computes the gradient of the cost function, that is, the vector of its partial derivatives with respect to each parameter of the model. The gradient points in the direction in which the error increases most rapidly, so moving the parameters a small step in the opposite direction reduces the error. The learning rate controls the size of each step: if it is too large, you risk overshooting the minimum or even diverging; if it is too small, training is extremely slow.
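The update rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production optimizer: the cost function (a simple bowl with its minimum at x = 3, y = -1), the function names, the starting point, and the three learning rates are all assumptions chosen for the example.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient, starting from `start`."""
    params = list(start)
    for _ in range(steps):
        g = grad(params)
        # Move each parameter opposite to its partial derivative.
        params = [p - learning_rate * gi for p, gi in zip(params, g)]
    return params


def cost(params):
    # Hypothetical cost: a bowl whose minimum is at (3, -1).
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def cost_gradient(params):
    # Partial derivatives of `cost` with respect to x and y.
    x, y = params
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


start = [0.0, 0.0]
good = gradient_descent(cost_gradient, start, learning_rate=0.1, steps=100)
slow = gradient_descent(cost_gradient, start, learning_rate=0.001, steps=100)
diverged = gradient_descent(cost_gradient, start, learning_rate=1.1, steps=100)

print("lr=0.1  ", good, cost(good))
print("lr=0.001", slow, cost(slow))
print("lr=1.1  ", diverged, cost(diverged))
```

On this toy cost, a learning rate of 0.1 lands very close to the minimum within 100 steps, 0.001 has barely moved from the start, and 1.1 overshoots further on every step so the cost grows instead of shrinking. Suitable values depend entirely on the cost function, so these numbers should not be read as general recommendations.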
There are several variants of this algorithm. Batch Gradient Descent uses the entire dataset to compute each update, which is precise but computationally expensive. Stochastic Gradient Descent (SGD) uses only one example at a time, which makes each update much cheaper but noisier. Mini-batch Gradient Descent, the most commonly used in practice, strikes a balance by using small batches of data. Modern optimizers like Adam, RMSProp, or AdaGrad add adaptive mechanisms that automatically adjust the effective learning rate for each parameter.
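The three basic variants differ only in how many examples feed each update, which a single batch_size parameter can express. The sketch below is an illustration under stated assumptions: the function name train_linear, the toy dataset (points on the line y = 2x + 1), the learning rate, and the epoch count are all hypothetical, and the model is a one-variable linear regression rather than a neural network. Adaptive optimizers such as Adam are not shown.

```python
import random


def train_linear(data, batch_size, learning_rate=0.1, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on mean squared error.

    batch_size == len(data) is batch gradient descent, batch_size == 1 is
    stochastic gradient descent, and anything in between is mini-batch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the examples in a new order each epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Average the gradient of the squared error over the batch.
            grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: 20 points on the line y = 2x + 1.
toy_data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(20)]

batch_fit = train_linear(toy_data, batch_size=len(toy_data))  # 1 update per epoch
sgd_fit = train_linear(toy_data, batch_size=1)                # 20 updates per epoch
mini_fit = train_linear(toy_data, batch_size=4)               # 5 updates per epoch

print("batch     ", batch_fit)
print("stochastic", sgd_fit)
print("mini-batch", mini_fit)
```

Because this toy data contains no noise, all three settings recover a slope close to 2 and an intercept close to 1. On real, noisy data the single-example updates would fluctuate more from step to step, which is the trade-off the paragraph above describes.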
Gradient Descent is at the heart of training virtually all neural networks, including large language models (LLMs) like GPT or Claude. Without this algorithm, it would not be practical to adjust the billions of parameters that enable these models to process and generate text. Understanding how it works helps you grasp why some models struggle to converge, why fine-tuning works, and how hyperparameters influence the final quality of a model.
Etymology
The term comes from Latin 'gradiens' (walking, progressing) and 'descensus' (descent). The mathematical concept of gradient refers to the vector of partial derivatives of a function, indicating the direction of steepest variation. The combination of the two words literally describes the action of 'descending by following the slope.' The algorithm was formalized by Augustin-Louis Cauchy in 1847, long before the era of artificial intelligence.
Concrete examples
Understanding why a model is not converging
My image classification model stops improving after a few epochs. The learning rate is 0.1. Can you explain how gradient descent might be stuck and what adjustments to try?
Choosing the right optimizer for a project
I need to train a small neural network for spam detection. What are the practical differences between SGD, Adam, and RMSProp for my use case? Which one do you recommend and why?
Simplifying the concept for a presentation
Explain gradient descent to a non-technical audience using an everyday analogy. I am preparing a presentation for decision-makers who want to understand how AI learns.
Practical usage
In prompt engineering, understanding Gradient Descent allows you to better formulate questions about training and fine-tuning models. You can ask an LLM to explain why training diverges, to recommend suitable hyperparameters, or to diagnose convergence problems. This knowledge is also essential for writing precise technical prompts when working on machine learning projects.
FAQ
What is the difference between Gradient Descent and Stochastic Gradient Descent?
Why is the learning rate so important in Gradient Descent?
Is Gradient Descent used to train large language models like ChatGPT or Claude?