Scaling Laws: Definition and Examples
Scaling laws are mathematical relationships that describe how AI model performance improves predictably as model size, training data, or compute increases.
Full definition
Scaling laws are among the most influential empirical findings in modern AI research. They were formalized notably by OpenAI researchers (Kaplan et al., 2020) and later refined by DeepMind (Hoffmann et al., 2022, the 'Chinchilla' work). These laws establish that language model performance, measured as test loss, follows predictable power-law curves as a function of three key variables: the number of model parameters, the volume of training data, and the compute budget.
Concretely, a model's loss decreases as a power law when any one of these three variables is increased, provided the other two are not the limiting factor. If one of them is held fixed at a small value, the gains from scaling the others eventually level off. For example, doubling the number of parameters does not double performance; it lowers the loss by a fixed, predictable factor. In its simplest form the relationship is L = a × N^(-α), where L is the loss, N is the scaling variable, a is a constant, and α is a characteristic exponent. Doubling N therefore multiplies L by 2^(-α), whatever the starting size.
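The following minimal Python sketch evaluates the power law L = a × N^(-α). The constants A and ALPHA are hypothetical values chosen for illustration, not published fits. It shows that multiplying N by 2 multiplies the predicted loss by the same factor, 2^(-α), at every scale.

```python
# Illustrative power-law scaling: L(N) = a * N**(-alpha).
# A and ALPHA are hypothetical values for illustration, not published fits.
A = 10.0
ALPHA = 0.08


def power_law_loss(n: float, a: float = A, alpha: float = ALPHA) -> float:
    """Loss predicted by the power law for scaling variable n (e.g. parameter count)."""
    return a * n ** (-alpha)


for n in (1e8, 1e9, 1e10):
    ratio = power_law_loss(2 * n) / power_law_loss(n)
    # The ratio is identical at every scale: 2**(-ALPHA), about 0.946 here.
    print(f"N={n:.0e}  L={power_law_loss(n):.3f}  L(2N)/L(N)={ratio:.4f}")
```

With α = 0.08, each doubling cuts the loss by a little over 5%. That is a steady but modest gain, which is why large jumps in quality require many doublings.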
The practical impact of scaling laws is immense. They allow research labs to predict a model's performance before training it, by extrapolating from the results of smaller models. This is what allows investment decisions of several hundred million dollars in compute infrastructure to be made with relative confidence. The Chinchilla work notably showed that many large models of the time were 'undertrained' relative to their size: for a fixed compute budget, parameters and training tokens should be scaled in roughly equal proportion. Chinchilla (70 billion parameters, 1.4 trillion tokens) outperformed the much larger Gopher (280 billion parameters) trained with the same compute, and it is also cheaper to run at inference time.
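Extrapolating from smaller models can be sketched as a straight-line fit in log-log space, since log L = log a - α log N. The "small model" results below are synthetic, generated from a known law (a = 10, α = 0.08) as a stand-in for real measurements. The fit is ordinary least squares. This is an assumption made for clarity, not a description of any lab's actual procedure.

```python
import math


def fit_power_law(ns, losses):
    """Fit L = a * N**(-alpha) by least squares on log L = log a - alpha * log N."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)


# Synthetic "small model" results from a known law (a=10, alpha=0.08);
# these stand in for real measurements, which this sketch does not have.
small_ns = [1e7, 3e7, 1e8, 3e8, 1e9]
small_losses = [10.0 * n ** -0.08 for n in small_ns]

a, alpha = fit_power_law(small_ns, small_losses)
predicted = a * (1e11) ** -alpha  # extrapolate 100x beyond the largest "trained" model
print(f"a={a:.3f}  alpha={alpha:.4f}  predicted L(1e11)={predicted:.3f}")
```

Real measurements are noisy, and published fits often add an irreducible-loss constant that a pure power law lacks. The further one extrapolates, the more those details matter.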
More recently, the community has distinguished 'pre-training scaling laws' from 'inference-time scaling laws' (also called test-time compute scaling). The latter, popularized by OpenAI's o1 and o3 models, show that allocating more computation at generation time, for example through extended reasoning, can also predictably improve performance. This opens a new dimension of optimization.
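To make "more computation at generation time improves performance" concrete, the Python sketch below uses a simpler, swapped-in technique: majority voting over k independent samples (self-consistency). It is not how o1 or o3 work internally. It assumes each sample is independently correct with probability p and that all wrong answers count as one competing answer.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent samples is correct.

    Assumes (for illustration) each sample is independently correct with
    probability p, and all wrong samples count as one competing answer.
    k must be odd so that ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k // 2 + 1, k + 1))


for k in (1, 3, 5, 11, 31):
    print(k, round(majority_vote_accuracy(0.6, k), 3))
```

Accuracy rises with k but with diminishing returns per extra sample. If p is below 0.5, the same procedure makes results worse. Extra inference compute pays off only when the model is already more often right than wrong on the task.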
Etymology
The term 'scaling laws' comes from physics and mathematics, where scaling laws describe how a phenomenon changes when the scale of observation is altered. In statistical physics, these laws characterize phase transitions and critical phenomena. The term was adopted by the AI community from 2020 onward to describe the empirical regularities observed in training large language models.
Concrete examples
Planning a model training
Based on the Chinchilla scaling laws, help me calculate the optimal ratio between number of parameters and training tokens for a compute budget of 10^24 FLOPs.
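One way to work the arithmetic this prompt asks for is sketched in Python below. It assumes two widely used approximations rather than the paper's exact fitted exponents. The first is training compute C ≈ 6 × N × D, for N parameters and D training tokens. The second is a compute-optimal ratio of roughly 20 tokens per parameter, a rule of thumb commonly drawn from the Chinchilla results.

```python
import math

# Assumed approximations, not exact fits from the paper:
FLOPS_PER_PARAM_TOKEN = 6.0  # training compute C ~= 6 * N * D
TOKENS_PER_PARAM = 20.0      # rule-of-thumb compute-optimal ratio D / N


def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (params, tokens) with tokens = r * params and 6 * params * tokens = compute."""
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


n, d = compute_optimal_allocation(1e24)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.2f}T")
```

For 10^24 FLOPs this gives roughly 91 billion parameters trained on about 1.8 trillion tokens. The true optimum depends on the fitted exponents and on expected inference usage, so treat the result as a starting point, not a final answer.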
Comparing models of different sizes
Explain why GPT-4 outperforms GPT-3 in terms of scaling laws. Which scaling factors were increased and in what proportions?
Strategic choice between model size and inference time
For a complex mathematical reasoning task, is it better to use a larger model or increase inference compute with chain-of-thought techniques? Analyze in terms of scaling laws.
Practical usage
In prompt engineering, understanding scaling laws helps you choose the right model for each task: a larger model is not always necessary for simple tasks, but complex tasks benefit significantly from a larger-scale model. It also explains why extended reasoning techniques (chain-of-thought, reflection) improve results: they exploit inference-time scaling laws. Finally, knowing these laws allows one to anticipate future model capabilities and adapt prompting strategies accordingly.
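The trade-off between a larger model and more inference-time reasoning can be framed in compute terms. The Python sketch below uses the common approximation of about 2 FLOPs per parameter per generated token for a forward pass. It ignores attention over the context, so it understates the cost of very long generations. The model sizes and token counts are hypothetical.

```python
def inference_flops(params: float, generated_tokens: int) -> float:
    """Approximate decoding compute: ~2 FLOPs per parameter per generated token.

    Ignores attention over the context and prompt processing, so it
    understates the cost of very long generations.
    """
    return 2.0 * params * generated_tokens


# Hypothetical scenarios: a large model answering directly vs. a smaller
# model producing a long chain of thought before answering.
large_direct = inference_flops(params=70e9, generated_tokens=300)
small_reasoning = inference_flops(params=8e9, generated_tokens=3000)

print(f"large model, short answer:   {large_direct:.2e} FLOPs")
print(f"small model, long reasoning: {small_reasoning:.2e} FLOPs")
```

Here both options cost the same order of compute. Which one wins depends on whether parameter scaling or inference-time scaling buys more accuracy per FLOP on the task at hand, and that has to be measured rather than computed.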
FAQ
Do scaling laws mean a larger model is always better?
Do scaling laws have limitations?
What is the link between scaling laws and the cost of AI models?