
Scaling Laws: Definition and Examples

Scaling laws are mathematical relationships that describe how AI model performance improves predictably as model size, training data, or compute increases.

Full definition

Scaling laws are among the most fundamental discoveries in modern AI research. They were formalized by OpenAI researchers (Kaplan et al., 2020) and later refined by DeepMind (Hoffmann et al., 2022, the "Chinchilla" paper). These laws establish that language model loss follows predictable power-law curves as a function of three key variables: the number of model parameters, the volume of training data, and the compute budget.

Concretely, these laws show that a model's loss decreases as a power law in each of these three variables, as long as the other two are large enough not to become the bottleneck. For example, doubling the number of parameters does not halve the loss; instead, it multiplies the loss by a fixed factor 2^(-α), whatever the starting size. For a single scaling variable, the relationship takes the form L = a × N^(-α), where L is the loss, N is the scaling variable (parameters, training tokens, or FLOPs), a is a constant, and α is a characteristic exponent.
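The single-variable power law can be sketched in a few lines of Python. The constants `a` and `alpha` below are hypothetical values chosen only for illustration, not fitted results; the point is that each doubling of N multiplies the loss by the same factor 2^(-α).

```python
# Minimal sketch of the single-variable power law L = a * N**(-alpha).
# The constants used in the demo are hypothetical, chosen only for illustration.


def power_law_loss(n: float, a: float, alpha: float) -> float:
    """Loss predicted by L = a * N^(-alpha) for a scaling variable N."""
    return a * n ** (-alpha)


def doubling_factor(alpha: float) -> float:
    """Factor by which the loss is multiplied when N doubles: 2^(-alpha)."""
    return 2.0 ** (-alpha)


if __name__ == "__main__":
    a, alpha = 10.0, 0.076  # hypothetical constants
    for n in (1e8, 2e8, 4e8, 8e8):
        print(f"N = {n:.0e}  ->  L = {power_law_loss(n, a, alpha):.4f}")
    # Each doubling multiplies the loss by the same constant factor,
    # regardless of how large N already is.
    print(f"loss ratio per doubling: {doubling_factor(alpha):.4f}")
```

Because the ratio L(2N) / L(N) does not depend on N, the gains look "regular" on a log-log plot even though each absolute improvement gets smaller.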

The practical impact of scaling laws is immense. They allow research labs to predict a model's performance before training it, by extrapolating from smaller models. These laws are what allow investment decisions worth several hundred million dollars in compute infrastructure to be made with relative confidence. The Chinchilla work notably showed that many large models of the time were 'undertrained' relative to their size, and that scaling parameters and training data in roughly equal proportion gave better results for the same training compute, with a smaller model that is cheaper to run.
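Extrapolating from smaller models amounts to fitting the power law on small runs and evaluating it at a larger size. A minimal sketch, assuming the single-variable form L = a × N^(-α): taking logarithms gives a straight line, log L = log a - α log N, which ordinary least squares can fit. The "small model" losses here are synthetic, generated from known constants without noise, so the fit recovers them exactly; real measurements are noisy and the fitted law is only trustworthy over a limited range.

```python
import math


def fit_power_law(ns, losses):
    """Fit L = a * N^(-alpha) by least squares on log L = log a - alpha * log N.

    Returns (a, alpha).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predict(n, a, alpha):
    """Evaluate the fitted power law at scale n."""
    return a * n ** (-alpha)


if __name__ == "__main__":
    # Synthetic small-model results from a hypothetical law (a=10, alpha=0.076).
    small_ns = [1e7, 3e7, 1e8, 3e8]
    small_losses = [predict(n, 10.0, 0.076) for n in small_ns]
    a, alpha = fit_power_law(small_ns, small_losses)
    print(f"fitted a = {a:.3f}, alpha = {alpha:.4f}")
    print(f"extrapolated loss at N = 1e11: {predict(1e11, a, alpha):.4f}")
```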

More recently, the community distinguishes 'pre-training scaling laws' from 'inference-time scaling laws' (or test-time compute). The latter, popularized by OpenAI's o1 and o3 models, show that allocating more computation at generation time, through extended reasoning techniques, can also predictably improve performance, opening a new dimension of optimization.
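One simple way to spend more compute at inference time is to sample several answers and keep the majority (often called self-consistency). This is not how o1 or o3 work internally; it is only an idealized illustration of why extra inference compute can help. The sketch assumes each sample is independently correct with probability `p` and that the answer wins only with a strict majority of an odd number `n` of votes.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Idealized model: each sample is correct with probability p, samples are
    independent, and n is odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    # Sum the binomial probabilities of getting more than n // 2 correct samples.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 15, 41):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions, accuracy rises with the number of samples only when `p` is above 0.5; below it, voting makes things worse, which is one reason extra inference compute is not a universal fix.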

Etymology

The term 'scaling laws' comes from physics and mathematics, where scaling laws describe how a phenomenon changes when the scale of observation is altered. In statistical physics, these laws characterize phase transitions and critical phenomena. The term was adopted by the AI community from 2020 onward to describe the empirical regularities observed in training large language models.

Concrete examples

Planning a model training

Based on the Chinchilla scaling laws, help me calculate the optimal ratio between number of parameters and training tokens for a compute budget of 10^24 FLOPs.
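The calculation this prompt asks for can be sketched directly. Two assumptions do the work: training compute for a dense transformer is approximately C ≈ 6 × N × D (N parameters, D training tokens), and the compute-optimal ratio is roughly 20 tokens per parameter, a widely quoted rule of thumb derived from the Chinchilla results rather than an exact law.

```python
import math


def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that spend `compute_flops` under two assumptions:

    - C ≈ 6 * N * D training FLOPs for a dense transformer;
    - D ≈ tokens_per_param * N (about 20 is a common Chinchilla rule of thumb).
    Substituting gives C ≈ 6 * r * N**2, so N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    n, d = chinchilla_optimal(1e24)
    print(f"parameters ≈ {n:.3g}, tokens ≈ {d:.3g}")
```

For 10^24 FLOPs this gives roughly 9.1 × 10^10 parameters (about 91B) trained on about 1.8 × 10^12 tokens. Changing `tokens_per_param` shows how sensitive the answer is to that assumption.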

Comparing models of different sizes

Explain why GPT-4 outperforms GPT-3 in terms of scaling laws. Which scaling factors were increased and in what proportions?

Strategic choice between model size and inference time

For a complex mathematical reasoning task, is it better to use a larger model or increase inference compute with chain-of-thought techniques? Analyze in terms of scaling laws.

Practical usage

In prompt engineering, understanding scaling laws helps in choosing the right model for each task: a larger model is often unnecessary for simple tasks, while complex tasks benefit significantly from a larger-scale model. It also explains why extended reasoning techniques (chain-of-thought, reflection) improve results: they exploit inference-time scaling laws. Finally, knowing these laws helps anticipate the capabilities of future models and adapt prompting strategies accordingly.

Related concepts

Model Parameters, Training Tokens, Compute (FLOPs), Inference-Time Compute, Loss Function, Chinchilla Optimal

FAQ

Do scaling laws mean a larger model is always better?
Not necessarily. Scaling laws show that performance improves with size, but with diminishing returns. Moreover, the Chinchilla work showed that a smaller model trained on more data can outperform a larger, undertrained one for the same compute budget. The balance between model size and data volume is crucial.
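The Chinchilla paper also fitted a two-variable loss, L(N, D) = E + A / N^α + B / D^β, where E is an irreducible loss term. The sketch below uses constants approximately as reported by Hoffmann et al. (2022) (E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28); treat them as illustrative, since the fit depends on the paper's data and methodology. It compares Gopher (280B parameters, 300B tokens) with Chinchilla (70B parameters, 1.4T tokens), which used a similar compute budget.

```python
# Parametric loss L(N, D) = E + A / N**alpha + B / D**beta.
# Constants approximately as reported by Hoffmann et al. (2022); illustrative only.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    models = [("Gopher", 280e9, 300e9), ("Chinchilla", 70e9, 1.4e12)]
    for name, n, d in models:
        flops = 6 * n * d  # C ≈ 6 * N * D approximation
        print(f"{name}: ~{flops:.2g} FLOPs, predicted loss {chinchilla_loss(n, d):.3f}")
```

With these constants, the smaller Chinchilla configuration gets the lower predicted loss (roughly 1.94 versus 1.99), and neither term can push the loss below E, which is one way to see the diminishing returns.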
Do scaling laws have limitations?
Yes, several limitations have been identified. First, they describe average trends and do not predict specific emergent abilities (e.g., multi-step reasoning). Second, they assume near-unlimited access to quality data, which is becoming a real bottleneck. Finally, some researchers believe gains may slow as models approach the limits of available human data.
What is the link between scaling laws and the cost of AI models?
Scaling laws are directly linked to costs because they determine the compute budget needed to reach a given performance level. Training a frontier model today is estimated to cost several hundred million dollars. Scaling laws allow labs to optimize budget allocation by finding the best balance between model size, data volume, and training duration.
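Turning a compute budget into a cost estimate is simple arithmetic once hardware assumptions are fixed. The sketch below divides total training FLOPs by effective per-GPU throughput (peak FLOP/s times utilization) to get GPU-hours, then multiplies by an hourly price. The function name and every number in the demo (1e15 peak FLOP/s, 40% utilization, $2 per GPU-hour) are hypothetical assumptions, not real hardware specs or price quotes.

```python
def training_cost_usd(compute_flops, peak_flops_per_gpu, utilization, usd_per_gpu_hour):
    """Rough training-cost estimate.

    GPU-hours = C / (peak FLOP/s * utilization) / 3600; cost = GPU-hours * price.
    Returns (gpu_hours, cost_usd).
    """
    effective_flops = peak_flops_per_gpu * utilization
    gpu_hours = compute_flops / effective_flops / 3600.0
    return gpu_hours, gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical hardware and pricing assumptions, not real quotes.
    hours, cost = training_cost_usd(
        1e24, peak_flops_per_gpu=1e15, utilization=0.4, usd_per_gpu_hour=2.0
    )
    print(f"{hours:,.0f} GPU-hours, about ${cost:,.0f}")
```

Because cost is linear in C under these assumptions, a run with 100 times the compute budget costs 100 times as much, which is why compute-optimal allocation matters at frontier scale.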
