Loss Function: Definition and Examples

A loss function is a mathematical formula that measures the gap between an AI model's predictions and the expected results. It guides learning by quantifying the error to be minimized.

Full definition

The loss function is a fundamental pillar of machine learning. Its role is simple to understand: it assigns a numerical score to each prediction of the model, indicating how far that prediction deviates from the expected value. The higher this score, the worse the prediction. The goal of training is therefore to minimize this value, usually averaged over the training data.

Concretely, during the training of a neural network, each batch of data passes through the model, which produces predictions. The loss function compares these predictions to the actual values (the labels) and computes an error. Backpropagation then computes the gradient of this error with respect to every weight in the network, and an optimizer such as SGD or Adam uses these gradients to adjust the weights. This cycle repeats, often thousands to millions of times, until the loss stops improving significantly. In deep networks this end point is typically a good local solution rather than a guaranteed global minimum.
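
To make this cycle concrete, here is a minimal sketch in plain Python, assuming a toy one-variable linear model (prediction = w * x + b) fitted with MSE. For brevity it replaces backpropagation through a network with hand-derived gradients and uses plain full-batch gradient descent instead of SGD or Adam. The data, learning rate and number of steps are illustrative choices, not values from any real training setup.

```python
# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse(w, b):
    """Mean squared error of the linear model w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """Analytic gradients of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def train(steps=500, lr=0.05):
    w, b = 0.0, 0.0  # arbitrary starting weights
    history = []
    for _ in range(steps):
        history.append(mse(w, b))  # forward pass + loss
        dw, db = gradients(w, b)   # gradient of the loss (what backprop computes in a network)
        w -= lr * dw               # optimizer step: plain gradient descent
        b -= lr * db
    return w, b, history

w, b, history = train()
print(f"w={w:.3f}, b={b:.3f}, first loss={history[0]:.3f}, last loss={history[-1]:.6f}")
```

The loss starts at 33.0 and shrinks toward zero as w and b approach 2 and 1. A learning rate that is too large would make the updates overshoot and the loss diverge, which is one reason optimizers like Adam adapt the step size.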

There are many loss functions suited to different tasks. For classification, cross-entropy loss is typically used; it heavily penalizes confident but incorrect predictions. For regression, mean squared error (MSE) and mean absolute error (MAE) are common; MSE punishes large errors much more strongly than MAE, which makes it more sensitive to outliers. More specialized tasks use tailored loss functions: object detection, for instance, typically combines a classification loss with a bounding-box regression loss.
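
The common losses named above can be written in a few lines of plain Python. This is a minimal sketch using their textbook definitions; the function names and example numbers are illustrative, and libraries such as PyTorch or scikit-learn ship their own optimized versions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)], where p is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Regression: a single outlier error of 10 dominates MSE far more than MAE
print(mse([0, 0, 0, 0], [1, 1, 1, 10]))  # (1 + 1 + 1 + 100) / 4 = 25.75
print(mae([0, 0, 0, 0], [1, 1, 1, 10]))  # (1 + 1 + 1 + 10) / 4 = 3.25

# Classification: the true label is 1 in all three cases
print(binary_cross_entropy([1], [0.99]))  # confident and right: ~0.01
print(binary_cross_entropy([1], [0.6]))   # right but unsure: ~0.51
print(binary_cross_entropy([1], [0.01]))  # confident and wrong: ~4.61
```

The last three lines show why cross-entropy "heavily penalizes confident but incorrect predictions": the loss grows without bound as the probability given to the true class approaches zero.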

The choice of loss function directly influences the model's behavior. A poorly chosen function can lead a model to optimize the wrong objective: for example, on a heavily imbalanced classification problem, a plain unweighted loss such as MSE or standard cross-entropy can be minimized by a model that almost always predicts the majority class, which performs poorly on the rare class that usually matters most. In prompt engineering, understanding the loss function helps anticipate the biases and behaviors of language models, because this function shaped their responses during training.
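
The effect of class imbalance on the training signal can be sketched with a hypothetical dataset containing 1 positive example among 1000. The sketch compares a "lazy" model that never really flags the positive class with a more useful one, under standard and weighted binary cross-entropy. The weighting scheme (multiplying the loss on positive examples by the negative-to-positive ratio) is one common heuristic, assumed here for illustration.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Binary cross-entropy where the loss on positive examples is multiplied by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical dataset: 1 positive (e.g. fraud) among 1000 examples
y_true = [1] + [0] * 999

# A "lazy" model that always predicts a 1% chance of the positive class
lazy = [0.01] * 1000
# A model that flags the positive case but is less sure about the negatives
useful = [0.9] + [0.05] * 999

pos_weight = 999.0  # ratio of negatives to positives (assumed heuristic)

for name, preds in [("lazy", lazy), ("useful", useful)]:
    print(name,
          round(weighted_bce(y_true, preds), 4),              # standard cross-entropy
          round(weighted_bce(y_true, preds, pos_weight), 4))  # weighted cross-entropy
```

With the standard loss the lazy model actually scores better (about 0.015 versus 0.051), so an optimizer would happily drift toward it. With the weighted loss the ranking flips (about 4.61 versus 0.16), and missing the rare case becomes expensive.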

Etymology

The term "loss" comes from statistical decision theory, where a loss is the cost associated with a wrong decision. Abraham Wald formalized the concept in the 1940s in his work on statistical decision theory. In French, the terms "fonction de perte" (loss function), "fonction de coût" (cost function), and "fonction objectif" (objective function) are often used interchangeably, although they have technical nuances: the loss usually refers to the error on a single example, the cost to its average over a dataset, and the objective to the overall quantity being optimized, which may also include regularization terms.

Concrete examples

Understanding why a model hallucinates

The GPT model sometimes generates false information with high confidence. Explain how the cross-entropy loss used during training can contribute to this hallucination phenomenon.

Choosing the right loss function for a project

I am building a bank fraud detection model where only 0.1% of transactions are fraudulent. Which loss function do you recommend and why? Compare standard cross-entropy, focal loss, and weighted cross-entropy for my case.

Explaining a technical concept simply

Explain the loss function to someone with no mathematical background. Use an analogy with a dart game and show how the concept applies to training ChatGPT.

Practical usage

In prompt engineering, understanding the loss function helps you write more effective instructions. For example, knowing that LLMs are pretrained with cross-entropy on next-token prediction helps explain why they tend to be better at completing text than at abstract reasoning. This knowledge lets you tailor your prompts to leverage the model's strengths rather than fight against the limitations that come from how it was trained.

Related concepts

Gradient Descent, Backpropagation, Overfitting, Fine-tuning

FAQ

What is the difference between a loss function and an evaluation metric?

The loss function is used during training to guide the optimization of the model's weights; for gradient-based training it must be differentiable (at least almost everywhere). The evaluation metric (accuracy, F1-score, BLEU) is used to judge the model's performance in human-understandable terms, typically on validation or test data. Sometimes the two coincide, but often the business metric cannot be optimized directly as a loss function.
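
The difference can be sketched with three hypothetical predictions that are each nudged slightly toward their correct label. Accuracy (with an assumed 0.5 threshold) is piecewise constant, so small improvements often leave it unchanged and its gradient is zero almost everywhere; cross-entropy responds to every small change, which is what an optimizer needs.

```python
import math

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)

def cross_entropy(y_true, p_pred):
    """Average binary cross-entropy (probabilities assumed strictly between 0 and 1)."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

y_true = [1, 0, 1]
before = [0.60, 0.40, 0.30]  # hypothetical predictions
after = [0.65, 0.35, 0.35]   # each prediction nudged toward its label

print(accuracy(y_true, before), accuracy(y_true, after))            # both 2/3: unchanged
print(cross_entropy(y_true, before), cross_entropy(y_true, after))  # decreases
```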

Why do LLMs like ChatGPT use cross-entropy loss?

Large language models are trained to predict the next token in a text sequence. Cross-entropy loss is well suited to this task because it measures the divergence between the probability distribution the model predicts over the entire vocabulary and the target distribution, which puts all its probability on the correct token. In practice it reduces to the negative log-probability that the model assigns to the correct token. It heavily penalizes confident but erroneous predictions, which pushes the model toward well-calibrated probabilities.
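
A minimal sketch of next-token cross-entropy, assuming a hypothetical 4-token vocabulary and hand-picked logits (raw model scores). Real LLMs use vocabularies of tens of thousands of tokens and compute this with optimized library kernels, but the arithmetic per position is the same: softmax over the vocabulary, then the negative log-probability of the correct token, averaged over the sequence.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_loss(logits_per_step, target_ids):
    """Mean cross-entropy over a sequence.

    With a one-hot target, the full sum over the vocabulary reduces to
    -log(probability assigned to the correct next token).
    """
    losses = [-math.log(softmax(logits)[t]) for logits, t in zip(logits_per_step, target_ids)]
    return sum(losses) / len(losses)

# Hypothetical vocabulary and a 2-step target sequence: "cat", then "sat"
vocab = ["the", "cat", "sat", "mat"]
targets = [vocab.index("cat"), vocab.index("sat")]

confident_right = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
uncertain = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
confident_wrong = [[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]

for name, logits in [("confident_right", confident_right),
                     ("uncertain", uncertain),
                     ("confident_wrong", confident_wrong)]:
    print(name, round(next_token_loss(logits, targets), 3))
```

This prints roughly 0.02, 1.386 (that is, log 4 for a uniform guess) and 5.02: hedging costs a moderate amount, while confidently backing the wrong token costs far more.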

How does RLHF modify the loss function of language models?

RLHF (Reinforcement Learning from Human Feedback) adds an additional stage to training. Instead of solely minimizing cross-entropy, the model is fine-tuned with an objective based on human preferences. A reward model is first trained on human comparisons between responses, then the LLM is optimized with Proximal Policy Optimization (PPO) to maximize this reward while staying close to the model it started from, usually through a KL-divergence penalty. This process helps make models more useful and better aligned with user intentions.
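
The two quantities that shape this training signal can be sketched in a few lines. This is a simplified illustration, not an implementation of PPO itself (the clipped policy update is omitted). It assumes the pairwise preference loss commonly used for reward models, -log(sigmoid(r_chosen - r_rejected)), and a per-sample KL penalty with an illustrative coefficient beta = 0.1; all scores and log-probabilities are hypothetical.

```python
import math

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred answer higher.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def kl_penalized_reward(reward, logprob_policy, logprob_reference, beta=0.1):
    """Reward signal for one sampled response during the RL phase.

    beta * (log pi(y|x) - log pi_ref(y|x)) is a per-sample estimate of the KL
    divergence from the reference model; subtracting it discourages drifting away.
    """
    return reward - beta * (logprob_policy - logprob_reference)

# Hypothetical reward-model scores for a preferred and a rejected answer
print(round(reward_model_loss(2.0, -1.0), 4))  # ranking agrees with humans: ~0.0486
print(round(reward_model_loss(-1.0, 2.0), 4))  # ranking disagrees: ~3.0486

# Same raw reward, but the second response is much more likely under the tuned
# policy than under the reference model, so it is penalized
print(round(kl_penalized_reward(1.0, logprob_policy=-5.0, logprob_reference=-5.0), 4))  # 1.0
print(round(kl_penalized_reward(1.0, logprob_policy=-2.0, logprob_reference=-8.0), 4))  # 0.4
```

The coefficient beta trades off the two goals: a larger value keeps the model closer to its starting point, a smaller one lets it chase the reward more aggressively.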

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
