P

Supervised Learning: Definition and Examples

Supervised learning is an artificial intelligence method where a model learns from labeled data, i.e., examples whose correct output is already known, in order to predict outcomes on new data.

Full definition

Supervised learning is one of the fundamental approaches of machine learning. Its principle is simple: we provide the model with a training set consisting of input-output pairs, where each input is associated with the expected answer (called a label). The model progressively learns to identify patterns that link inputs to outputs.

Concretely, imagine a teacher grading a student's tests. At each exercise, the student proposes an answer and the teacher indicates whether it is correct or not. Over time, the student learns the underlying rules and becomes able to solve new similar exercises. That is exactly how supervised learning works: the model adjusts its internal parameters to minimize the gap between its predictions and the correct answers.

There are two main categories of tasks in supervised learning. Classification consists of assigning a category to a data point (e.g., determining whether an email is spam or not). Regression aims to predict a continuous numerical value (e.g., estimating the price of a property based on its features).

Supervised learning is ubiquitous today: image recognition, machine translation, medical diagnosis, bank fraud detection, content recommendation. Its main constraint is that it requires a sufficient amount of labeled data, which can represent a significant cost in time and human resources for annotation.

Etymology

The term "supervised" comes from the analogy with a teacher who supervises a student's learning. Introduced in the 1950s-1960s with early work on Frank Rosenblatt's perceptron, the concept contrasts with "unsupervised learning" where no labels are provided. The word reflects the presence of an implicit "supervisor": the labeled data that guides the model toward correct answers.

Concrete examples

Image classification for an e-commerce project

I am developing an image classifier for clothing with a labeled dataset (t-shirt, pants, dress, etc.). Explain how to structure my supervised learning pipeline with PyTorch, including data augmentation and cross-validation.

Sentiment analysis on customer reviews

I have 10,000 customer reviews labeled as positive, negative, or neutral. Propose a supervised learning approach to train a sentiment classification model, comparing the performance of a classic model (SVM) and a transformer-based model.

Real estate value prediction (regression)

From a dataset containing the area, number of rooms, location, and sale price of homes, guide me in building a supervised regression model that predicts property price. Include feature engineering and model evaluation.

Practical usage

In prompt engineering, understanding supervised learning allows for better formulation of requests involving model training or analysis of labeled data. When asking an LLM to help design a machine learning pipeline, always specify the nature of your labels, the size of your dataset, and the desired evaluation metric. This allows the model to suggest the architecture and hyperparameters best suited to your use case.

Related concepts

Unsupervised LearningClassificationRegressionOverfittingTraining DataNeural Network

FAQ

What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled data (each example has a known answer) to learn to predict. Unsupervised learning works on data without labels and seeks to discover hidden structures, such as groups or anomalies. For example, classifying emails as spam/not spam is supervised, while clustering customers by purchasing behavior without predefined categories is unsupervised.
How much labeled data is needed for supervised learning?
There is no universal number: it depends on the complexity of the problem, the number of categories, and the model used. As a rule of thumb, a few hundred examples per class suffice for simple tasks with classic models. Deep neural networks often require thousands or even millions of examples. Techniques like transfer learning or few-shot learning can significantly reduce this need.
Do large language models like GPT or Claude use supervised learning?
Partially. LLMs are first pre-trained in a self-supervised manner on massive text corpora (they predict the next word). They are then refined through supervised learning (Supervised Fine-Tuning or SFT) with quality conversation examples, followed by RLHF (Reinforcement Learning from Human Feedback). Supervised learning thus plays a crucial role in the alignment and quality of these models' responses.

See also

How to use this prompt

  1. Copy the prompt with the button above.
  2. Paste it into ChatGPT, Claude or your favorite AI assistant.
  3. Replace the bracketed variables with your details, then refine the result.

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.

More definitions

Get new prompts every week

Join our newsletter.