
Unsupervised Learning: Definition and Examples

Unsupervised learning is a branch of machine learning where a model analyzes data without prior labels to discover structures, patterns, or groupings within it.

Full definition

Unsupervised learning is a machine learning method in which an algorithm is trained on a dataset without labels or expected outputs. Unlike supervised learning, where each example is associated with a known output, unsupervised learning lets the model explore the data on its own to extract underlying structures.

The most common techniques include clustering (automatic grouping of similar data, such as K-means or DBSCAN), dimensionality reduction (such as PCA or t-SNE, which simplify complex data while preserving essential features), and anomaly detection. These methods are particularly useful when dealing with large amounts of raw data without human annotations.
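To make the clustering idea concrete, here is a minimal K-means sketch in plain Python. The customer data, the helper names `sq_dist` and `kmeans`, and the farthest-first initialization are all assumptions made for this illustration. Production libraries typically offer more robust initialization and convergence checks.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; no labels are ever provided."""
    # Farthest-first initialization (an assumption of this sketch, chosen to
    # keep the result deterministic): start with the first point, then keep
    # adding the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical customers: (orders per month, average basket size).
customers = [(1, 2), (2, 1), (1, 1), (9, 8), (8, 9), (9, 9)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The algorithm only ever sees the coordinates. The two groups emerge from distances between points, which is what "discovering structure without labels" means in practice.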

In the context of large language models (LLMs), unsupervised learning plays a fundamental role. The pre-training phase of models like GPT or Claude relies largely on self-supervised learning, a form of unsupervised learning: the model learns to predict the next token in vast text corpora, with the targets taken from the text itself rather than from human annotations. This ability to learn rich language representations without labeled data is what makes these models so versatile.
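The sketch below illustrates how next-word prediction gets its training targets from raw text alone. It is a simplified illustration, not how any particular model is trained: the function name `next_token_pairs`, the `context_size` parameter, and the whitespace split standing in for a real tokenizer are all assumptions.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs with no human labels."""
    tokens = text.split()  # whitespace split stands in for a real tokenizer
    pairs = []
    for i in range(1, len(tokens)):
        # The context is the few tokens before position i;
        # the target is simply the token that actually follows.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every target comes from the text itself, so the amount of training data is limited only by how much text is available, not by how much anyone has annotated.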

For prompt engineering practitioners, understanding unsupervised learning makes it easier to see how an LLM acquired its knowledge and why it can sometimes generalize surprisingly well or, conversely, produce unexpected results. This understanding helps in writing prompts that make the most of the patterns the model internalized during training.

Etymology

The term 'unsupervised' means 'without supervision'. It contrasts with 'supervised learning', where a 'supervisor', in the form of human labels, guides the learning. The metaphor evokes a student learning through autonomous observation rather than directed instruction.

Concrete examples

Customer segmentation in marketing

I have a dataset of 10,000 customers with their purchasing behaviors. Suggest an unsupervised learning approach to identify distinct customer segments, detailing the recommended algorithm and features to use.

Anomaly detection in server logs

Act as a data scientist specialized in cybersecurity. Explain how to use unsupervised learning to detect anomalous behaviors in connection logs, without prior examples of attacks.

Text data exploration

I have 5,000 uncategorized customer reviews. How can I apply topic modeling (an unsupervised learning technique) to automatically discover recurring themes? Give me a step-by-step pipeline.

Practical usage

In prompt engineering, knowledge of unsupervised learning allows a better understanding of the strengths and limitations of LLMs. When a model spontaneously groups concepts or identifies analogies without explicit instruction, it relies on representations learned in an unsupervised manner. Exploit this by crafting prompts that ask the model to categorize, group, or identify patterns in unstructured data.

Related concepts

Supervised Learning, Clustering, Dimensionality Reduction, Self-Supervised Learning

FAQ

What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled data (with known answers) to train a model to predict outcomes. Unsupervised learning works with raw, unlabeled data and seeks to discover hidden structures like groups or patterns. For example, classifying emails as spam/not-spam is supervised, while grouping customers by similar behavior without predefined categories is unsupervised.
Do LLMs like Claude use unsupervised learning?
Yes, partially. The pre-training phase of LLMs is essentially self-supervised (a form of unsupervised learning): the model learns to predict tokens from vast text corpora without human annotations. However, modern LLMs then go through supervised learning phases (fine-tuning with instructions) and RLHF (reinforcement learning from human feedback) to refine their capabilities.
When should you use unsupervised learning instead of supervised learning?
Unsupervised learning is ideal when you don't have labeled data, when labeling would be too costly, or when you want to explore your data without prior assumptions. It is particularly relevant for market segmentation, anomaly detection, content recommendation, and exploratory analysis of large datasets.


