Confusion Matrix: Definition and Examples
A confusion matrix is a table that summarizes the performance of a classification model by comparing its predictions to the actual labels. In the binary case, it counts true positives, true negatives, false positives, and false negatives.
Full definition
The confusion matrix (also called an error matrix, and a special case of a contingency table) is a fundamental tool in machine learning for evaluating a classification model. It is a square table with one row and one column per class: each row represents the instances of an actual class and each column the instances of a predicted class (or the reverse, depending on convention, so check which one a given tool uses). For binary classification, it contains four values: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
True positives are cases correctly identified as positive, while true negatives are cases correctly identified as negative. False positives (also called type I errors) are negative cases incorrectly classified as positive, and false negatives (type II errors) are positive cases incorrectly classified as negative. This breakdown allows you to understand not only how many errors the model makes, but more importantly what type of errors it makes.
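The four counts defined above can be tallied directly from paired labels. The following is a minimal sketch, not a reference implementation: the function name `binary_confusion_counts`, the label strings, and the six sample emails are all invented for illustration.

```python
from collections import Counter

def binary_confusion_counts(actual, predicted, positive="spam"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a type I error (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a type II error (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Made-up labels for six emails.
actual    = ["spam", "spam", "legitimate", "legitimate", "spam", "legitimate"]
predicted = ["spam", "legitimate", "legitimate", "spam", "spam", "legitimate"]

counts = binary_confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

Here one spam email slipped through (a false negative) and one legitimate email was flagged (a false positive), which a single error count of 2 would not distinguish.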
From the confusion matrix, many essential metrics can be derived: precision (proportion of true positives among positive predictions), recall or sensitivity (proportion of true positives among actually positive cases), specificity, F1-score, and overall accuracy. Each of these metrics illuminates a different aspect of model performance.
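As a rough illustration of how these metrics follow from the four counts, here is a small sketch. The helper names `safe_div` and `binary_metrics` and the counts passed in are assumptions made for the example. Returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, tn, fp, fn):
    """Derive the usual metrics from the four confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)    # true positives among positive predictions
    recall = safe_div(tp, tp + fn)       # true positives among actual positives
    specificity = safe_div(tn, tn + fp)  # true negatives among actual negatives
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }

# Illustrative counts for 100 examples (not real results).
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.850 while recall is only 0.800, which shows how a single overall figure can hide a weakness on one side of the matrix.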
In prompt engineering, the confusion matrix matters whenever an LLM is used for classification tasks (sentiment analysis, spam detection, text categorization). It shows whether the model over-predicts or under-predicts certain categories, so that prompts can be adjusted to reduce a specific type of error.
Etymology
The name comes from the fact that the table shows whether a classification model 'confuses' certain classes with one another. The word 'matrix' refers to its tabular mathematical structure. The concept emerged in the 1950s and 1960s in experimental psychology and signal detection theory, before being widely adopted in statistics and artificial intelligence.
Concrete examples
Evaluating a sentiment classifier
Here are the classification results for 100 customer reviews. Build a confusion matrix and calculate precision, recall, and F1-score for each class (positive, negative, neutral). Identify which class is most often confused with another.
Optimizing a spam detection prompt
You are a spam detector. Classify each email as 'spam' or 'legitimate'. Prioritize minimizing false positives (legitimate emails classified as spam) over false negatives, because missing an important email is more serious than letting a spam message through.
AI-assisted medical diagnosis
Analyze these screening results and generate the corresponding confusion matrix. Calculate the sensitivity and specificity of the test. Explain why, in a screening context, high recall is preferable to high precision.
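To make the screening trade-off in the last example concrete, here is a small worked calculation. The counts are hypothetical (1,000 people screened, 10 of whom have the condition) and are chosen only to show how a test can have high sensitivity and specificity and still low precision when the condition is rare.

```python
# Hypothetical screening results: 1,000 people, 10 of whom have the condition.
tp, fn = 9, 1      # people with the condition: detected vs. missed
fp, tn = 50, 940   # people without it: false alarms vs. correctly cleared

sensitivity = tp / (tp + fn)  # recall: share of actual cases that were detected
specificity = tn / (tn + fp)  # share of healthy people correctly cleared
precision = tp / (tp + fp)    # share of positive results that are real cases

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} precision={precision:.2f}")
# sensitivity=0.90 specificity=0.95 precision=0.15
```

In this sketch most positive results are false alarms, but a false alarm typically only triggers a confirmatory test, whereas each false negative is a missed case. That asymmetry is the usual reason for favoring recall in screening.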
Practical usage
In prompt engineering, the confusion matrix helps you evaluate and improve your classification prompts. After testing a prompt on a labeled dataset, build the matrix to identify systematic model confusions. Then adjust your prompt by adding specific instructions for ambiguous cases, or by providing few-shot examples targeting the most frequent errors.
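The workflow above can be sketched as follows for a multiclass sentiment prompt. The function names `confusion_pairs` and `top_confusions` and the eight labeled reviews are invented for the example. In practice the predicted labels would come from running your prompt on a labeled dataset.

```python
from collections import Counter

def confusion_pairs(actual, predicted):
    """Count each (actual, predicted) pair, i.e. each cell of the matrix."""
    return Counter(zip(actual, predicted))

def top_confusions(pairs, n=3):
    """Return the n most frequent off-diagonal cells (the errors)."""
    errors = Counter({pair: c for pair, c in pairs.items() if pair[0] != pair[1]})
    return errors.most_common(n)

# Made-up evaluation data: gold labels and the labels returned by the prompt.
actual = ["positive", "neutral", "negative", "neutral",
          "neutral", "positive", "negative", "neutral"]
predicted = ["positive", "positive", "negative", "positive",
             "neutral", "positive", "neutral", "positive"]

pairs = confusion_pairs(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# Print the matrix with rows = actual class, columns = predicted class.
print("actual \\ predicted:", labels)
for a in labels:
    print(a, [pairs[(a, p)] for p in labels])

worst = top_confusions(pairs)
print(worst)  # [(('neutral', 'positive'), 3), (('negative', 'neutral'), 1)]
```

In this made-up run, neutral reviews are most often labeled positive, so the next prompt revision would add an instruction or a few-shot example targeting that boundary.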