P

Precision Recall: Definition and Examples

Precision and recall are two complementary metrics used to evaluate the quality of a classification model or information retrieval system. Precision measures the proportion of relevant results among those returned, while recall measures the proportion of relevant results actually retrieved.

Full definition

Precision and recall are fundamental metrics in artificial intelligence and information retrieval. They allow evaluating the performance of a system that must identify relevant items among a dataset. These two measures are inseparable because they capture two different facets of result quality.

Precision answers the question: 'Among all the items the model identified as positive, how many are actually positive?' For example, if a spam filter classifies 100 emails as spam and 90 of them are indeed spam, the precision is 90%. A high-precision system produces few false positives. Recall answers the question: 'Among all the actually positive items, how many were correctly identified?' If the inbox contains 120 total spams and the filter detects 90, recall is 75%.

There is generally a trade-off between these two metrics, known as the precision-recall trade-off. Increasing precision tends to decrease recall, and vice versa. A very conservative system will have high precision but low recall (it only flags cases it is sure about), while a permissive system will have high recall but lower precision (it captures everything, including false positives). The F1-score, harmonic mean of precision and recall, helps find a balance between the two.

In the context of prompt engineering, understanding these metrics helps formulate more effective instructions. When asking an LLM to extract information or classify content, you can guide its responses toward more precision ('return only results you are certain of') or more recall ('list all possible items, even uncertain ones'). This understanding is essential for calibrating expectations and refining obtained results.

Etymology

The terms 'precision' and 'recall' come from the field of information retrieval, where they were formalized in the 1950s-1960s. The word 'precision' comes from Latin 'praecisio' (cutting off, exactness), while 'recall' comes from English 'to recall' (remember, retrieve). In French, the terms 'taux de précision' and 'taux de rappel' or 'sensibilité' are sometimes used for recall in the medical field.

Concrete examples

Named entity extraction in a document

Extract all companies mentioned in this text. Prioritize recall: list every possible mention, even if you are not 100% sure it is a company. Indicate your confidence level for each entry.

Support ticket classification

Classify this support ticket into one of the following categories: bug, feature request, question. Only classify the ticket if you are more than 90% confident — otherwise, answer 'uncertain'. I prefer precision over recall here.

Detection of inappropriate content in comments

Analyze these comments and flag those that contain offensive content. Better to flag a false positive than to let an offensive comment slip through — prioritize recall.

Practical usage

In prompt engineering, mastering the precision-recall trade-off allows calibrating an LLM's responses according to the use case. For critical tasks (medical diagnosis, fraud detection), prioritize recall to avoid missing anything. For tasks where false positives are costly (sending alerts, customer recommendations), prioritize precision by adding confidence thresholds in the prompt.

Related concepts

F1-ScoreConfusion matrixAccuracyROC curve

FAQ

What is the difference between precision and accuracy?
Accuracy measures the total percentage of correct predictions, both positive and negative combined. Precision, on the other hand, focuses only on positive predictions: among the items identified as positive, how many are actually positive? On imbalanced data (e.g., 95% negative cases), accuracy can be misleadingly high while precision reveals true performance on the class of interest.
How to choose between precision and recall in a prompt?
The choice depends on the cost of errors. If missing a relevant item is serious (e.g., disease detection), prioritize recall by asking the model to list all possible cases. If a false positive is costly (e.g., blocking a legitimate user), prioritize precision by asking the model to return only high-confidence results. You can explain this trade-off directly in your prompt.
What is the F1-Score and when to use it?
The F1-Score is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). It is especially useful when you want a balance between the two metrics and the data is imbalanced. An F1-Score of 1 indicates perfect precision and recall, while a score close to 0 signals poor performance on at least one of the two metrics.

See also

How to use this prompt

  1. Copy the prompt with the button above.
  2. Paste it into ChatGPT, Claude or your favorite AI assistant.
  3. Replace the bracketed variables with your details, then refine the result.

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.

More definitions

Get new prompts every week

Join our newsletter.