Accuracy: Definition and Examples

Accuracy (or exactness) measures the proportion of correct answers among all the responses an AI model produces. It is one of the fundamental metrics for evaluating the reliability of an artificial intelligence system.

Full definition

Accuracy (or exactness) is an evaluation metric that quantifies the ability of an artificial intelligence model to produce correct results. It is calculated by dividing the number of correct predictions by the total number of predictions made. For example, a model with 95% accuracy gives the correct answer 95 times out of 100 on the evaluated data.
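The calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes predictions and reference labels are plain strings compared by exact match. The function name `accuracy` and the cat/dog data are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match their reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the prediction equals the reference label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical example: 100 predictions, 95 of them correct.
labels = ["cat"] * 50 + ["dog"] * 50
predictions = ["cat"] * 50 + ["dog"] * 45 + ["cat"] * 5
print(f"accuracy = {accuracy(predictions, labels):.2f}")  # accuracy = 0.95
```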

In the context of large language models (LLMs) like GPT-4 or Claude, accuracy takes on a more nuanced dimension. Unlike a binary classifier where the answer is either right or wrong, an LLM generates free text whose correctness can be partial, contextual, or subjective. We then speak of factual accuracy (are the stated facts verifiable?), semantic accuracy (does the meaning of the response match the question?), or logical accuracy (is the reasoning coherent?).

In prompt engineering, the quality of the instructions given to the model directly affects accuracy. A vague or ambiguous prompt tends to produce less accurate responses. A structured prompt with clear constraints, examples, and a defined output format can significantly improve the accuracy of results. Techniques such as Chain-of-Thought, few-shot prompting, and cross-checking can measurably increase accuracy.

It is important to note that accuracy alone is not always sufficient to evaluate a model. On imbalanced datasets, a model can display high accuracy while consistently failing on minority cases. That is why it is often complemented by other metrics like precision, recall, or F1-score to obtain a more complete view of performance.
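The following sketch makes the imbalanced-data pitfall concrete. It uses a hypothetical dataset with 95 negative and 5 positive cases, and a trivial "model" that always predicts the majority class. The 0/1 labels and the helper functions are illustrative assumptions and do not come from any particular library.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    # Share of all predictions that match the reference label.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def recall(predictions: Sequence[int], labels: Sequence[int], positive: int = 1) -> float:
    # Share of actual positive (minority) cases that the model found.
    preds_on_positives = [p for p, y in zip(predictions, labels) if y == positive]
    if not preds_on_positives:
        return 0.0
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)


# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
predictions = [0] * len(labels)

print(accuracy(predictions, labels))  # 0.95
print(recall(predictions, labels))    # 0.0
```

The model scores 95% accuracy while missing every minority case. This is why recall, precision, or F1-score are reported alongside accuracy.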

Etymology

The term 'accuracy' comes from the Latin 'accuratus', the past participle of 'accurare', meaning 'to do with care'. In English, it became established in scientific vocabulary to denote the exactness of a measurement. Artificial intelligence adopted the term unchanged as a standard metric, starting with early machine learning work in the 1950s and 1960s.

Concrete examples

Image classification: evaluating whether a model correctly identifies photos of cats and dogs

Analyze this image and identify the animal present. Reply only with 'cat' or 'dog'. Justify your choice in one sentence.

Factual verification: ensuring that an LLM does not generate hallucinations on historical data

Answer the following question based solely on verifiable facts. If you are unsure of any information, indicate it explicitly rather than inventing. Question: In which year was the Eiffel Tower built?

Structured data extraction: measuring the model's ability to correctly extract information from text

Extract the following information from this resume in JSON format: name, email, years of experience, main skills. If any information is missing, use null. Do not infer anything that is not explicitly mentioned.
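One way to turn this extraction task into a measurable accuracy score is to compare each field of the model's JSON against a hand-labeled reference. The sketch below uses a hypothetical scoring scheme. The key names (`name`, `email`, `years_of_experience`, `main_skills`), the sample values, and the per-field exact-match rule are all assumptions.

```python
import json


def field_accuracy(model_output: str, reference: dict) -> float:
    """Fraction of reference fields whose extracted value matches exactly.

    Output that is not a valid JSON object counts as zero correct fields.
    """
    try:
        extracted = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(extracted, dict):
        return 0.0
    # Compare each reference field with the model's value (missing keys give None).
    correct = sum(extracted.get(key) == value for key, value in reference.items())
    return correct / len(reference)


# Hypothetical reference annotation for one resume.
reference = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "years_of_experience": 6,
    "main_skills": ["Python", "SQL"],
}

# Hypothetical model output: the email was returned as null instead of extracted.
model_output = (
    '{"name": "Jane Doe", "email": null, '
    '"years_of_experience": 6, "main_skills": ["Python", "SQL"]}'
)

print(field_accuracy(model_output, reference))  # 0.75
```

Exact matching is strict. A skills list in a different order, or "6 years" instead of `6`, counts as wrong, so real evaluations often normalize values before comparing them.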

Practical usage

To improve the accuracy of your prompts, be explicit about the expected output format and provide concrete examples of correct responses (few-shot prompting). Use verification instructions like 'Check your answer before giving it' or 'If you're not sure, say so' to reduce errors. Finally, break down complex tasks into successive steps (Chain-of-Thought) so that the model reasons more rigorously.
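This advice can be combined into a single prompt template. The following sketch assembles a few-shot prompt with an explicit output format and a verification instruction. The `build_prompt` helper and the spam-classification examples are hypothetical. Sending the prompt to a model is deliberately left out.

```python
def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: task and format, worked examples, check, then the query."""
    lines = [task, ""]
    # Few-shot examples showing the exact expected answer format.
    for question, answer in examples:
        lines += [f"Question: {question}", f"Answer: {answer}", ""]
    # Verification instruction, followed by the actual query.
    lines += [
        "Check your answer before giving it. If you're not sure, say so.",
        "",
        f"Question: {query}",
        "Answer:",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    task="Classify each email as 'spam' or 'not spam'. Reply only with the label.",
    examples=[
        ("Congratulations, you won a free cruise! Click here.", "spam"),
        ("Can we move tomorrow's meeting to 3 pm?", "not spam"),
    ],
    query="Your account will be suspended unless you verify your password now.",
)
print(prompt)
```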

Related concepts

Precision, Recall, F1-Score, Hallucination

FAQ

What is the difference between accuracy and precision in AI?
Accuracy measures the overall rate of correct answers among all predictions, while precision measures the proportion of true positives among items identified as positive. For example, if a model detects spam, accuracy indicates its overall success rate, whereas precision indicates how many emails marked as spam were actually spam.
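The spam example can also be worked through numerically. The sketch below uses hypothetical counts for 100 emails to show how accuracy and precision diverge. The `"spam"`/`"not_spam"` labels and the helper functions are illustrative assumptions.

```python
def accuracy(predictions, labels):
    # Share of all emails classified correctly.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def precision(predictions, labels, positive="spam"):
    # Among emails flagged as spam, how many really are spam?
    flagged = [y for p, y in zip(predictions, labels) if p == positive]
    if not flagged:
        return 0.0
    return sum(y == positive for y in flagged) / len(flagged)


# Hypothetical counts for 100 emails:
# 8 spam correctly flagged, 2 spam missed,
# 4 legitimate emails wrongly flagged, 86 legitimate emails correctly passed.
labels = ["spam"] * 8 + ["spam"] * 2 + ["not_spam"] * 4 + ["not_spam"] * 86
predictions = ["spam"] * 8 + ["not_spam"] * 2 + ["spam"] * 4 + ["not_spam"] * 86

print(f"accuracy  = {accuracy(predictions, labels):.2f}")   # accuracy  = 0.94
print(f"precision = {precision(predictions, labels):.2f}")  # precision = 0.67
```

Of the 12 emails flagged as spam, only 8 really are spam. Precision is therefore about 0.67, even though overall accuracy is 0.94.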
Can we measure the accuracy of a large language model (LLM)?
Yes, but it is more complex than for a traditional classifier. Standardized benchmarks (MMLU, HumanEval, GSM8K) pose questions with verifiable answers, so correctness can be checked automatically. For open-ended tasks like writing, human evaluators or judge models (LLM-as-a-Judge) rate the quality and accuracy of responses.
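For benchmarks with short verifiable answers, accuracy often comes down to exact-match scoring after some normalization. The sketch below is a simplified, hypothetical scorer. Its normalization rules are assumptions and do not reproduce the official evaluation scripts of MMLU, HumanEval, or GSM8K. HumanEval, for instance, runs unit tests on generated code rather than comparing strings.

```python
import re


def normalize(answer: str) -> str:
    # Hypothetical normalization: trim, lowercase, drop a trailing period,
    # and remove thousands separators so "1,000." and "1000" compare equal.
    answer = answer.strip().lower().rstrip(".")
    return re.sub(r"(?<=\d),(?=\d)", "", answer)


def exact_match_accuracy(model_answers, reference_answers) -> float:
    # Count answers that match their reference after normalization.
    matches = sum(
        normalize(m) == normalize(r) for m, r in zip(model_answers, reference_answers)
    )
    return matches / len(reference_answers)


# Hypothetical mini-benchmark with short, verifiable answers.
references = ["1889", "Paris", "1000", "42"]
model_answers = ["1889.", "paris", "1,000", "41"]

print(exact_match_accuracy(model_answers, references))  # 0.75
```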
How can prompt engineering improve a model's accuracy?
Prompt engineering improves accuracy by reducing ambiguity of instructions, providing relevant context, and guiding the model's reasoning. Techniques like few-shot prompting (giving examples), Chain-of-Thought (asking the model to reason step by step), or adding explicit constraints ("answer only with verified facts") can significantly increase the rate of correct answers.

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
