Presence Penalty: Definition and Examples

The Presence Penalty is a language model parameter that penalizes tokens that have already appeared in the generated text, encouraging the model to introduce new topics and terms rather than repeating itself.

Full definition

The Presence Penalty is a hyperparameter available in most language model APIs such as those from OpenAI or Cohere. Its role is simple: each time a token has already been generated in the response, a fixed penalty is applied to its probability of being selected again. Unlike Frequency Penalty, which increases proportionally with the number of occurrences, Presence Penalty applies in a binary manner — the token has either already appeared (penalized) or never appeared (not penalized).

Concretely, this parameter generally accepts a value between -2.0 and 2.0. A positive value discourages repetition and pushes the model to explore new words and concepts. A negative value, conversely, encourages the model to reuse already mentioned terms, which can reinforce thematic coherence. A value of 0 completely disables the effect.

The main benefit of Presence Penalty lies in its ability to control the lexical and thematic diversity of responses. When increased, the model tends to address more different topics and vary its vocabulary. This is particularly useful for brainstorming, creative generation, or any task requiring broad coverage of concepts.

It is important to distinguish Presence Penalty from Frequency Penalty and Temperature, which all influence output diversity but through different mechanisms. A good prompt engineer knows how to combine these parameters in a balanced way to obtain the desired model behavior, without sacrificing coherence for diversity, or vice versa.

Etymology

The term comes from the English words 'presence' and 'penalty'. It literally describes a penalty applied based on the mere presence of a token in the already generated text. This concept was popularized by OpenAI when introducing its generation control parameters in the GPT-3 API in 2020.

Concrete examples

Creative brainstorming — force the model to explore varied ideas without looping

Give me 20 name ideas for an AI startup. Be as varied as possible. [PRESENCE_PENALTY: 1.5]

Article writing — maintain a rich vocabulary without excessive repetition

Write a 500-word article on renewable energy. [PRESENCE_PENALTY: 0.6]

Code generation — keep a low value so the model reuses variable names consistently

Write a Python function that sorts a list and handles errors. [PRESENCE_PENALTY: 0.0]

Practical usage

In practice, use a Presence Penalty between 0.3 and 0.8 for most writing tasks to get natural and varied text. Increase to 1.0-1.5 for brainstorming or creative exploration. Keep the value at 0 for technical tasks like code generation or factual responses where terminological consistency is essential.

Related concepts

Frequency PenaltyTemperatureTop-P (Nucleus Sampling)Repetition in LLMs

FAQ

What is the difference between Presence Penalty and Frequency Penalty?

Presence Penalty applies a fixed penalty as soon as a token has appeared at least once, regardless of the number of repetitions. Frequency Penalty increases proportionally: the more a token is repeated, the more it is penalized. Presence Penalty promotes thematic diversity, while Frequency Penalty specifically targets excessive repetitions.

What default Presence Penalty value should I use?

For general use, a value between 0 and 0.6 is recommended. The default value in most APIs is 0 (disabled). Start with 0.3 if you notice repetitions, then adjust gradually. Avoid values above 1.5 except for very specific cases, as the text may become incoherent.

Can Presence Penalty and Temperature be combined?

Yes, and it's even common. Temperature controls the overall randomness of token selection, while Presence Penalty specifically targets reusing existing tokens. For creative and varied text, combine a Temperature of 0.8-1.0 with a Presence Penalty of 0.5-1.0. However, be careful not to stack too high values on both parameters, which would produce disjointed text.

How to use this prompt

Copy the prompt with the button above.
Paste it into ChatGPT, Claude or your favorite AI assistant.
Replace the bracketed variables with your details, then refine the result.

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.

Prompt library Learn prompting Prompt builder Prompt optimizer

More definitions

Program of Thought: Definition and Examples

Prompting technique where the model generates executable code to solve a reasoning problem, instead of producing a natural language chain of thought.

Prompt Chaining: Definition and Examples

Prompt chaining is a technique that involves chaining multiple sequential prompts, where the output of each step feeds the input of the next, to

Prompt Compression: Definition and Examples

Technique for reducing the length of a prompt while preserving its meaning and effectiveness, to optimize token usage and improve

Prompt Decomposition: Definition and Examples

Technique of breaking down a complex task into several simpler and more targeted sub-prompts, in order to obtain more precise and reliable responses from the LLM.

Prompt Engineering: Definition and Examples

Prompt engineering is the art and science of formulating precise and structured instructions to get the best possible results from a generative AI model.

Prompt Ensembling: Definition and Examples

Technique of submitting multiple variants of the same prompt to an AI model, then aggregating or comparing the responses to produce a more reliable result.

Get new prompts every week

Join our newsletter.