Constitutional AI: Definition and Examples
AI alignment method developed by Anthropic, where a model is trained to self-correct by following a set of written principles (a 'constitution') rather than relying solely on human feedback.
Full definition
Constitutional AI (CAI) is an alignment approach for language models introduced by Anthropic in 2022. Its fundamental principle is to equip an AI model with a set of explicit rules—called a 'constitution'—that guide its behavior. These principles cover values such as honesty, helpfulness, harmlessness, and respect for fundamental rights. Concretely, the process takes place in two phases. In the first phase (critique and revision), the model generates responses and then self-evaluates by referring to the constitutional principles. It identifies potential violations and produces a revised version of its response. This critique-revision cycle can be repeated several times to refine quality. In the second phase, the response pairs (original vs. revised) are used to train a reward model via RLAIF (Reinforcement Learning from AI Feedback). This reward model partially replaces direct human feedback, making the process more scalable while maintaining a high level of alignment. The major advantage of Constitutional AI is transparency: the rules are explicit and auditable, unlike the implicit preferences captured by classical RLHF. It also allows for public debate on the values encoded in the system and modification without fully retraining the model.
Etymology
The term 'Constitutional AI' directly refers to the concept of a constitution in the legal and political sense: a foundational document that establishes principles and limits of power. Just as a national constitution defines the rights and duties of citizens and government, the 'constitution' of an AI model defines the ethical and behavioral principles it must follow.
Concrete examples
Training an AI assistant to refuse dangerous requests while remaining helpful
Critique this response according to the following principle: 'The assistant must never help create weapons or dangerous substances'. Does the response contain violations? If so, rewrite it.
Self-evaluation of a model on the honesty of its responses
Based on the principle 'The assistant should acknowledge the limits of its knowledge rather than invent information', evaluate whether your previous response is compliant and propose an improved version.
Designing a transparent and auditable content moderation system
Here is our moderation constitution: 1) No hate speech 2) No medical disinformation 3) Protection of minors. Evaluate this content according to each principle and justify your decision.
Practical usage
In prompt engineering, the principles of Constitutional AI apply by creating explicit instructions (system prompts) that define the assistant's limits and values. You can ask the model to self-criticize according to precise rules before delivering its final response. This approach is particularly useful for building reliable AI applications where transparency of behavior rules is essential.
Related concepts
FAQ
What is the difference between Constitutional AI and RLHF?
Who invented Constitutional AI?
Can the principles of Constitutional AI be applied in one's own prompts?
See also
How to use this prompt
- Copy the prompt with the button above.
- Paste it into ChatGPT, Claude or your favorite AI assistant.
- Replace the bracketed variables with your details, then refine the result.
About Prompt Guide
Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
More definitions
Context Window: Definition and Examples
The context window refers to the maximum amount of text a language model can process at one time, encompassing both the user input and the generated response.
Cursor: Definition and Overview of the AI Editor
Understand Cursor: AI-native code editor based on VS Code. Differences with Claude Code, GitHub Copilot, and Windsurf, concrete use cases.
Custom GPT: Definition and How to Create Your Own
Understand OpenAI's Custom GPTs: pre-configured ChatGPT assistants. Step-by-step creation, differences with Claude Skills and Gemini Gems.
Datasheets For Datasets: Definition and Examples
Methodology proposing systematic documentation of datasets used in artificial intelligence, akin to technical datasheets accompanying electronic components.
Deepfake: Definition and Examples
Synthetic content (video, audio, or image) generated by artificial intelligence, capable of realistically reproducing the appearance, voice, or expressions
Diffusion: Definition and Examples
Family of generative models that create data (images, audio, video) by learning to reverse a progressive noising process, transforming random noise into coherent content step by step.
Get new prompts every week
Join our newsletter.