Claude — Constitutional AI
The Employee with a Code of Ethics
Some employees follow rules because they'll get fired if they don't. The best ones follow principles because they believe in them.
Anthropic built Claude with a Constitution — a set of explicit principles about being helpful, harmless, and honest. Instead of only learning from human ratings (like RLHF), Claude also critiques its own responses against these principles and revises them. It's like an employee who doesn't just follow manager feedback but has genuinely internalised a professional ethics code.
In Plain English
Claude is built by Anthropic using Constitutional AI — a training approach where the model learns to critique and revise its own responses against a set of stated principles, not just human ratings. This makes Claude particularly strong at nuanced, thoughtful, and safety-conscious responses.
The Technical Picture
Constitutional AI (CAI) extends RLHF in two phases. First, the model critiques its own responses against a written constitution of principles and generates revisions, and it is fine-tuned on those revised responses. Second, a preference model is trained on AI-generated comparisons judged against the same principles (RL from AI feedback, or RLAIF). This reduces reliance on human labellers for safety-critical judgements.
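The critique-and-revise loop can be sketched in a few lines. Everything here is illustrative: `call_model` is a stand-in for a real LLM call (stubbed so the control flow runs), and the two principles are paraphrases, not Anthropic's actual constitution.

```python
# Sketch of the Constitutional AI self-critique loop (phase one).
# The (initial, revised) pairs it produces would later serve as
# training comparisons for a preference model (phase two).

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could assist with dangerous activities.",
]

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned string."""
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, principles: list[str]) -> dict:
    """Run one critique/revision pass per principle."""
    initial = call_model(prompt)
    response = initial
    for principle in principles:
        # Ask the model to critique its own response against the principle.
        critique = call_model(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # Ask it to revise the response in light of that critique.
        response = call_model(
            f"Revise the response to address this critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return {"prompt": prompt, "initial": initial, "revised": response}

pair = critique_and_revise("How do I pick a strong password?", CONSTITUTION)
```

In the real training pipeline the revised responses are used for supervised fine-tuning, and a separate AI judge compares response pairs against the constitution to produce preference data.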
Real-World Examples
- Claude is widely used for long-form writing, analysis, and coding tasks requiring careful reasoning
- Anthropic's Claude API powers applications requiring safety-critical AI responses
- Claude's 200K-token context window enables reading entire books in a single conversation
Claude was built to have values, not just rules — Constitutional AI is what makes it distinctly careful and thoughtful.