Claude — Constitutional AI
The Employee with a Code of Ethics
Some employees follow rules because they'll get fired if they don't. The best ones follow principles because they believe in them.
Anthropic built Claude with a Constitution — a set of explicit principles about being helpful, harmless, and honest. Instead of only learning from human ratings (like RLHF), Claude also critiques its own responses against these principles and revises them. It's like an employee who doesn't just follow manager feedback but has genuinely internalised a professional ethics code.
In Plain English
Claude is built by Anthropic using Constitutional AI — a training approach where the model learns to critique and revise its own responses against a set of stated principles, not just human ratings. This makes Claude particularly strong at nuanced, thoughtful, and safety-conscious responses.
The Technical Picture
Constitutional AI (CAI) extends RLHF in two phases. First, the model critiques its own responses against a written constitution of principles and generates revisions, and it is fine-tuned on those revised responses. Second, a preference model is trained on AI-generated comparisons judged against the same principles (RL from AI feedback, or RLAIF). This reduces reliance on human labellers for safety-critical judgements.
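The critique-and-revise loop can be sketched in a few lines. Everything here is illustrative: `call_model` is a stand-in for a real LLM call (stubbed so the control flow runs), and the two principles are paraphrases, not Anthropic's actual constitution.

```python
# Sketch of the Constitutional AI self-critique loop (phase one).
# The (initial, revised) pairs it produces would later serve as
# training comparisons for a preference model (phase two).

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could assist with dangerous activities.",
]

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned string."""
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, principles: list[str]) -> dict:
    """Run one critique/revision pass per principle."""
    initial = call_model(prompt)
    response = initial
    for principle in principles:
        # Ask the model to critique its own response against the principle.
        critique = call_model(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # Ask it to revise the response in light of that critique.
        response = call_model(
            f"Revise the response to address this critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return {"prompt": prompt, "initial": initial, "revised": response}

pair = critique_and_revise("How do I pick a strong password?", CONSTITUTION)
```

In the real training pipeline the revised responses are used for supervised fine-tuning, and a separate AI judge compares response pairs against the constitution to produce preference data.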
Real-World Examples
- Claude is widely used for long-form writing, analysis, and coding tasks requiring careful reasoning
- Anthropic's Claude API powers applications requiring safety-critical AI responses
- Claude's 200K-token context window enables reading entire books in a single conversation
Claude was built to have values, not just rules — Constitutional AI is what makes it distinctly careful and thoughtful.