Constitutional AI
An alignment approach developed by Anthropic where AI models are guided by a set of principles (a 'constitution') that help them self-evaluate and improve their responses without relying solely on human feedback.
Why It Matters
Constitutional AI offers a scalable complement to RLHF: instead of needing human raters to review every output, the model can critique and correct its own responses against explicit written principles.
Example
A model generates a response, evaluates it against principles like 'Is this helpful? Is this honest? Could this cause harm?', and then revises the response accordingly.
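The generate-critique-revise loop described above can be sketched in a few lines. This is a toy illustration, not Anthropic's actual implementation: the three functions below are stand-in stubs where a real system would call a language model.

```python
# Toy sketch of the Constitutional AI critique-and-revise loop.
# Each function marked "placeholder" would be a language-model call
# in a real system; here they are simple stubs for illustration.

PRINCIPLES = [
    "Is this helpful?",
    "Is this honest?",
    "Could this cause harm?",
]

def generate(prompt):
    # Placeholder for the model's initial draft response.
    return f"Draft answer to: {prompt}"

def critique(response, principle):
    # Placeholder: the model judges its own response against one principle.
    return f"Checked against '{principle}'."

def revise(response, critiques):
    # Placeholder: the model rewrites the response using its critiques.
    return response + " (revised after self-critique)"

def constitutional_loop(prompt, rounds=1):
    # Draft once, then critique and revise for a fixed number of rounds.
    response = generate(prompt)
    for _ in range(rounds):
        critiques = [critique(response, p) for p in PRINCIPLES]
        response = revise(response, critiques)
    return response

print(constitutional_loop("Explain photosynthesis."))
```

The key design point is that the principles are plain text the model reads and applies, so changing the constitution changes behavior without collecting new human ratings.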
Think of it like...
Like a person with a strong moral compass who self-corrects — they do not need someone watching over their shoulder because their principles guide their behavior.
Related Terms
Alignment
The challenge of ensuring AI systems behave in ways that match human values, intentions, and expectations. Alignment aims to make AI helpful, honest, and harmless.
RLHF
Reinforcement Learning from Human Feedback — a technique used to align language models with human preferences. Human raters rank model outputs, and this feedback trains a reward model that guides further training.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Anthropic
An AI safety company founded by former OpenAI researchers, focused on building safe and beneficial AI. Anthropic developed Claude and pioneered Constitutional AI.
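The RLHF entry above says human rankings train a reward model. One common way to turn a ranking into a training signal, shown here as a hedged sketch rather than any specific lab's implementation, is a pairwise (Bradley-Terry style) loss that pushes the reward of the human-preferred output above the rejected one.

```python
import math

# Toy pairwise preference loss used to train reward models:
# -log(sigmoid(r_chosen - r_rejected)). The loss is small when the
# reward model already scores the human-preferred output higher.

def pairwise_loss(reward_chosen, reward_rejected):
    margin = reward_chosen - reward_rejected
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)

# Correctly ordered rewards give a low loss; inverted rewards give a high one.
print(pairwise_loss(2.0, 0.0))  # low
print(pairwise_loss(0.0, 2.0))  # high
```

Minimizing this loss over many ranked pairs yields a scalar reward function, which then guides further fine-tuning of the language model.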