AI Governance

Constitutional AI

An alignment approach developed by Anthropic in which AI models are trained using a written set of principles (a 'constitution'). The model uses these principles to critique and revise its own responses, which reduces reliance on human feedback.

Why It Matters

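Constitutional AI offers a more scalable alternative to relying entirely on RLHF (reinforcement learning from human feedback). Instead of needing human raters to judge every output, the model can critique and correct itself against explicit, written principles.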

Example

A model generates a response, evaluates it against principles such as 'Is this helpful? Is this honest? Could this cause harm?', and then revises it accordingly.
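The critique-and-revise loop in the example can be sketched in Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `critique_and_revise`, `toy_model`, the prompt wording, and the "unsafe tip" check are all hypothetical, and `toy_model` is a string-matching stub standing in for a real language model call.

```python
from typing import Callable

# Principles taken from the example above; a real constitution is longer.
CONSTITUTION = [
    "Is this helpful?",
    "Is this honest?",
    "Could this cause harm?",
]

# Any function from prompt text to response text can serve as the "model".
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: list[str]) -> tuple[str, list[str]]:
    """Draft a response, then critique and revise it once per principle.

    Returns the final response and every intermediate draft.
    """
    response = model(prompt)
    drafts = [response]
    for principle in principles:
        # Ask the model to judge its current draft against one principle.
        critique = model(
            f"CRITIQUE\nPrinciple: {principle}\nPrompt: {prompt}\nResponse: {response}"
        )
        # Ask the model to rewrite the draft so it addresses the critique.
        response = model(f"REVISE\nCritique: {critique}\nResponse: {response}")
        drafts.append(response)
    return response, drafts


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model so the sketch runs offline."""
    if text.startswith("CRITIQUE"):
        # Flag the draft if it still contains the phrase this stub treats as harmful.
        return "Remove the unsafe tip." if "unsafe tip" in text else "No issues found."
    if text.startswith("REVISE"):
        draft = text.split("Response: ", 1)[1]
        if "Remove the unsafe tip." in text:
            draft = draft.replace(" Also, an unsafe tip.", "")
        return draft
    # Initial draft for any ordinary prompt.
    return "Here is a safe answer. Also, an unsafe tip."


final, drafts = critique_and_revise(toy_model, "How do I fix my bike chain?", CONSTITUTION)
print(final)        # Here is a safe answer.
print(len(drafts))  # 4
```

Looping over every principle keeps the sketch deterministic. The published method instead samples principles from the constitution and uses the revised responses as training data, rather than running the loop each time a user asks a question.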

Think of it like...

Like a person with a strong moral compass who self-corrects — they do not need someone watching over their shoulder because their principles guide their behavior.

Related Terms