Artificial Intelligence

AI Alignment Tax

The performance cost of making AI models safer and more aligned with human values. Safety training sometimes reduces raw capability on certain tasks.

Why It Matters

The alignment tax challenges the narrative that safety and capability are always in tension. Reducing this tax is a key research goal.

Example

A model that scores slightly lower on coding benchmarks after RLHF safety training, because it now refuses to generate malicious code that the benchmark counts as correct.

Think of it like...

Like the fuel economy cost of adding safety features to a car — airbags and seatbelts add weight, slightly reducing performance, but the safety is worth it.

Related Terms