Preference Optimization
Training techniques that directly optimize models based on human preference data, where humans indicate which of two model outputs they prefer.
Why It Matters
Preference optimization is how models learn to be helpful, honest, and harmless. It translates subjective human judgment into mathematical optimization.
Example
Showing raters two model responses to the same question, recording which one they prefer, then training the model to produce outputs more like the preferred ones.
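The collection step above can be sketched as a small data record. This is a minimal illustration, not a real annotation pipeline: the `PreferencePair` class and `collect_pair` helper are hypothetical names, and in practice the rater's judgment would come from a human-facing labeling tool.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One training record produced by a single rating decision."""
    prompt: str
    chosen: str    # the response the rater preferred
    rejected: str  # the response the rater did not prefer

def collect_pair(prompt: str, response_a: str, response_b: str,
                 rater_prefers_a: bool) -> PreferencePair:
    """Turn one pairwise judgment into a (chosen, rejected) record.

    rater_prefers_a stands in for the human decision; everything
    downstream (reward modeling, DPO) trains on records like this.
    """
    if rater_prefers_a:
        return PreferencePair(prompt, response_a, response_b)
    return PreferencePair(prompt, response_b, response_a)
```

Datasets of such pairs are the common input format shared by RLHF's reward-model stage and by DPO.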
Think of it like...
Like a cooking competition where judges taste two dishes and pick the better one — over many rounds, the chef learns to cook what judges prefer.
Related Terms
RLHF
Reinforcement Learning from Human Feedback — a technique for aligning language models with human preferences. Human raters rank or compare model outputs; this feedback trains a reward model, which then scores responses to guide reinforcement learning of the language model.
DPO
Direct Preference Optimization — a simpler alternative to RLHF that optimizes a language model directly on human preference data, without training a separate reward model. In practice it is often more stable and easier to implement than RLHF's two-stage pipeline.
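A minimal sketch of the DPO loss for a single preference pair, assuming the summed log-probabilities of each full response under the current policy and the frozen reference model have been computed elsewhere; the variable names and the default beta value are illustrative.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each margin is the policy's log-probability of a response minus the
    reference model's, so the loss rewards moving probability mass toward
    the chosen response relative to a frozen reference.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    diff = beta * (chosen_margin - rejected_margin)
    # -log sigmoid(diff): near zero when the policy strongly
    # favors the chosen response, large when it favors the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

Note the loss needs no reward model: the preference signal is expressed entirely through the two models' log-probabilities.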
Reward Model
A model trained to predict how good a response is based on human preferences. In RLHF, the reward model scores outputs to guide the language model toward responses humans prefer.
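A toy sketch of how such a model is fit from preference pairs, assuming a linear scorer over hand-made response features and the standard Bradley–Terry pairwise objective, -log sigmoid(s_preferred - s_rejected); real reward models use a neural network over tokens, so the linear form, features, and hyperparameters here are illustrative assumptions.

```python
import math

def score(w: list, feats: list) -> float:
    # linear "reward model": dot product of weights and response features
    return sum(wi * fi for wi, fi in zip(w, feats))

def train_reward_model(pairs, dims: int = 2, lr: float = 0.5,
                       epochs: int = 200) -> list:
    """Fit weights so preferred responses score above rejected ones.

    pairs: list of (preferred_features, rejected_features) tuples.
    Minimizes -log sigmoid(score(pref) - score(rej)) by gradient descent.
    """
    w = [0.0] * dims
    for _ in range(epochs):
        for pref, rej in pairs:
            # Bradley-Terry probability that the preferred response wins
            p = 1.0 / (1.0 + math.exp(-(score(w, pref) - score(w, rej))))
            # gradient step: push w toward (pref - rej), scaled by (1 - p)
            for i in range(dims):
                w[i] += lr * (1.0 - p) * (pref[i] - rej[i])
    return w

# toy data: the first feature correlates with being preferred
pairs = [([1.0, 0.2], [0.0, 0.9]),
         ([0.8, 0.1], [0.1, 0.5])]
w = train_reward_model(pairs)
```

After training, `score` can rank unseen responses, which is exactly the role the reward model plays inside RLHF.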
Alignment
The challenge of ensuring AI systems behave in ways that match human values, intentions, and expectations. Alignment aims to make AI helpful, honest, and harmless.
Human Evaluation
Using human judges to assess AI model quality on subjective dimensions like helpfulness, coherence, creativity, and safety that automated metrics cannot fully capture.