Machine Learning

Concept Bottleneck

A model architecture that forces predictions through a set of human-interpretable concepts. The model first predicts concepts from the input, then uses only those concepts to make the final prediction.

Why It Matters

Concept bottleneck models are inherently interpretable — you can see exactly which concepts drove the prediction and intervene by correcting wrong concepts.

Example

A bird species classifier that first predicts interpretable concepts (wing_color=red, beak_shape=curved, size=small), then uses those concepts to predict the species.
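The two-stage flow above can be sketched in a few lines of Python. This is a minimal illustration, not a trained model: the feature names, concept rules, and species label are invented placeholders, and each stage would normally be a learned model rather than hand-written rules.

```python
# Minimal concept-bottleneck sketch. Stage 1 maps raw features to
# human-readable concepts; stage 2 sees ONLY the concepts (the bottleneck).
# All names and thresholds here are hypothetical placeholders.

def predict_concepts(features):
    """Stage 1: raw input features -> interpretable concepts."""
    return {
        "wing_color": "red" if features["wing_hue"] > 0.8 else "brown",
        "beak_shape": "curved" if features["beak_curvature"] > 0.5 else "straight",
        "size": "small" if features["wingspan_cm"] < 20 else "large",
    }

def predict_species(concepts):
    """Stage 2: concepts -> final label. No access to raw features."""
    if concepts == {"wing_color": "red", "beak_shape": "curved", "size": "small"}:
        return "scarlet_honeycreeper"  # made-up species for illustration
    return "unknown"

features = {"wing_hue": 0.9, "beak_curvature": 0.7, "wingspan_cm": 15}
concepts = predict_concepts(features)        # inspectable intermediate output
species = predict_species(concepts)          # -> "scarlet_honeycreeper"

# Intervention: an expert corrects a mispredicted concept and only
# stage 2 re-runs -- no retraining, no access to raw features needed.
corrected = dict(concepts, size="large")
revised = predict_species(corrected)         # -> "unknown"
```

The last two lines show the intervention property mentioned above: because the final prediction depends only on the concept layer, fixing a wrong concept immediately changes the downstream prediction.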

Think of it like...

Like a doctor who first identifies symptoms (fever, cough, fatigue) then diagnoses the disease — the intermediate concepts make the reasoning transparent.

Related Terms