Artificial Intelligence

Mixture of Experts

An architecture where a model consists of multiple specialized sub-networks (experts) and a gating mechanism that routes each input to only the most relevant experts. Only a fraction of the total parameters are active per input.

Why It Matters

MoE enables models with massive total parameter counts that remain efficient to run, because only a small subset of parameters activates per query. GPT-4 is widely rumored to use an MoE architecture.

Example

A model with 8 expert networks where each input activates only 2 (top-2 routing): the model stores the knowledge of all 8 experts but pays roughly the computational cost of just 2.
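The example above can be sketched in code. This is a minimal toy illustration (the expert and gate weights, sizes, and routing details here are illustrative assumptions, not a production implementation): a linear gate scores all 8 experts, only the top-2 experts run, and their outputs are combined with softmax weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k = 16, 8, 2  # toy sizes (assumed for illustration)

# Each "expert" is a small linear layer; a real MoE uses feed-forward sub-networks.
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(num_experts)]
# Gating network: one linear layer producing a score per expert for each input.
W_gate = rng.standard_normal((d, num_experts)) / np.sqrt(d)

def moe_forward(x):
    """Route input x (shape (d,)) to the top_k highest-scoring experts."""
    scores = x @ W_gate                    # one score per expert
    chosen = np.argsort(scores)[-top_k:]   # indices of the top-2 experts
    # Softmax over only the selected scores gives the mixing weights.
    w = np.exp(scores[chosen] - scores[chosen].max())
    w /= w.sum()
    # Only the chosen experts compute; the other 6 stay inactive.
    y = sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))
    return y, chosen

x = rng.standard_normal(d)
y, chosen = moe_forward(x)
```

Note that the gate is trained jointly with the experts, so specialization emerges during training rather than being assigned by hand.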

Think of it like...

Like a hospital with many specialists: you do not see every doctor on every visit. A triage nurse (the gating mechanism) routes you to the right specialists for your condition.

Related Terms