Sparse Model
A neural network in which most parameters are zero or inactive for any given input. Sparse models achieve high capacity at lower computational cost by activating only the parameters relevant to each input.
Why It Matters
Sparsity enables building models with massive total knowledge that are still efficient to run — the key insight behind Mixture of Experts architectures.
Example
A model with 1 trillion total parameters where only 100 billion are active for any single input — massive knowledge, manageable compute.
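The arithmetic above can be made concrete. This sketch uses the example's numbers, plus the common rough estimate that a dense forward pass costs about 2 FLOPs per active parameter per token (an approximation, not a property of any specific model):

```python
total_params = 1_000_000_000_000   # 1 trillion total parameters
active_params = 100_000_000_000    # 100 billion active per input

# Only a tenth of the model participates in any single forward pass.
active_fraction = active_params / total_params
print(f"Active fraction: {active_fraction:.0%}")

# Rough per-token compute estimate: ~2 FLOPs per active parameter.
dense_flops = 2 * total_params
sparse_flops = 2 * active_params
print(f"Compute relative to a dense model: {sparse_flops / dense_flops:.0%}")
```

The model stores the knowledge of a trillion parameters but pays roughly the compute bill of a hundred-billion-parameter one.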
Think of it like...
Like a university with thousands of professors where each student attends classes only from the 20 professors most relevant to their major — the institution's total knowledge is vast, but each student's individual cost is manageable.
Related Terms
Mixture of Experts
An architecture where a model consists of multiple specialized sub-networks (experts) and a gating mechanism that routes each input to only the most relevant experts. Only a fraction of the total parameters are active per input.
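The routing idea can be sketched in a few lines of NumPy. This is a toy illustration with made-up sizes and random weights, not any particular model's architecture: a gate scores every expert, but only the top-k actually run for a given input.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small weight matrix (stand-in for a feed-forward block).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
W_gate = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    # Gating: score every expert, then keep only the top-k for this input.
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only the selected experts compute; the others stay inactive for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
print(y.shape)  # one output vector, produced by just 2 of the 4 experts
```

Here 2 of 4 expert matrices are multiplied per input, so half the expert parameters are inactive — the same principle that lets a trillion-parameter MoE run at a fraction of its total cost.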
Pruning
A model compression technique that removes unnecessary or redundant weights, neurons, or layers from a trained neural network. Like pruning a plant, it removes parts that are not contributing to overall health.
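A minimal sketch of one common pruning criterion, magnitude pruning: weights with the smallest absolute values are assumed to contribute least and are set to zero. The matrix and sparsity level below are illustrative, not from a real trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 6))  # stand-in for a trained weight matrix

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = the k-th smallest absolute value across the whole matrix.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

pruned = magnitude_prune(W, sparsity=0.5)
print(f"Zeroed fraction: {(pruned == 0).mean():.0%}")
```

In practice pruning is usually followed by fine-tuning so the remaining weights can compensate for the removed ones.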
Parameter
Any learnable value in a machine learning model that is adjusted during training. Parameters include weights and biases in neural networks. Model size is often described by parameter count.
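Counting parameters is simple arithmetic over the layer shapes. This sketch uses a hypothetical fully connected network (sizes chosen to resemble an MNIST classifier) where each layer contributes inputs × outputs weights plus one bias per output:

```python
# Hypothetical fully connected network: 784 inputs, one hidden layer, 10 outputs.
layer_sizes = [784, 128, 10]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # weight matrix + bias vector per layer

print(f"{total:,} learnable parameters")  # 101,770
```

The same bookkeeping, applied to attention and feed-forward blocks, is how headline figures like "1 trillion parameters" are computed.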