ReLU
Rectified Linear Unit — the most commonly used activation function in deep learning. It outputs the input directly if positive, and zero otherwise: f(x) = max(0, x).
Why It Matters
ReLU greatly mitigated the vanishing gradient problem: its gradient is exactly 1 for any positive input, so gradients do not shrink as they flow backward through active units. This enabled training of much deeper networks and helped catalyze the deep learning revolution.
Example
For a neuron with ReLU activation: an input of 5 outputs 5, and an input of -3 outputs 0. ReLU simply clips negative values to zero.
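That behavior can be sketched in a few lines of plain Python (the `relu` helper name is ours, not from any particular library):

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: pass positive inputs through, clip negatives to zero."""
    return max(0.0, x)

print(relu(5))   # 5
print(relu(-3))  # 0.0
print(relu(0))   # 0.0
```

In practice you would use a framework's built-in version (e.g. a deep learning library's ReLU layer), which applies the same rule elementwise over tensors.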
Think of it like...
Like a one-way valve in plumbing — it lets positive flow through unchanged but blocks anything negative.
Related Terms
Activation Function
A mathematical function applied to the output of each neuron in a neural network that introduces non-linearity. Without activation functions, a neural network would just be a series of linear transformations.
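The claim that stacked linear layers collapse into one can be checked numerically; here is a small sketch using NumPy, with arbitrarily chosen shapes and a fixed random seed:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # first "layer" weight matrix
W2 = rng.normal(size=(2, 4))  # second "layer" weight matrix
x = rng.normal(size=3)        # an input vector

# Two linear layers applied in sequence, with no activation in between...
y_stacked = W2 @ (W1 @ x)

# ...are equivalent to a single linear layer with weights W = W2 @ W1.
W_collapsed = W2 @ W1
y_single = W_collapsed @ x

assert np.allclose(y_stacked, y_single)
```

Inserting a non-linearity such as ReLU between the two matrix multiplications breaks this equivalence, which is precisely what gives deep networks their expressive power.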
Sigmoid
An activation function that squashes input values into a range between 0 and 1, creating an S-shaped curve. It is commonly used for binary classification outputs and for the gates in recurrent architectures such as LSTMs.
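A minimal sketch of the sigmoid function using only the standard library (the `sigmoid` name is ours):

```python
import math

def sigmoid(x: float) -> float:
    """Squash x into the open interval (0, 1) along an S-shaped curve."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5, the midpoint of the S-curve
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```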
Vanishing Gradient Problem
A training difficulty in deep networks where gradients become exponentially smaller as they are propagated back through many layers, making it nearly impossible for early layers to learn.
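To see why saturating activations like sigmoid cause this, note that the sigmoid's derivative never exceeds 0.25, so backpropagating through many such layers multiplies the gradient by a factor of at most 0.25 per layer. A toy illustration (a deliberate worst-case-free upper bound, not a full backprop implementation):

```python
def sigmoid_grad(s: float) -> float:
    # Derivative of the sigmoid expressed via its own output s: s * (1 - s).
    return s * (1.0 - s)

# The derivative peaks at s = 0.5, where it equals 0.25.
max_grad = sigmoid_grad(0.5)
print(max_grad)        # 0.25

# After 20 layers, the gradient is scaled by at most 0.25**20 -- vanishingly small.
print(max_grad ** 20)  # about 9.1e-13
```

ReLU avoids this ceiling because its gradient is 1 everywhere the unit is active, so the per-layer scaling factor does not shrink the signal.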