Long Short-Term Memory
A type of recurrent neural network designed to learn long-term dependencies through special gating mechanisms that control information flow. LSTMs address the vanishing gradient problem of standard RNNs.
Why It Matters
LSTMs were the dominant architecture for sequence tasks before transformers. Understanding them provides context for why transformers were such a breakthrough.
Example
An LSTM processing a paragraph where a character's name mentioned in the first sentence is needed to understand a pronoun in the last sentence — maintaining that memory across the gap.
Think of it like...
Like a note-taking system with three decisions at each step: what to forget from your notes, what to add from new information, and what to share as your current understanding.
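The three decisions in that analogy map directly onto the LSTM's three gates. A minimal sketch of one LSTM time step, using NumPy (function names, shapes, and the stacked weight layout here are illustrative choices, not a reference implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W stacks all four gate weights: shape (4*hidden, input+hidden);
    b has shape (4*hidden,). Layout (forget, input, candidate, output) is a choice
    made for this sketch."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[:hidden])              # forget gate: what to erase from the notes
    i = sigmoid(z[hidden:2*hidden])      # input gate: how much new info to write
    g = np.tanh(z[2*hidden:3*hidden])    # candidate values to write
    o = sigmoid(z[3*hidden:])            # output gate: what to share right now
    c = f * c_prev + i * g               # updated cell state (long-term memory)
    h = o * np.tanh(c)                   # hidden state (current understanding)
    return h, c
```

The key detail is the cell-state update `c = f * c_prev + i * g`: because it is additive rather than repeatedly squashed through a nonlinearity, gradients can flow across many time steps, which is how LSTMs sidestep the vanishing gradient problem.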
Related Terms
Recurrent Neural Network
A type of neural network designed for sequential data where the output at each step depends on previous steps. RNNs have a form of memory that allows them to process sequences like text, time series, and audio.
GRU
Gated Recurrent Unit — a simplified variant of the LSTM that uses two gates (update and reset) instead of three, with fewer parameters, while achieving similar performance on many sequence tasks. It is faster to train than an LSTM.
Vanishing Gradient Problem
A training difficulty in deep networks where gradients become exponentially smaller as they are propagated back through many layers, making it nearly impossible for early layers to learn.
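The exponential shrinkage is easy to see with a toy calculation: if each backward step through the network multiplies the gradient by a factor below 1, the gradient after many steps is that factor raised to the number of steps. (This scalar model is a deliberate simplification; in a real network the per-step factor is a Jacobian, not a single number.)

```python
# Toy model of gradient flow: each layer (or time step) scales the
# gradient by a constant per-step factor during backpropagation.
def gradient_magnitude(per_step_factor, num_steps):
    return per_step_factor ** num_steps

# A factor of 0.9 seems mild, but over many steps it collapses:
shallow = gradient_magnitude(0.9, 10)    # roughly 0.35 — still usable
deep = gradient_magnitude(0.9, 100)      # roughly 0.00003 — effectively zero
```

With 100 steps the gradient reaching the earliest layers is about five orders of magnitude smaller than the one at the output, which is why those layers barely learn.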
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.
Sequence-to-Sequence
A model architecture that transforms one sequence into another, where the input and output can be different lengths. It uses an encoder to process input and a decoder to generate output.