Data Pipeline
An automated workflow that extracts data from sources, transforms it through processing steps, and loads it into a destination for use. In ML, data pipelines ensure consistent data flow from raw sources to model training.
Why It Matters
Data pipelines are the plumbing of AI systems. A broken pipeline means no fresh data, stale models, and degraded performance — often without anyone noticing.
Example
An ETL pipeline that, every hour, extracts customer data from a CRM, joins it with transaction records from a database, cleans the result, and loads it into a feature store for model training.
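The extract, transform, and load stages from the example above can be sketched in plain Python. This is a minimal, self-contained illustration: the in-memory CRM rows, transaction rows, and dict-based feature store are all hypothetical stand-ins for real source systems and storage.

```python
from datetime import datetime, timezone

# Hypothetical in-memory sources standing in for a CRM and a transactions DB.
CRM_ROWS = [
    {"customer_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"customer_id": 2, "name": "Grace", "email": None},  # missing email
]
TXN_ROWS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 55.5},
]

def extract():
    """Pull raw rows from the source systems."""
    return list(CRM_ROWS), list(TXN_ROWS)

def transform(customers, transactions):
    """Join CRM and transaction data, clean it, and compute features."""
    spend = {}
    for t in transactions:
        spend[t["customer_id"]] = spend.get(t["customer_id"], 0.0) + t["amount"]
    features = []
    for c in customers:
        features.append({
            "customer_id": c["customer_id"],
            "has_email": c["email"] is not None,  # cleaning: normalize missing values
            "total_spend": spend.get(c["customer_id"], 0.0),
            "computed_at": datetime.now(timezone.utc).isoformat(),
        })
    return features

def load(features, store):
    """Write features into the destination, keyed by customer_id."""
    for row in features:
        store[row["customer_id"]] = row

def run_pipeline(store):
    customers, transactions = extract()
    load(transform(customers, transactions), store)

feature_store = {}
run_pipeline(feature_store)
# feature_store[1]["total_spend"] == 150.0 (120.0 + 30.0)
```

In production the same three stages would typically run on a scheduler (e.g. a cron job or an orchestrator) and write to durable storage, but the shape of the workflow is the same.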
Think of it like...
Like a factory assembly line — raw materials enter one end, pass through processing stations, and finished products emerge at the other end, all running automatically.
Related Terms
ETL
Extract, Transform, Load — a data integration process that extracts data from source systems, transforms it into a usable format, and loads it into a destination system.
Data Engineering
The practice of designing, building, and maintaining the systems and infrastructure that collect, store, and prepare data for analysis and machine learning.
Feature Store
A centralized repository for storing, managing, and serving machine learning features. It ensures consistent feature computation between training and serving, and enables feature reuse across teams.
MLOps
Machine Learning Operations — the set of practices that combine ML, DevOps, and data engineering to deploy and maintain ML models in production reliably and efficiently.
Data Preprocessing
The process of cleaning, transforming, and organizing raw data into a format suitable for machine learning. This includes handling missing values, encoding categories, scaling features, and removing outliers.
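Three of the preprocessing steps named above (imputing missing values, encoding categories, scaling features) can be shown in a short stdlib-only sketch. The toy rows and column names are hypothetical, chosen just to make each step visible.

```python
# Toy raw data: one missing numeric value and one categorical column.
rows = [
    {"age": 20, "plan": "free"},
    {"age": None, "plan": "pro"},  # missing value to impute
    {"age": 40, "plan": "free"},
]

# 1. Handle missing values: impute missing ages with the column mean.
known = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(known) / len(known)  # 30.0
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 2. Encode categories: one-hot encode the "plan" column.
categories = sorted({r["plan"] for r in rows})
for r in rows:
    for c in categories:
        r[f"plan_{c}"] = 1 if r["plan"] == c else 0
    del r["plan"]

# 3. Scale features: min-max scale "age" into [0, 1].
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age"] = (r["age"] - lo) / (hi - lo)
```

Libraries such as scikit-learn provide these transformations as reusable, fitted components, which matters in a pipeline because the same parameters (e.g. the mean used for imputation) must be applied identically at training and serving time.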