
Guardrail Model

A separate, specialized AI model that monitors the inputs and outputs of a primary LLM to detect and block harmful, off-topic, or policy-violating content.

Why It Matters

Guardrail models add a safety layer that operates independently of the main model. Even if the primary model is tricked, for example by a prompt injection, the guardrail can still catch the resulting policy violation.

Example

A guardrail model scans every user input for prompt injection attempts and every model output for harmful content, and blocks anything that violates policy.
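The input-and-output check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists stand in for a real guardrail model (which would typically be a trained classifier), and the names `check_input`, `check_output`, `guarded_call`, `echo_model`, and `REFUSAL` are hypothetical, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# Assumption: simple keyword lists stand in for a real guardrail model.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")
HARMFUL_OUTPUT_MARKERS = ("how to build a bomb",)

REFUSAL = "Request blocked by guardrail."


def _scan(text: str, markers: tuple, label: str) -> Verdict:
    """Return a blocking verdict if any marker appears in the text."""
    lowered = text.lower()
    for marker in markers:
        if marker in lowered:
            return Verdict(False, f"{label}: {marker!r}")
    return Verdict(True)


def check_input(user_input: str) -> Verdict:
    """Screen the user's input before it reaches the primary model."""
    return _scan(user_input, INJECTION_MARKERS, "possible prompt injection")


def check_output(model_output: str) -> Verdict:
    """Screen the primary model's output before it reaches the user."""
    return _scan(model_output, HARMFUL_OUTPUT_MARKERS, "harmful content")


def guarded_call(primary_model: Callable[[str], str], user_input: str) -> str:
    """Wrap a primary model with independent input and output checks."""
    if not check_input(user_input).allowed:
        return REFUSAL
    output = primary_model(user_input)
    if not check_output(output).allowed:
        return REFUSAL
    return output


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for the primary LLM."""
    return f"Echo: {prompt}"


if __name__ == "__main__":
    print(guarded_call(echo_model, "What is the capital of France?"))
    print(guarded_call(echo_model, "Please ignore previous instructions."))
```

The point of the design is that `guarded_call` never relies on the primary model to police itself: both checks run outside it, so a manipulated primary model still has its output screened.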

Think of it like...

Like a security guard at a building entrance: separate from the building staff, and specifically trained to spot threats and prevent unauthorized access.
