Guardrail Model
A separate, specialized AI model that monitors the inputs and outputs of a primary LLM to detect and block harmful, off-topic, or policy-violating content.
Why It Matters
Guardrail models add a safety layer that is independent of the primary model. Even if the primary model is tricked, the guardrail can still catch and block the policy violation.
Example
A guardrail model scans every user input for prompt injection attempts and every model output for harmful content, blocking anything that violates policy.
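The input-and-output screening described above can be sketched as a wrapper around the primary model call. This is a minimal illustration: the keyword lists and the `primary_model` stub are placeholders, and a real deployment would call a trained guardrail classifier rather than matching strings.

```python
# Hypothetical pattern lists standing in for a trained guardrail model.
BLOCKED_INPUT_PATTERNS = ["ignore previous instructions", "reveal your system prompt"]
BLOCKED_OUTPUT_PATTERNS = ["here is how to make a weapon"]

def guardrail_check(text: str, patterns: list[str]) -> bool:
    """Return True if the text violates policy (here: naive keyword match)."""
    lowered = text.lower()
    return any(p in lowered for p in patterns)

def primary_model(prompt: str) -> str:
    """Stand-in for the primary LLM call."""
    return f"Answer to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # Screen the input before it ever reaches the primary model.
    if guardrail_check(prompt, BLOCKED_INPUT_PATTERNS):
        return "[blocked: input violates policy]"
    output = primary_model(prompt)
    # Screen the output before it reaches the user.
    if guardrail_check(output, BLOCKED_OUTPUT_PATTERNS):
        return "[blocked: output violates policy]"
    return output
```

Note that the guardrail logic never shares state with `primary_model` — that separation is what lets it catch violations even when the primary model has been manipulated.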
Think of it like...
Like a security guard at a building entrance — separate from the building staff, specifically trained to spot threats and prevent unauthorized access.
Related Terms
Guardrails
Safety mechanisms and constraints built into AI systems to prevent harmful, inappropriate, or off-topic outputs. Guardrails can operate at the prompt, model, or output level.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Content Moderation
The process of monitoring and filtering user-generated or AI-generated content to ensure it meets platform guidelines and legal requirements. AI is increasingly used to automate content moderation.
Prompt Injection Defense
Techniques and strategies for protecting LLM applications from prompt injection attacks, including input sanitization, output filtering, and architectural defenses.
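Of the defenses listed, input sanitization is the simplest to illustrate. The sketch below uses assumed heuristics (fake role markers, delimiter floods) purely as an example; it is one layer of defense in depth, not a complete protection on its own.

```python
import re

def sanitize_input(user_text: str) -> str:
    """Strip common injection scaffolding before passing text to the LLM.

    The specific patterns here are illustrative assumptions, not a
    standard or exhaustive defense.
    """
    # Remove fake role markers an attacker might embed to impersonate the system.
    cleaned = re.sub(r"(?im)^\s*(system|assistant)\s*:", "", user_text)
    # Collapse long delimiter runs sometimes used to "break out" of a prompt template.
    cleaned = re.sub(r"[#`\-]{4,}", " ", cleaned)
    return cleaned.strip()
```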
Classification
A type of supervised learning task where the model predicts which category or class an input belongs to. The output is a discrete label rather than a continuous value.
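Guardrail models are themselves classifiers: they map text to a discrete label such as "safe" or "unsafe". A toy example of discrete-label output (the word lists and labels are invented for illustration):

```python
def classify_sentiment(text: str) -> str:
    """Toy classifier: returns a discrete class label, not a continuous score."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = set(text.lower().split())
    pos, neg = len(words & positive), len(words & negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

A trained model would replace the word-count heuristic, but the interface is the same: input in, one label out of a fixed set.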