Advanced AI Agents #guardrails #safety #HITL #prompt-injection

Agent Guardrails & Safety

5 exercises — master the vocabulary of safe agentic systems: input and output guardrails, human-in-the-loop patterns, runaway tool call prevention, and the complete guardrail stack.

Agent guardrails vocabulary quick reference
  • Input guardrails — safety checks on user input before it reaches the agent (prompt injection, PII, policy)
  • Output guardrails — safety checks on agent output before it reaches the user (harm, PII leakage, hallucinations)
  • Prompt injection — user input that attempts to override agent instructions
  • Human-in-the-loop (HITL) — agent pauses for human approval before high-risk/irreversible actions
  • Runaway tool calls — an agent stuck in a loop making unbounded tool calls, driving up cost and creating data risk
  • Max-calls limit — guardrail capping total tool invocations per run
  • Guardrail stack — layered combination of input, loop, output, and HITL guardrails (see the sketch after this list)
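
To make the vocabulary concrete, here is a minimal sketch of how the layers could fit together in a single agent run. Everything in it is hypothetical and for illustration only: run_agent, call_tool, ask_human, the regex patterns, and HIGH_RISK_TOOLS are placeholder names, not any framework's API, and real guardrails typically use trained classifiers or policy engines rather than simple regexes.

```python
import re

# Minimal guardrail-stack sketch. All names here are hypothetical placeholders,
# not a specific framework's API.

MAX_TOOL_CALLS = 10  # max-calls limit: cap on tool invocations per run
INJECTION_PATTERNS = [r"ignore (all|previous) instructions", r"reveal your system prompt"]
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")    # naive SSN-like pattern
HIGH_RISK_TOOLS = {"delete_records", "send_payment"}  # require human approval


def input_guardrail(user_input: str) -> str:
    """Input guardrail: block likely prompt injection before the agent sees it."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, user_input, re.IGNORECASE):
            raise ValueError("Blocked: input looks like a prompt-injection attempt")
    return user_input


def output_guardrail(agent_output: str) -> str:
    """Output guardrail: redact PII-like strings before the user sees the reply."""
    return PII_PATTERN.sub("[REDACTED]", agent_output)


def ask_human(tool_name: str, args: dict) -> bool:
    """Human-in-the-loop gate: pause for approval before a high-risk action."""
    answer = input(f"Approve {tool_name}({args})? [y/N] ")
    return answer.strip().lower() == "y"


def call_tool(tool_name: str, args: dict) -> str:
    """Placeholder tool executor; a real agent would dispatch to real tools."""
    return f"{tool_name} executed with {args}"


def run_agent(user_input: str, planned_calls: list[tuple[str, dict]]) -> str:
    """One agent turn through the stack: input check, capped and gated tool loop, output check."""
    safe_input = input_guardrail(user_input)

    results = []
    for i, (tool_name, args) in enumerate(planned_calls):
        if i >= MAX_TOOL_CALLS:  # runaway tool call prevention
            results.append("Stopped: max tool calls reached")
            break
        if tool_name in HIGH_RISK_TOOLS and not ask_human(tool_name, args):
            results.append(f"Skipped {tool_name}: human declined")
            continue
        results.append(call_tool(tool_name, args))

    draft = f"Handled '{safe_input}': " + "; ".join(results)
    return output_guardrail(draft)


if __name__ == "__main__":
    print(run_agent("Look up order 1234", [("lookup_order", {"id": 1234})]))
```

The ordering is the point of the stack: the input guardrail runs before the agent acts, the max-calls limit and HITL gate wrap the tool loop, and the output guardrail runs last, so no single layer has to catch everything on its own.
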
1 / 5

What are "input guardrails" in an AI agent system?