Agent Observability
5 exercises — master the vocabulary of making AI agent systems visible and debuggable: traces, spans, LangSmith, LangFuse, token budgets, and span-level analysis.
Agent observability vocabulary quick reference
- Trace — the complete record of one agent run (all steps, costs, inputs/outputs)
- Span — a single step within the trace (one LLM call, one tool call, etc.)
- LLM span — span recording one LLM call (prompt, completion, tokens, latency)
- Token budget — max tokens a run is allowed to consume (hard or soft limit)
- Budget-exceeded status — run terminated by resource limit, not task completion
- LangSmith / LangFuse — LLM observability platforms for tracing, eval, and cost analysis
- Span-level token analysis — inspecting token counts per step to find optimisation targets
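How these terms fit together can be sketched in plain Python. This is a conceptual model using dataclasses, not the actual schema of LangSmith or LangFuse; all names and the budget-checking logic are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One step within a run: an LLM call, a tool call, etc."""
    name: str
    kind: str            # "llm" | "tool" | "agent"
    tokens: int = 0      # tokens consumed by this step
    latency_ms: int = 0

@dataclass
class Trace:
    """The complete record of one agent run."""
    trace_id: str
    token_budget: int = 10_000          # hard limit for the run
    spans: list[Span] = field(default_factory=list)
    status: str = "running"

    def total_tokens(self) -> int:
        return sum(s.tokens for s in self.spans)

    def add(self, span: Span) -> None:
        self.spans.append(span)
        if self.total_tokens() > self.token_budget:
            # Run terminated by resource limit, not task completion
            self.status = "budget_exceeded"

trace = Trace(trace_id="run-001", token_budget=5_000)
trace.add(Span("plan", "llm", tokens=1_200))
trace.add(Span("search", "tool", tokens=300))
trace.add(Span("answer", "llm", tokens=4_000))
print(trace.status, trace.total_tokens())  # budget_exceeded 5500
```

The point of the sketch: a trace owns its spans, and run-level properties (total tokens, budget status) are aggregates over those spans.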
1 / 5
In agent observability platforms like LangSmith or LangFuse, what does a "trace" represent?
A trace is the observability unit of an entire agent run — the full story of what happened from the first input to the final output.
What a trace contains:
① Run metadata — session ID, start time, duration, total cost, model used
② Input — the user's initial request
③ All spans — every step taken during the run (LLM calls, tool calls, sub-agent calls)
④ Intermediate outputs — thoughts, tool results, partial answers at each step
⑤ Final output — the agent's response to the user
⑥ Feedback (if collected) — human ratings, automated eval scores
Why traces are the fundamental observability unit:
• A single user interaction might involve 15+ LLM calls and 30+ tool calls — you need the trace to understand the full picture
• Unlike simple logs (text), traces are structured and queryable — you can filter by model, cost, step count, outcome
• Traces can be replayed, compared, and used as training examples
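Because traces are structured rather than free text, you can query them programmatically. A minimal sketch, assuming traces exported as plain dicts; the field names here are illustrative, not any platform's real export format:

```python
# Hypothetical trace records (illustrative schema, not a real export format)
traces = [
    {"trace_id": "t1", "model": "gpt-4o",      "cost_usd": 0.42, "steps": 18, "outcome": "success"},
    {"trace_id": "t2", "model": "gpt-4o-mini", "cost_usd": 0.03, "steps": 6,  "outcome": "success"},
    {"trace_id": "t3", "model": "gpt-4o",      "cost_usd": 0.91, "steps": 34, "outcome": "budget_exceeded"},
]

# Filter by cost and outcome: expensive runs that never completed their task
expensive_failures = [
    t for t in traces
    if t["cost_usd"] > 0.50 and t["outcome"] != "success"
]
print([t["trace_id"] for t in expensive_failures])  # ['t3']
```

The same filter-by-field pattern is what a trace viewer's query bar does for you: slice the trace dataset by model, cost, step count, or outcome.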
Key vocabulary:
• Trace ID — unique identifier for one complete agent run
• Run — synonym for trace in some frameworks (LangChain uses "run")
• Trace viewer — the UI that renders a tree of all steps in a trace
• Trace dataset — a collection of traces used for offline evaluation
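Span-level token analysis, from the vocabulary above, can be sketched like this: sum token counts per span name within a trace, then rank the steps to find optimisation targets. The span names and counts are made-up example data.

```python
from collections import Counter

# Spans from one trace as (span_name, tokens) pairs — illustrative data
spans = [
    ("plan", 1200), ("search_web", 150), ("summarize", 2800),
    ("search_web", 180), ("summarize", 3100), ("final_answer", 900),
]

# Aggregate tokens per step name
tokens_by_step = Counter()
for name, tokens in spans:
    tokens_by_step[name] += tokens

# Rank by consumption — the top entries are the optimisation targets
for name, total in tokens_by_step.most_common():
    print(f"{name:14s} {total:>6d}")
```

Here the two `summarize` spans dominate the run's token spend, so that step is where prompt trimming or a cheaper model would pay off first.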