On-Call Vocabulary
3 exercises — master the essential metrics and terms every on-call engineer needs: SLI/SLO/SLA, MTTR/MTTD, and severity levels.
Quick reference: on-call metrics
- SLI — what you measure (e.g. error rate, latency)
- SLO — your internal target (e.g. 99.9% success rate)
- SLA — the contractual commitment to customers (typically looser than the SLO)
- MTTD — mean time to detect: average time from failure start → first alert / detection
- MTTR — mean time to recovery: average time from failure start → full recovery
- MTBF — mean time between failures: average time between incidents (higher = more stable)
- P0–P4 — lower number = more severe (P0 = all-hands, P4 = cosmetic)
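To make the time-based metrics concrete, here is a minimal Python sketch that computes MTTD, MTTR, and MTBF from incident timestamps. The `Incident` record and the function names are hypothetical. MTBF is assumed here to be measured between consecutive incident start times; teams define it differently (for example, from one recovery to the next failure), so check your own convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record: failure start, first detection, full recovery."""
    started: datetime
    detected: datetime
    resolved: datetime


def _mean_minutes(deltas: list[timedelta]) -> float:
    """Average a list of durations, returned in minutes."""
    return mean([d.total_seconds() / 60 for d in deltas])


def mttd(incidents: list[Incident]) -> float:
    # Failure start -> first alert / detection, averaged over incidents.
    return _mean_minutes([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> float:
    # Failure start -> full recovery, averaged over incidents.
    return _mean_minutes([i.resolved - i.started for i in incidents])


def mtbf(incidents: list[Incident]) -> float:
    # Assumption: gap between consecutive incident start times, averaged.
    starts = sorted(i.started for i in incidents)
    return _mean_minutes([b - a for a, b in zip(starts, starts[1:])])


# Usage: two incidents ten days apart.
t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=45)),
    Incident(
        t0 + timedelta(days=10),
        t0 + timedelta(days=10, minutes=15),
        t0 + timedelta(days=10, minutes=75),
    ),
]
print(mttd(incidents))  # 10.0
print(mttr(incidents))  # 60.0
print(mtbf(incidents))  # 14400.0 (10 days, in minutes)
```

`mtbf` needs at least two incidents; with fewer, `statistics.mean` raises `StatisticsError` because there is no gap to average.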
A product manager asks: "What's the difference between an SLA, SLO, and SLI?" Which definition set is correct?
Option A is the correct industry-standard definition.
SLI (Service Level Indicator) — the raw metric you measure. Examples: request success rate, latency p99, error rate, availability %.
SLO (Service Level Objective) — your internal target for the SLI. Example: "99.9% of requests must succeed over a rolling 30-day window." SLOs are set internally, usually by engineering teams, and carry no contractual penalty. They are how you decide whether your service is "healthy".
SLA (Service Level Agreement) — a legal/contractual commitment to a customer. If you breach an SLA, there are consequences (refunds, service credits, escalation, etc.). SLAs are almost always set looser than your internal SLO (e.g. 99.5% vs. 99.9%), so the SLO gives you a buffer before a contractual breach.
Relationship: SLI is measured → compared against the SLO → an SLO miss is an early warning → if performance keeps degrading → the SLA may be breached → penalties apply.
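The SLI → SLO → SLA chain can be sketched in a few lines of Python. This is an illustrative sketch, not a standard API: the function names are hypothetical, the SLI is assumed to be a request success rate over one window, and the 99.5% SLA default is an assumed example chosen to sit below the 99.9% SLO from the text.

```python
def success_rate_sli(good: int, total: int) -> float:
    """SLI: fraction of requests in the window that succeeded."""
    if total == 0:
        # Assumption: a window with no traffic counts as fully successful.
        return 1.0
    return good / total


def classify(sli: float, slo: float = 0.999, sla: float = 0.995) -> str:
    """Compare a measured SLI against the internal SLO and the looser contractual SLA."""
    if sla > slo:
        raise ValueError("the SLA target is expected to be looser (lower) than the SLO")
    if sli >= slo:
        return "healthy"
    if sli >= sla:
        # SLO missed: an internal warning sign, but no contractual penalty yet.
        return "SLO breached, SLA still met"
    return "SLA breached: penalties may apply"


# Usage: three hypothetical 30-day windows of 1,000,000 requests each.
for good in (999_500, 997_000, 990_000):
    sli = success_rate_sli(good, 1_000_000)
    print(f"SLI={sli:.2%} -> {classify(sli)}")
```

Because the SLA sits below the SLO, a window can miss the SLO without breaching the SLA; that gap is the buffer the SLA definition refers to.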