5 exercises — choose and evaluate SLA clauses on uptime commitments, incident response tiers, service credits, and professional support descriptions.
SLA writing principles
Be specific: "within 15 minutes" > "as quickly as possible"
Be measurable: every clause should have a number that can be tracked
State consequences: what happens when you miss the SLA? (Service Credits)
Avoid 100% uptime: never promise what you can't deliver. Almost no real-world SLA commits to 100%.
Define the period: monthly vs. annual measurements lead to very different outcomes
1 / 5
Your team is writing an SLA for a payment processing API used by enterprise e-commerce clients. Any downtime directly causes lost transactions. Which SLA clause most appropriately commits to the required availability tier?
99.99% uptime (four nines) is the appropriate tier for payment processing.
Why not 99% or 99.9%?
• 99% allows ~7.3 hours/month of downtime. For a payment API, that is potentially thousands of failed transactions per hour, which is unacceptable for enterprise clients.
• 99.9% allows ~43.8 minutes/month, still too high for a critical financial path. Most enterprise payment contracts require better than this.
Why not 100%? 100% uptime is not achievable in practice: hardware fails, networks partition, deployments require restarts. Never put 100% uptime in an SLA. Even hyperscalers such as AWS and Google Cloud commit to less than 100% for almost all of their services. A 100% uptime clause:
• Exposes you to liability for any fleeting disruption
• Cannot be honoured consistently as written
• Signals engineering naivety to sophisticated buyers
Industry standard for payment processing: 99.99% (four nines) = ~52.6 minutes/year, or ~4.4 minutes/month. This is a common baseline expectation in enterprise payment contracts. (PCI DSS itself is a security standard and does not set an uptime figure.)
The correct clause pattern: "[Availability %] of the time in any [calendar month / rolling 30-day period]", followed by the specific downtime calculation, which demonstrates that you understand the commitment you are making.
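The downtime figures quoted for each availability tier can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and constants are assumptions, and it uses an average month of 730 hours (8,760 hours / 12), which is where the ~43.8 minutes/month figure comes from.

```python
# Hypothetical helper: allowed downtime for an availability target.
HOURS_PER_YEAR = 365 * 24               # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours, an average month


def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Return the minutes of downtime permitted by an availability target."""
    allowed_fraction = 1 - availability_pct / 100
    return allowed_fraction * period_hours * 60


# Print the monthly and yearly budget for the three tiers discussed above.
for pct in (99.0, 99.9, 99.99):
    per_month = downtime_budget_minutes(pct, HOURS_PER_MONTH)
    per_year = downtime_budget_minutes(pct, HOURS_PER_YEAR)
    print(f"{pct}%: {per_month:.1f} min/month, {per_year:.1f} min/year")
```

Running the calculation before drafting a clause makes it easy to state the concrete downtime allowance next to the percentage.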
2 / 5
Your team is writing SLA language for incident response times for a critical (P1) production outage. Which clause is the most complete and professionally worded?
"Initial response within 15 minutes, status update every 30 minutes, target resolution within 4 hours."
A well-written SLA incident response clause has three distinct commitments:
① Initial response time: how quickly someone acknowledges the incident ("15 minutes"). This is the hardest commitment because it means the on-call engineer must be available 24/7.
② Communication cadence: how often stakeholders receive updates while the incident is active ("every 30 minutes"). This is crucial: customers want to know you're working on it, even if there's no resolution yet.
③ Target resolution time: how long until the service is restored ("4 hours"). Note: this is usually a target, not a hard guarantee, because incident complexity varies.
Why "as quickly as possible" is bad SLA language: "As quickly as possible" is not a commitment — it's a statement of intent. It sets no measurable obligation and cannot trigger any service credit or contractual consequence. Clients cannot plan around it.
Why "24/7/365 support" isn't enough: This states availability, not response speed. "Available" and "will respond in 15 minutes" are very different commitments.
Pattern to know: P1 → 15 min / P2 → 1 hour / P3 → 4 hours / P4 → 1 business day is a common SLA tiering structure for initial response times.
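A tiering structure like this is measurable precisely because each severity maps to a fixed time limit. The following is a minimal sketch of that idea, not a reference implementation: the names `RESPONSE_TARGETS` and `response_met` are hypothetical, and "1 business day" is simplified to 24 hours, whereas a real contract would define business hours explicitly.

```python
from datetime import datetime, timedelta

# Initial-response targets taken from the common tiering pattern.
# Assumption: "1 business day" is approximated as 24 hours.
RESPONSE_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(days=1),
}


def response_met(severity: str, opened_at: datetime, acknowledged_at: datetime) -> bool:
    """Return True if the first acknowledgement arrived within the target."""
    return acknowledged_at - opened_at <= RESPONSE_TARGETS[severity]


# Illustrative timestamps only.
opened = datetime(2024, 11, 5, 2, 0)
print(response_met("P1", opened, opened + timedelta(minutes=12)))
print(response_met("P1", opened, opened + timedelta(minutes=20)))
```

A phrase such as "as quickly as possible" has no equivalent entry in a table like this, which is why it cannot be tracked or enforced.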
3 / 5
Your organisation has a 99.9% ("three nines") uptime SLA with a customer. In November, the service was down for 2 hours and 15 minutes. Which sentence correctly describes whether the SLA was met?
The SLA was breached. 135 minutes of downtime exceeded the 43.2-minute allowance for a 30-day month.
The maths:
• Monthly minutes: 30 days × 24 hours × 60 min = 43,200 minutes
• 99.9% uptime → 0.1% allowed downtime = 0.001 × 43,200 = 43.2 minutes/month
• Actual downtime: 2h 15min = 135 minutes
• Breach: 135 − 43.2 = 91.8 minutes over budget
Why option C is wrong: SLAs are typically measured per calendar month (or per rolling 30-day period), not annually. Even if the annual budget (8.76 hours/year) appears to have remaining headroom, a 2h15min incident in a single month still breaches the monthly commitment.
Why "users affected" is irrelevant to availability SLAs: Standard uptime SLAs are measured by service availability, not the number of users impacted. Separate clauses may address data loss or degraded performance for specific user counts, but availability is binary: up or down.
Writing an SLA breach notification: "Our service availability for November fell to 99.69%, below our committed 99.9% SLA. Total unplanned downtime: 135 minutes (SLA allowance: 43.2 minutes). We will apply service credits as outlined in Section 4.2 of your contract."
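The November figures used in this exercise can be checked programmatically. The sketch below assumes a calendar-month measurement period; the function name `monthly_sla_report` and its return keys are illustrative, not part of any standard tool.

```python
def monthly_sla_report(days_in_month: int, target_pct: float, downtime_min: float) -> dict:
    """Compare actual downtime in a calendar month with the SLA allowance."""
    total_min = days_in_month * 24 * 60
    budget_min = total_min * (100 - target_pct) / 100
    achieved_pct = 100 * (total_min - downtime_min) / total_min
    return {
        "budget_min": budget_min,                          # allowed downtime
        "excess_min": max(0.0, downtime_min - budget_min), # minutes over budget
        "achieved_pct": achieved_pct,                      # measured availability
        "breached": downtime_min > budget_min,
    }


# November has 30 days; the outage lasted 2h 15min = 135 minutes.
report = monthly_sla_report(days_in_month=30, target_pct=99.9, downtime_min=135)
print(f"Allowance: {report['budget_min']:.1f} min")
print(f"Over budget: {report['excess_min']:.1f} min")
print(f"Availability: {report['achieved_pct']:.2f}%")
print(f"Breached: {report['breached']}")
```

Passing `days_in_month=31` shows how the allowance shifts slightly from month to month, which is one reason the measurement period should be defined in the contract.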
4 / 5
A contract includes the following SLA clause: "In the event of a Service availability breach, Customer shall be entitled to Service Credits equal to 10% of the monthly fee for each full hour of downtime exceeding the SLA threshold, up to a maximum of 30% of the monthly fee."
The monthly fee is $5,000. The service was down for 3 hours beyond the SLA threshold. How much in Service Credits is the customer entitled to?
$1,500: 10% × $5,000 × 3 hours = $1,500, and the 30% cap ($1,500) does not reduce it further.
The calculation:
① Per-hour credit: 10% of $5,000 = $500 per hour of excess downtime
② Total excess downtime: 3 full hours
③ Credit: $500 × 3 = $1,500
④ Maximum cap: 30% of $5,000 = $1,500
⑤ Result: $1,500 (the calculation and the cap arrive at the same number here)
Key SLA credit vocabulary:
• "Service Credits" (also called "SLA credits"): financial compensation deducted from the next invoice; not a cash refund
• "for each full hour": partial hours don't count; if excess downtime was 3h 45min, only 3 full hours are counted
• "up to a maximum": a liability cap; protects the vendor from unlimited credit exposure
• "exceeding the SLA threshold": credits only apply to time above the contractual allowance; roughly the first 43 minutes of downtime in a 99.9% SLA month don't count
Tip: Service Credits are not refunds. They reduce future invoices. Always check whether credits auto-apply or require a formal written claim within a defined window (e.g., "Customer must request credits within 30 days of the incident").
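The credit clause in this exercise combines three rules: a per-hour rate, counting only full hours, and a cap. A minimal sketch of that logic follows; the function name and parameter names are assumptions for illustration, and the claim window mentioned in the tip is not modelled.

```python
import math


def service_credit(monthly_fee: float, excess_downtime_hours: float,
                   rate_pct: int = 10, cap_pct: int = 30) -> float:
    """Credit owed under a 'rate per full hour, up to a cap' clause."""
    full_hours = math.floor(excess_downtime_hours)  # partial hours don't count
    uncapped = monthly_fee * rate_pct * full_hours / 100
    cap = monthly_fee * cap_pct / 100               # liability cap
    return min(uncapped, cap)


# The exercise scenario: $5,000 monthly fee, 3 hours beyond the threshold.
print(service_credit(5000, 3))
# 3h 45min of excess downtime still counts as 3 full hours.
print(service_credit(5000, 3.75))
# 10 excess hours would be 100% uncapped, but the cap limits it to 30%.
print(service_credit(5000, 10))
```

In this scenario the uncapped figure and the cap coincide, so any further full hour of downtime would add nothing to the customer's credit.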
5 / 5
Your company offers three support tiers in its SLA. You need to write the description for the Enterprise tier. Which paragraph most accurately and professionally describes a premium SLA support tier?
Option B is the only professionally written, complete SLA tier description.
Why it works: A well-written SLA tier description must be:
① Specific and measurable: "P1 initial response: 15 minutes" is a commitment. "Try our hardest" (option A) is not.
② Comprehensive: Lists all components: availability %, response times, support channels, account management, review cadence. Tier descriptions need to tell clients exactly what they're paying for.
③ Clear on consequences: "Service Credits apply for any breach" tells the customer what happens when you miss the commitment.
④ Clearly formatted: Starting with a bold tier name followed by a structured list is a common format for SLA tier documents.
Why option A fails: "Best support", "try our hardest", "as quickly as we can" — all of these are subjective and unenforceable. This is marketing copy, not an SLA.
Why option D fails: "Always resolved within 1 hour" for P1 incidents is an unrealistic guarantee that most organisations cannot meet. And "full money-back guarantee" suggests a full refund of all fees — courts and procurement teams will scrutinise this; it creates enormous financial exposure.
Key elements of a complete SLA tier description:
• Uptime commitment (%)
• Response time by severity (P1/P2/P3...)
• Support hours (business hours / 24×7)
• Channel (phone / email / dedicated Slack)
• Account management (shared CSM / dedicated CSM)
• Review frequency
• Credit/remedy clause reference