Which description is most accurate for a Slack incident channel update?
The correct answer accurately reports the failed stage, the number and percentage of failing tests, and the duration. All three are essential for triage.
Key CI/CD vocabulary:
• stage — a discrete phase in a pipeline (lint → build → unit-tests → integration-tests → deploy)
• failure rate — the percentage of tests that failed: 3 / 847 × 100 = 0.35%
• duration — how long the build took to run
• triage — the process of identifying and prioritising problems
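The failure-rate arithmetic can be checked in a couple of lines of Python (the numbers 3 and 847 come from the example above):

```python
# Failure rate = failed tests / total tests × 100.
failed = 3
total = 847
failure_rate = failed / total * 100
print(f"{failure_rate:.2f}%")  # 0.35%
```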
Why the distractors fail:
• "All 847 tests broken" — factually wrong; only 3 failed
• "Took too long" — 4m 12s is completely normal for CI; this isn't a timeout failure
• "No action needed" — a 0.35% failure rate is NOT automatically acceptable; any pipeline failure blocks deployment and requires investigation
CI summary vocabulary: failed stage, test suite, failure count, failure rate, build duration, blocking PR/deploy, flaky test, red build, green build
What is the most precise way to describe this failure to the team?
Timeout means the stage ran but exceeded the maximum allowed time, causing it to be killed. This is different from a test failure (where the code runs to completion but produces wrong output).
Key vocabulary:
• timeout — a stage was terminated because it exceeded its time limit
• completed successfully / passed — stage ran without errors
• e2e tests (end-to-end) — tests that simulate full user flows through the application
• flaky test — a test that sometimes passes and sometimes fails with the same code (often causes timeouts in e2e)
Why option C is wrong: You cannot conclude there's a "bug in the test code" from a timeout alone. Possible causes: test environment was slow, a dependency was down, a network call hung, or the test config's timeout limit is too low.
Root cause analysis phrase: "The e2e-tests stage timed out at 8m 32s — this could indicate a slow test environment, a hung network call, or an insufficiently high timeout threshold. Checking logs to narrow down the cause."
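The timeout-vs-failure distinction can be sketched with Python's `subprocess` module, using two throwaway shell commands as hypothetical stand-ins for a CI stage:

```python
import subprocess

# A "test failure": the process runs to completion but exits non-zero.
result = subprocess.run(["sh", "-c", "exit 1"])
print("failed" if result.returncode != 0 else "passed")  # failed

# A "timeout": the process is killed for exceeding its time limit,
# regardless of whether it would eventually have passed.
try:
    subprocess.run(["sleep", "10"], timeout=0.5)
except subprocess.TimeoutExpired:
    print("timed out")  # timed out
```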
Over the past 30 days, a team's CI pipeline shows:
• Total builds: 847
• Passed: 724
• Failed: 123
• Mean build time: 6m 14s
How would you present this at a sprint retrospective?
What makes this answer effective:
1. States the success rate AND failure rate (both perspectives)
2. Translates to plain English ("1 in 7 builds")
3. Includes the mean build time (team efficiency metric)
4. Flags it as a concern WITHOUT over-alarming — uses "worth investigating"
5. Names possible root causes (flaky tests, environment instability)
Key retrospective vocabulary:
• failure rate — percentage of failed builds
• success rate — percentage of successful builds (100% − failure rate)
• mean build time — average duration across all builds
• flaky test — intermittent test failure, not caused by code changes
• environment instability — the CI infrastructure itself causes failures
Industry context: DORA's benchmark for elite teams is a change failure rate of 0–15% (a related but distinct metric from build failure rate). At 14.5%, this pipeline is on the boundary — worth attention but not a crisis.
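A quick Python sketch of the retrospective numbers above (847 builds, 724 passed, 123 failed):

```python
# Recomputing the success rate, failure rate, and the "1 in N" framing.
total, passed, failed = 847, 724, 123

success_rate = passed / total * 100   # ≈ 85.5%
failure_rate = failed / total * 100   # ≈ 14.5%

print(f"success {success_rate:.1f}%, failure {failure_rate:.1f}%")
print(f"roughly 1 in {round(total / failed)} builds fails")  # 1 in 7
```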
What are the TWO separate problems this log line reports?
Two separate failure modes:
1. Test failures: 12 tests failed. Exit code 1 means the process exited with an error (exit code 0 = success, anything else = failure).
2. Coverage threshold breach: Code coverage of 67.3% is below the configured 80% minimum. Many CI configs enforce a minimum coverage threshold — failing to meet it automatically fails the build, even if all tests pass.
Key vocabulary:
• exit code — the return value of a process: 0 = success, non-zero = failure
• code coverage — the percentage of code lines executed by the test suite
• coverage threshold — the minimum coverage percentage required to pass the build
• skipped tests — tests marked with `.skip()` or similar that are not executed
• independently fail the build — either issue alone would cause the build to fail
Common coverage vocabulary: "Coverage dropped from 84% to 67% — we need to add tests for the new authentication module before this can merge."
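A minimal sketch of the two independent gates, with a hypothetical `build_passes` helper (the numbers 12 failures, 67.3% coverage, and the 80% minimum come from the log line above):

```python
def build_passes(test_exit_code: int, coverage: float, threshold: float = 80.0) -> bool:
    """Either condition alone fails the build."""
    return test_exit_code == 0 and coverage >= threshold

print(build_passes(test_exit_code=1, coverage=67.3))  # False: both gates fail
print(build_passes(test_exit_code=0, coverage=67.3))  # False: coverage alone fails
print(build_passes(test_exit_code=0, coverage=84.0))  # True
```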
A tech lead reviews pipeline trends and sees this week-over-week comparison:
What this pattern suggests: When build time AND failure rate increase simultaneously, there's often a single root cause — typically a new test suite, slow integration test, or flaky external dependency added that week.
Key analytical vocabulary:
• percentage points — the absolute difference between two percentages (94% → 81% = 13 pp, NOT a 13% relative drop, which would mean 94 × 0.87 ≈ 81.8%)
• week-over-week (WoW) — comparison between the current week and the previous week
• correlation — two metrics moving together (here, build time and failure rate)
• root cause — the underlying source of multiple observed problems
Precision note: Always say "percentage points" when comparing two percentages. Saying "success rate dropped 13%" is ambiguous — it could mean 94% × 0.87 = 81.8% (multiplicative) or 94% − 13% = 81% (additive). "13 percentage points" is always unambiguous.
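The additive-vs-multiplicative ambiguity, worked out in a few lines of Python:

```python
before = 94.0  # last week's success rate, from the comparison above

additive = before - 13          # "13 percentage points": 81.0
multiplicative = before * 0.87  # "a 13% relative drop": 81.78

print(additive, multiplicative)
```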