English for DevOps Engineers: Essential Vocabulary

DevOps has its own dense vocabulary — and much of it only makes sense with cultural context. Understanding the terminology helps you read documentation faster, contribute to incident postmortems, and communicate clearly with SREs, platform engineers, and developers.

Here are the 35 most-used DevOps and cloud-native terms, grouped by the workflows in which you encounter them.

CI/CD: The Automation Pipeline

Pipeline
An automated sequence of stages (build, test, deploy) executed on every code change. Phrase: “The pipeline failed at the integration test stage.”

Stage / Job / Step
Levels of granularity in a pipeline. A pipeline has stages (e.g. build, test, deploy). Each stage has jobs (e.g. unit-tests, lint, security-scan). Each job has steps (e.g. run this script).
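
The nesting described above can be sketched as a small data model. This is only an illustration, not the schema of any particular CI system: the class names, the run_pipeline helper, and the sample job names are assumptions. Real CI tools usually run the jobs of a stage in parallel, whereas this sketch runs everything in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True on success


@dataclass
class Job:
    name: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Stage:
    name: str
    jobs: List[Job] = field(default_factory=list)


def run_pipeline(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the first failing stage's name, or None."""
    for stage in stages:
        for job in stage.jobs:
            for step in job.steps:
                if not step.run():
                    return stage.name
    return None


# Hypothetical pipeline: build -> test -> deploy, with a simulated test failure.
pipeline = [
    Stage("build", [Job("compile", [Step("run build script", lambda: True)])]),
    Stage("test", [
        Job("lint", [Step("run linter", lambda: True)]),
        Job("unit-tests", [Step("run unit tests", lambda: False)]),
    ]),
    Stage("deploy", [Job("release", [Step("push release", lambda: True)])]),
]

failed = run_pipeline(pipeline)
if failed:
    print(f"The pipeline failed at the {failed} stage.")
else:
    print("The pipeline passed.")
```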

Artifact
A file produced by the build process and passed to later pipeline stages. Examples: a compiled binary, a Docker image, a test report, a .zip of the application. Phrase: “The Docker image artifact is published to the container registry.”

Trigger
The event that starts a pipeline run: a push to a branch, a pull request, a scheduled cron job, or a manual trigger.

Rollout / Rollback
Rollout: the process of deploying a new version (gradually or all at once). Rollback: reverting to a previous version. As a verb it is written as two words: “to roll back”. Phrase: “We detected high error rates during the rollout and rolled back to v2.3.”

Deployment Strategies

Blue-green deployment
Two identical production environments. The live environment (blue) serves traffic while the new version is deployed to the idle environment (green). Once green is verified, traffic switches to it. Rollback is near-instant: switch traffic back to blue.
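
A toy model of the traffic switch may make the idea concrete. The BlueGreenRouter class and the version labels are hypothetical; a real setup would change a load balancer or DNS target rather than a Python attribute.

```python
from typing import Dict, Optional


class BlueGreenRouter:
    """Toy traffic switch between two environments (not a real load balancer API)."""

    def __init__(self, live_version: str) -> None:
        self.versions: Dict[str, Optional[str]] = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # The live environment is untouched while the idle one is updated.
        self.versions[self.idle] = version

    def switch(self) -> None:
        """Send traffic to the idle environment; calling it again rolls back."""
        if self.versions[self.idle] is None:
            raise RuntimeError("nothing is deployed to the idle environment")
        self.live = self.idle


router = BlueGreenRouter("v2.3")
router.deploy_to_idle("v2.4")  # hypothetical new version
router.switch()                # release
print(router.live, router.versions[router.live])
router.switch()                # rollback
print(router.live, router.versions[router.live])
```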

Canary deployment
A small percentage of traffic (often 1–5%) is routed to the new version before a full rollout. If metrics look healthy, the rollout continues; otherwise, traffic returns to the stable version.
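
The continue-or-return decision can be sketched as a loop over traffic steps. Everything here is an assumption made for illustration: the run_canary name, the step percentages, the error-rate threshold, and the error_rate_at callback, which stands in for a query to a metrics system.

```python
from typing import Callable, Sequence


def run_canary(traffic_steps: Sequence[int],
               error_rate_at: Callable[[int], float],
               max_error_rate: float) -> int:
    """Widen the canary step by step.

    Returns the percentage of traffic left on the new version:
    the last step if every check passed, or 0 after a rollback.
    """
    for percent in traffic_steps:
        if error_rate_at(percent) > max_error_rate:
            return 0  # unhealthy: all traffic returns to the stable version
    return traffic_steps[-1] if traffic_steps else 0


# Healthy canary: the error rate stays low at every step.
print(run_canary([1, 5, 25, 100], lambda percent: 0.001, max_error_rate=0.01))

# Unhealthy canary: errors spike once the canary receives 25% of traffic.
print(run_canary([1, 5, 25, 100],
                 lambda percent: 0.4 if percent >= 25 else 0.001,
                 max_error_rate=0.01))
```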

Feature flag / Feature toggle
A configuration switch that enables or disables a feature at runtime, without deploying new code. Phrase: “We shipped the new checkout flow behind a feature flag — it’s only visible to 10% of users.”
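
One common way to implement “visible to 10% of users” is to hash each user into a stable bucket. The sketch below assumes that approach; the is_enabled function, the flag name, and the user IDs are hypothetical and not taken from any specific feature-flag product.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the flag's rollout percentage."""
    # Hashing flag and user together gives each user a stable bucket (0..99),
    # so the same user keeps seeing the same behaviour between requests.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
visible = sum(is_enabled("new-checkout", user, 10) for user in users)
print(f"{visible} of {len(users)} users see the new checkout flow")
```

Because the percentage is read at runtime, raising it from 10 to 100 widens the audience without a new deployment, and users already inside the rollout stay inside it.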

Zero-downtime deployment
A release in which the service remains available to users throughout the deployment. It is a goal rather than a single technique, and is typically achieved via blue-green, canary, or rolling deployments.

Rolling deployment
Instances of the application are updated one by one (or in batches) without taking all of them offline simultaneously. Common in Kubernetes.
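
The batching logic can be sketched in a few lines. This is a simplified model, not how Kubernetes implements it: the rolling_update function and the instance names are assumptions, and a real orchestrator would also wait for health checks before moving to the next batch.

```python
from typing import Dict, List


def rolling_update(instances: Dict[str, str], new_version: str,
                   batch_size: int) -> List[List[str]]:
    """Update instances in place, one batch at a time; return the batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = list(instances)
    batches: List[List[str]] = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            # Only the instances in this batch are being replaced right now;
            # the rest keep serving traffic.
            instances[name] = new_version
        batches.append(batch)
    return batches


fleet = {f"web-{i}": "v2.3" for i in range(1, 6)}  # hypothetical instances
for batch in rolling_update(fleet, "v2.4", batch_size=2):
    print("updated:", batch)
```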

Containers & Orchestration

Container / Docker image
A container packages an application and its dependencies into a lightweight, portable unit. A Docker image is the blueprint; a container is a running instance of that image.

Container registry
A storage and distribution system for Docker images. Examples: Docker Hub, Amazon ECR, Google Artifact Registry. Phrase: “Push the image to the registry, then the deployment pipeline pulls it.”

Orchestration
Automated management of containers at scale: scheduling, scaling, healing, networking. Kubernetes is the dominant orchestration platform.

Pod
The smallest deployable unit in Kubernetes — a group of one or more containers sharing network and storage. Phrase: “The pod is in CrashLoopBackOff — check the container logs.”

Namespace
A virtual cluster within a Kubernetes cluster, used to isolate environments (e.g. dev, staging, production). Phrase: “Deploy to the staging namespace first.”

Helm chart
A package of Kubernetes manifests with templating, used to deploy and configure applications. Phrase: “We use a Helm chart for the service — override the image tag in values.yaml.”

Infrastructure as Code

IaC (Infrastructure as Code)
Managing infrastructure through machine-readable config files rather than manual processes. Enables version control, reproducibility, and peer review of infrastructure changes.

Terraform
One of the most widely used IaC tools. It uses HCL (HashiCorp Configuration Language) to define cloud resources declaratively.

State / Drift
Terraform maintains a state file tracking what it deployed. Drift occurs when the real infrastructure diverges from the state — usually from manual changes bypassing IaC. Phrase: “There’s drift in the load balancer config — someone changed it manually.”
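
Conceptually, detecting drift means comparing what was recorded with what actually exists. The sketch below shows only that comparison on plain dictionaries; it is not how Terraform works internally, and the detect_drift function and the load balancer attributes are hypothetical.

```python
from typing import Any, Dict, Tuple


def detect_drift(recorded: Dict[str, Any],
                 actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (recorded value, actual value)} for every mismatch."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(recorded) | set(actual)):
        before, after = recorded.get(key), actual.get(key)
        if before != after:
            drift[key] = (before, after)
    return drift


# Hypothetical attributes: what the state file recorded for the load balancer...
recorded_state = {"idle_timeout": 60, "port": 443}
# ...and what exists after someone changed it manually.
real_infrastructure = {"idle_timeout": 300, "port": 443}

print(detect_drift(recorded_state, real_infrastructure))
```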

Module
A reusable, parameterised set of Terraform resources. Phrase: “We have an internal module for VPCs — use that instead of writing it from scratch.”

Observability

SLI / SLO / SLA

SLI (Service Level Indicator): the metric measured (e.g. request success rate)
SLO (Service Level Objective): the target (e.g. 99.9% success rate over 30 days)
SLA (Service Level Agreement): the contractual commitment to customers

Phrase: “We’re burning through our error budget — our SLO is 99.9% but we’ve been at 99.5% this week.”

Error budget
The tolerable amount of unreliability implied by an SLO. If the SLO is 99.9%, the error budget is 0.1% of requests or time. Phrase: “We need to freeze feature work — the error budget is exhausted this month.”
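
The arithmetic is worth seeing once. A 99.9% SLO over 30 days allows about 43.2 minutes of unavailability, and running at 99.5% consumes the budget roughly five times faster than the SLO allows. The function names below are assumptions chosen for this sketch.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a time-based SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(slo: float, observed_success_rate: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return (1.0 - observed_success_rate) / (1.0 - slo)


budget = error_budget_minutes(slo=0.999, window_days=30)
rate = burn_rate(slo=0.999, observed_success_rate=0.995)
print(f"Error budget: {budget:.1f} minutes per 30 days")
print(f"Burn rate at 99.5%: {rate:.1f}x")
```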

Golden signals
Google SRE’s four key metrics for any service: Latency (how long requests take), Traffic (requests per second), Errors (error rate), Saturation (how “full” the system is). The RED method (Rate, Errors, Duration) is similar.
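
As a rough illustration, the four signals can be derived from a window of request records plus a capacity figure. The Request type, the golden_signals function, and the choice of in-flight requests over capacity as the saturation measure are assumptions; in practice, latency is usually tracked as percentiles rather than a mean.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def golden_signals(requests: List[Request], window_seconds: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Summarise one observation window into the four golden signals."""
    if not requests:
        raise ValueError("need at least one request in the window")
    return {
        "latency_ms": mean(r.duration_ms for r in requests),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": sum(r.status >= 500 for r in requests) / len(requests),
        "saturation": in_flight / capacity,
    }


window = [Request(100.0, 200), Request(300.0, 500)]  # made-up sample data
print(golden_signals(window, window_seconds=2, in_flight=5, capacity=10))
```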

Trace / Distributed tracing
End-to-end tracking of a single request as it flows through multiple services. Tools: Jaeger, Zipkin, OpenTelemetry, AWS X-Ray. Phrase: “The trace shows the latency spike is in the authentication service, not the database.”

Incident Management

On-call
The rotation of engineers responsible for responding to production incidents, typically including nights and weekends. Phrase: “I’m on-call this week, so I’ll be paged if anything goes wrong.”

Page / Alert
A notification sent to the on-call engineer when a metric breaches a threshold. Tools: PagerDuty, Opsgenie. Phrase: “I got paged at 2am because the error rate spiked to 40%.”

Runbook
A documented set of procedures for handling known incidents or routine operational tasks. Phrase: “There’s a runbook for this — follow the steps in Confluence under ‘Database failover’.”

Postmortem / Incident review
A blameless analysis of an incident: timeline, root cause, contributing factors, and action items. Phrase: “The postmortem is on Friday — please add your observations to the shared doc before then.”

RCA (Root Cause Analysis)
The process of identifying the fundamental cause of an incident. Phrase: “The RCA identified a race condition in the auth service introduced in last Thursday’s deploy.”

MTTR (Mean Time to Recovery)
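The average time taken to restore normal service after an incident, usually measured from the moment the incident is detected. A key reliability metric. The R is also expanded as Repair, Respond, or Resolve, so check which definition a team uses.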
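
A minimal sketch of the calculation, assuming each incident is recorded as a pair of timestamps (detected, restored). The mttr function and the sample incidents are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mttr(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time from detection to restored service across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)


incidents = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 2, 30)),     # 30 minutes
    (datetime(2024, 3, 9, 10, 0), datetime(2024, 3, 9, 11, 30)),   # 90 minutes
]
print(f"MTTR: {mttr(incidents).total_seconds() / 60:.0f} minutes")
```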

Working With These Terms

Reading is not enough — use these terms when you write documentation, Slack messages, and PR descriptions. The more you write “the pipeline failed at the build stage” or “we deployed behind a feature flag”, the more naturally the vocabulary comes in spoken standups and incident calls.
