DevOps Vocabulary: 40 Must-Know Terms Explained

DevOps has its own language. If you are working as a developer, SRE, platform engineer, or QA engineer — or simply trying to follow a technical discussion that involves deployment pipelines — this vocabulary is essential.

This article explains 40 key DevOps terms, grouped by category, with example sentences showing how each term is actually used in conversation and documentation.

CI/CD Pipeline

Pipeline The automated sequence of steps that takes code from a commit to production. Each step is called a stage. Stages usually run in order, although independent stages can run in parallel.

“The pipeline failed at the test stage — it looks like a missing environment variable.”

Stage A distinct phase within a pipeline: build, test, deploy, and so on. Stages can be sequential or parallel.

“We run the unit tests and integration tests in parallel stages to reduce pipeline runtime.”
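As a rough illustration of how stages fit together, the sketch below models a pipeline as an ordered list of stages and stops at the first failure. The function name `run_pipeline` and the stage names are hypothetical; real CI platforms define pipelines in their own configuration formats rather than in Python.

```python
from typing import Callable

# (stage name, step that returns True on success)
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (succeeded, names of the stages that were executed).
    """
    executed: list[str] = []
    for name, step in stages:
        executed.append(name)
        if not step():
            return False, executed  # later stages are skipped
    return True, executed


ok, ran = run_pipeline([
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test stage
    ("deploy", lambda: True),
])
print(ok, ran)  # False ['build', 'test']
```

The deploy stage never runs here, which matches the way the word is used in practice: "the pipeline failed at the test stage."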

Artifact A file or set of files produced by a build stage and passed to later stages. Docker images, compiled binaries, and coverage reports are all artifacts.

“The build stage produces a Docker artifact that gets pushed to the registry.”

Trigger The event that starts a pipeline run. Common triggers include a push to a branch, a merged pull request, or a scheduled time.

“We configured a trigger on every push to the main branch.”

Runner The machine or container that executes a pipeline job. Runners can be self-hosted or provided by the CI platform.

“The job is queued — we might need to add another runner to speed things up.”

Webhook An HTTP callback sent by one system to another when an event occurs. GitHub sends a webhook to your CI server when you push code.

“Set up a webhook so Slack notifies us when a deployment succeeds.”

Environment A named configuration of infrastructure: typically development, staging, and production. Each environment may have different secrets, database connections, and resource limits.

“This change is live in staging — we’ll promote it to production after QA sign-off.”

Containers and Orchestration

Container A lightweight, isolated runtime environment that packages an application and its dependencies. Containers are standardised, portable, and faster to start than virtual machines.

“We ship our services as containers, so they run the same way in dev, staging, and prod.”

Image A read-only template used to create containers. An image includes the OS base, application code, and dependencies. Images are stored in a registry.

“Tag the image with the commit SHA so we can trace every build back to its source.”

Registry A storage and distribution system for container images. Docker Hub, AWS ECR, and Google Artifact Registry are common examples.

“Push the image to the registry after the build stage, then the deploy stage pulls it.”

Pod The smallest deployable unit in Kubernetes. A pod contains one or more containers that share a network namespace and can share storage volumes.

“When the pod crashed, Kubernetes automatically restarted it.”

Cluster A group of nodes (machines) managed by Kubernetes that run containerised workloads together.

“We have a production cluster in us-east-1 and a staging cluster in eu-west-1.”

Namespace A logical partition within a Kubernetes cluster, used to separate environments or teams.

“The payments team deploys everything to the payments namespace to isolate their resources.”

Node A single machine (virtual or physical) in a Kubernetes cluster. The cluster scheduler assigns pods to nodes based on available resources.

“The node is under memory pressure — we should either scale out or increase the node size.”

Orchestration The automated management of containerised applications across multiple nodes: scheduling, scaling, networking, and self-healing.

“Kubernetes handles orchestration so we don’t manually manage where each service runs.”

Monitoring and Reliability

SLA (Service Level Agreement) A contract between a service provider and a customer defining the minimum acceptable service standards — typically uptime percentage, response time, or support response time.

“Our SLA with the client guarantees 99.9% uptime per calendar month.”

SLO (Service Level Objective) An internal target for service reliability, usually stricter than the SLA. The SLO is what you aim for; the SLA is what you promise.

“Our SLO for the checkout API is p95 < 200ms. We’re currently at 215ms — outside target.”

SLI (Service Level Indicator) The metric actually measured to evaluate performance against an SLO. Common SLIs include error rate, latency, and availability.

“The SLI we track for availability is: successful requests ÷ total requests over a rolling 30-day window.”
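The availability SLI in that sentence is a simple ratio. A minimal sketch, assuming you already have request counts for the window; the function names and the treatment of zero traffic are illustrative choices, not a standard:

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Availability as a fraction between 0 and 1."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as available
    return successful_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


sli = availability_sli(successful_requests=999_500, total_requests=1_000_000)
print(sli)                    # 0.9995
print(meets_slo(sli, 0.999))  # True
```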

Error budget The allowed margin of failure before an SLO is breached. If your SLO is 99.9% uptime, your error budget is 0.1%, which is about 43 minutes of downtime in a 30-day month.

“We’ve burned through 60% of our error budget this month. No risky deployments until the window resets.”
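The 43-minute figure comes from multiplying the allowed failure fraction by the length of the window. A small sketch of that arithmetic, assuming a 30-day window; the function names are hypothetical:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given uptime SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_consumed(downtime_minutes: float, slo_target: float,
                    window_days: int = 30) -> float:
    """Fraction of the error budget already used (1.0 means fully spent)."""
    return downtime_minutes / error_budget_minutes(slo_target, window_days)


print(round(error_budget_minutes(0.999), 1))    # 43.2
print(round(budget_consumed(25.92, 0.999), 2))  # 0.6
```

With a 99.9% SLO, roughly 26 minutes of downtime already uses about 60% of the month's budget.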

Alert A notification triggered when a metric crosses a defined threshold. Alerts can page the on-call engineer during an incident (for example, through PagerDuty) or post to Slack for awareness.

“The alert fired because p99 latency crossed 500ms for 5 consecutive minutes.”
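A rule like the one in that sentence can be sketched as a check over the most recent samples. This assumes one latency sample per minute; the function name and defaults are illustrative, and real monitoring tools express such rules in their own configuration languages.

```python
def should_fire(samples_ms: list[float], threshold_ms: float = 500.0,
                consecutive: int = 5) -> bool:
    """True if the latest `consecutive` samples all exceed the threshold."""
    if len(samples_ms) < consecutive:
        return False  # not enough data yet to evaluate the rule
    return all(value > threshold_ms for value in samples_ms[-consecutive:])


print(should_fire([400, 510, 520, 530, 540, 550]))  # True
print(should_fire([510, 520, 400, 530, 540]))       # False
```

Requiring several consecutive breaches is a common way to avoid alerting on a single brief spike.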

On-call A rotation in which engineers take turns being responsible for responding to incidents, including outside business hours.

“I’m on-call this week, so I need to keep my phone nearby at night.”

MTTR (Mean Time to Recovery) The average time taken to restore service after an incident. Lower is better.

“Our MTTR for P1 incidents is 47 minutes — industry average is around 60.”

MTBF (Mean Time Between Failures) The average time between incidents. Used to measure system reliability.

“Increasing MTBF is a long-term reliability goal — we’re targeting six months between major outages.”
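Both averages can be computed from a list of incident start and end times. The sketch below uses made-up sample incidents and measures MTBF from one recovery to the start of the next incident, which is only one of several conventions in use.

```python
from datetime import datetime, timedelta

# (time the service went down, time it was restored)
Incident = tuple[datetime, datetime]


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: the average duration of the incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean time between failures, measured from one recovery
    to the start of the next incident (an assumed convention)."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Illustrative data only, not real incidents.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 11, 10, 30), datetime(2024, 1, 11, 11, 30)),
]
print(mttr(incidents))  # 0:45:00
print(mtbf(incidents))  # 10 days, 0:00:00
```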

Runbook A documented procedure for performing a specific operational task or responding to a specific incident type.

“Follow the runbook for database failover — it’s in the ops wiki under ‘incidents’.”

Infrastructure as Code (IaC)

IaC (Infrastructure as Code) Managing and provisioning infrastructure through machine-readable configuration files rather than manual processes.

“With IaC, any engineer on the team can spin up a new environment using the same Terraform configs.”

Idempotent Describes an operation that produces the same result whether it is applied once or many times. IaC tools aim to be idempotent: re-running an apply should not change a system that is already in the desired state.

“Terraform apply is idempotent — running it again after no changes does nothing.”
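To make the idea concrete, here is a toy comparison of an idempotent operation and a non-idempotent one. The `state` dictionary and both function names are hypothetical stand-ins; this is not how Terraform is implemented.

```python
def ensure_replicas(state: dict[str, int], service: str, replicas: int) -> bool:
    """Idempotent: set the replica count to a target value.

    Returns True only when the state actually changed.
    """
    if state.get(service) == replicas:
        return False
    state[service] = replicas
    return True


def add_replica(state: dict[str, int], service: str) -> None:
    """Not idempotent: every call changes the state again."""
    state[service] = state.get(service, 0) + 1


state: dict[str, int] = {}
print(ensure_replicas(state, "web", 3))  # True  (state changed)
print(ensure_replicas(state, "web", 3))  # False (already in the desired state)
```

Calling `ensure_replicas` a hundred times leaves three replicas; calling `add_replica` a hundred times leaves a hundred.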

Declarative Describing what the infrastructure should look like, not how to build it. Terraform and Kubernetes manifests are declarative.

“The Kubernetes manifest declares the desired state — the control plane figures out how to achieve it.”

Imperative Specifying the steps to execute to reach a desired state. Bash scripts and Ansible playbooks are often imperative.

“The old approach was imperative — twenty shell commands in sequence that had to be run in the right order.”

Drift The difference between the declared infrastructure state and the actual state. Drift occurs when someone makes a manual change that bypasses the IaC tooling.

“We detected drift in the production cluster — someone manually scaled the deployment without updating the config.”
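Detecting drift comes down to comparing the declared state with what is actually running. A minimal sketch, assuming both are available as flat dictionaries; the keys and values are invented for illustration, and real tools compare far richer resource models.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def detect_drift(declared: dict[str, object],
                 actual: dict[str, object]) -> dict[str, tuple[object, object]]:
    """Return {key: (declared value, actual value)} for every mismatch."""
    drift: dict[str, tuple[object, object]] = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


declared = {"replicas": 3, "image": "api:1.4"}
actual = {"replicas": 5, "image": "api:1.4"}  # someone scaled manually
print(detect_drift(declared, actual))  # {'replicas': (3, 5)}
```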

Provisioning The process of setting up infrastructure: allocating resources, configuring network rules, and installing software.

“Provisioning a new environment used to take two days. With IaC, it takes about 15 minutes.”

State (Terraform) A file that records the infrastructure Terraform manages as it was last known. Terraform compares this state with your configuration to determine what changes to make.

“The Terraform state is stored in S3 with locking handled by DynamoDB.”

Deployment Strategies

Canary deployment A strategy in which a new version is rolled out to a small percentage of users or servers first, monitored for errors, and then gradually expanded.

“We’re doing a canary deploy — 5% of traffic hits the new version for 24 hours before full rollout.”
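A traffic split like this can be sketched as a weighted random choice per request. In practice the split is usually configured in a load balancer or service mesh; the function below is only a hypothetical illustration of the idea.

```python
import random


def pick_version(canary_percent: float, rng: random.Random) -> str:
    """Route one request: 'canary' with the given probability, else 'stable'."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if rng.random() < canary_percent / 100 else "stable"


rng = random.Random(42)  # fixed seed so the demo is repeatable
routed = [pick_version(5, rng) for _ in range(10_000)]
print(routed.count("canary") / len(routed))  # close to 0.05
```

Expanding the canary then means raising `canary_percent` step by step while watching error rates.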

Blue-green deployment A strategy with two identical environments: one live (blue) and one idle (green). New code is deployed to green and tested, and then traffic is switched over. Rolling back is fast because it only requires switching traffic back to blue.

“Blue-green is our strategy for zero-downtime deployments — we flip the load balancer when we’re ready.”

Rolling deployment A strategy in which new instances gradually replace old ones. At any point during the rollout, both old and new versions are serving traffic.

“The rolling deployment updates one pod at a time so we always maintain capacity.”
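The sketch below simulates a rolling update on a list of instance versions, recording the fleet after each batch. It is a toy model with hypothetical names; an orchestrator such as Kubernetes also waits for health checks before moving on, which is omitted here.

```python
def rolling_update(instances: list[str], new_version: str,
                   batch_size: int = 1) -> list[list[str]]:
    """Replace instances batch by batch; return the fleet after each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(instances)  # work on a copy
    snapshots: list[list[str]] = []
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        snapshots.append(list(fleet))
    return snapshots


for step in rolling_update(["v1", "v1", "v1"], "v2"):
    print(step)
# ['v2', 'v1', 'v1']
# ['v2', 'v2', 'v1']
# ['v2', 'v2', 'v2']
```

The intermediate snapshots show both versions serving at once, which is why changes shipped this way need to be backward compatible.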

Feature flag A technique for enabling or disabling a feature at runtime without redeploying code. Useful for dark launching, A/B testing, and gradual rollouts.

“The new search algorithm is behind a feature flag — we’ll enable it for 10% of users first.”
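A percentage rollout is often implemented by hashing the user ID into a stable bucket, so the same user always gets the same answer. A minimal sketch under that assumption; the flag name, the function name, and the choice of SHA-256 are illustrative and not taken from any particular feature-flag product.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically decide whether a user sees the flagged feature."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent


users = [f"user-{n}" for n in range(10_000)]
enabled = sum(is_enabled("new-search", user, 10) for user in users)
print(enabled / len(users))  # roughly 0.10
```

Because the bucket depends only on the flag and the user ID, raising the percentage from 10 to 50 keeps the feature on for everyone who already had it.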

Rollback Reverting a deployment to the previous version following an incident or failed deployment.

“The deploy caused a spike in 5xx errors — we rolled back to the previous version within 3 minutes.”

Hotfix An urgent fix deployed directly to production, typically bypassing the normal release cycle.

“There’s a critical auth bypass in prod — we need to cut a hotfix immediately.”

How to Use This Vocabulary in Practice

Reading the definitions is a start. Using the terms fluently in speech and writing is the goal. Here are three techniques:

1. Shadowing technical talks Watch conference talks on YouTube (KubeCon, HashiConf, AWS re:Invent). Pause after each sentence and repeat it aloud. Pay attention to how speakers describe pipelines, incidents, and deployments.

2. Narrating your own work When you deploy something, describe it in a full sentence, in writing or aloud: “I’ve created a Terraform config that provisions the VPC and three subnets declaratively.” This builds the habit of using technical English in real contexts.

3. Reading incident post-mortems Sites like GitHub’s Post-Mortem Archive and Heroku’s Status History contain authentic, detailed DevOps English. Notice how incidents are described, how timelines are written, and how decisions are explained.

Quick Reference

Term	One-line definition
Pipeline	Automated steps from commit to production
Artifact	File produced by a build stage
Pod	Smallest unit in Kubernetes
SLA	Contract-level service guarantee
SLO	Internal reliability target
SLI	Metric used to measure performance
Error budget	Allowed failure margin before SLO breach
MTTR	Average time to restore after incident
IaC	Infrastructure managed via code
Idempotent	Same result no matter how many times applied
Drift	Gap between declared and actual state
Canary	Gradual rollout that starts with a small share of traffic
Blue-green	Switch traffic between two identical environments
Feature flag	Toggle feature on/off without redeployment
Rollback	Revert to previous version

Building this vocabulary does not happen overnight. The most effective approach is consistent exposure: read the documentation for the tools you use, follow public incident reports, and practice the terms in writing and speech. Each time you use the word “idempotent” naturally in a discussion, it sticks a little more.