DevOps glossary

Plain-English definitions. No marketing language. 23 terms across CI/CD, Kubernetes, observability, and security.


ArgoCD

GitOps

A Kubernetes-native continuous delivery tool that watches a Git repository and reconciles the cluster state with what is declared in that repository. When you push a change to your Helm chart or Kubernetes manifests, ArgoCD marks the application as out of sync and, if automated sync is enabled, applies the change (otherwise it waits for a manual sync). Every deployment is a Git commit: auditable, reversible, and reviewable.

Blue-green deployment

CI/CD

A deployment strategy where you run two identical environments: blue (current production) and green (new version). Once the new version is verified, all traffic is switched from blue to green at once. Rollback is instant: switch traffic back to blue. Requires double the infrastructure during the transition window.
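
A minimal sketch of the switching logic, assuming a hypothetical router object that sends all traffic to exactly one environment. It shows why rollback is instant: switching back is the same operation as switching forward.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreenRouter:
    """Hypothetical router: all traffic goes to the environment named in `live`."""
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    def idle(self) -> str:
        # The environment that is currently receiving no traffic.
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # New versions always land on the idle side; live traffic is untouched.
        self.versions[self.idle()] = version

    def switch(self) -> None:
        # All-at-once cut-over. Calling it again is the rollback.
        self.live = self.idle()

    def serving(self) -> Optional[str]:
        return self.versions[self.live]
```

Usage: create `BlueGreenRouter(versions={"blue": "v1", "green": None})`, call `deploy_to_idle("v2")`, verify green, then `switch()`; a second `switch()` sends traffic back to v1.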

Canary deployment

CI/CD

A deployment strategy where the new version is first rolled out to a small percentage of traffic, typically 1–5%. If error rates and latency stay within acceptable bounds, that share is increased in steps until it reaches 100%; if not, traffic is shifted back to the old version. Catches issues that only appear under real usage patterns before they affect all users.
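
The promotion loop can be sketched as follows. The traffic steps, thresholds, and the `get_metrics` callback are hypothetical placeholders; in practice a rollout controller or service mesh shifts the traffic and a metrics backend supplies the readings.

```python
from typing import Callable, Tuple

# Hypothetical traffic percentages; real tools let you configure these.
CANARY_STEPS = [1, 5, 25, 50, 100]


def run_canary(get_metrics: Callable[[int], Tuple[float, float]],
               max_error_rate: float = 0.01,
               max_p99_ms: float = 500.0) -> Tuple[str, int]:
    """Advance the canary step by step and stop at the first bad reading.

    get_metrics(percent) returns (error_rate, p99_latency_ms) observed while
    `percent`% of traffic is served by the new version.
    """
    for percent in CANARY_STEPS:
        error_rate, p99_ms = get_metrics(percent)
        if error_rate > max_error_rate or p99_ms > max_p99_ms:
            # Out of bounds: send all traffic back to the old version.
            return ("rollback", percent)
    return ("promoted", 100)
```

Because the check runs at every step, a regression that only shows up at 25% of traffic is caught before the remaining users see it.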

DORA metrics

DevOps

Four metrics from the DevOps Research and Assessment (DORA) team that measure software delivery performance: Deployment Frequency (how often you deploy), Lead Time for Changes (time from commit to production), Change Failure Rate (percentage of deploys that cause a failure in production), and Time to Restore Service (how long it takes to recover from a failure). In the 2021 Accelerate State of DevOps report, elite performers deploy on demand, often multiple times per day, with a change failure rate of 0–15%.
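
A rough sketch of how three of the four metrics could be computed from deployment records. The `Deploy` record shape is hypothetical, and it simplifies by treating the deploy time as the start of any incident it caused. Lead time is left out because it needs commit timestamps as well.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    """Hypothetical deployment record."""
    at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # when service recovered, if it failed


def dora_summary(deploys: List[Deploy], window_days: int) -> dict:
    failures = [d for d in deploys if d.caused_incident]
    # Simplification: the incident is assumed to start at deploy time.
    restore_times = [d.restored_at - d.at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Using the median for time to restore keeps one unusually long outage from dominating the number; reporting the mean alongside it is a reasonable alternative.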

Flux

GitOps

A GitOps operator for Kubernetes, similar to ArgoCD. Flux watches Git repositories and applies changes to the cluster automatically. Flux tends to be more modular and CLI-heavy; ArgoCD has a richer UI. Both are production-ready; the choice usually comes down to team preference.

GitOps

DevOps

An operational model where the desired state of your infrastructure and applications is stored in Git, and automated tooling continuously reconciles the actual state with the declared state. If someone makes a change directly in the cluster (outside Git), the GitOps operator detects the drift and, when configured to self-heal, reverts it. Every change goes through Git, typically as a reviewed pull request.
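
The core of a reconciliation loop is a diff between declared and live state. This sketch uses plain dictionaries as a hypothetical, heavily simplified stand-in for Kubernetes objects; real operators compare much richer specs and make pruning and self-healing configurable.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, dict], live: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return the actions needed to make `live` match `desired`.

    Both arguments map a resource name to its (simplified) spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            # Drift: the cluster differs from what Git declares.
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            # Exists in the cluster but not in Git (pruning).
            actions.append(("delete", name))
    return actions
```

An operator runs this comparison continuously; an empty action list means the cluster matches Git.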

Helm

Kubernetes

A package manager for Kubernetes. A Helm chart is a package of templated YAML files that together define a set of Kubernetes resources, plus a values file holding the defaults. Instead of maintaining 20 YAML files per service, you maintain one chart with configurable values. Charts can be published to registries and shared across teams or organisations.

Horizontal Pod Autoscaler (HPA)

Kubernetes

A Kubernetes controller that scales the number of Pod replicas based on observed metrics, typically CPU or memory utilisation. When utilisation rises above the configured target, HPA adds replicas; when load drops, it removes them. For utilisation-based targets, resource requests must be set on the Pods, because utilisation is calculated as a percentage of the request.
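
The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance (0.1 by default). A simplified sketch, which ignores details such as not-ready Pods and scale-down stabilisation:

```python
import math


def hpa_desired_replicas(current_replicas: int, current_util: float,
                         target_util: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation.

    current_util is average usage as a percentage of the Pods' requests,
    which is why requests must be set for utilisation targets.
    """
    ratio = current_util / target_util
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target gives ceil(4 × 1.5) = 6 replicas.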

Infrastructure as Code (IaC)

DevOps

Managing infrastructure through machine-readable configuration files rather than manual UI clicks or scripts. Tools like Terraform, Pulumi, and AWS CDK let you define your cloud resources in code, version them in Git, review changes via pull requests, and apply them consistently across environments.

Istio

Kubernetes

A service mesh for Kubernetes that adds traffic management, mutual TLS between services, observability, and fine-grained access control without changing application code. In its classic sidecar mode it injects an Envoy proxy into each Pod; newer versions also offer an ambient mode that uses shared per-node proxies instead of sidecars. The operational overhead is significant, so only add Istio when you actually need what it provides.

Karpenter

Kubernetes

A node autoscaler for Kubernetes (primarily used on EKS) that launches EC2 instances in response to Pods that cannot be scheduled. Unlike Cluster Autoscaler, which grows and shrinks pre-defined node groups, Karpenter picks the node type that best fits the pending workload, choosing Spot vs On-Demand, instance family, and size dynamically. This often lowers node costs compared with fixed-size node groups, though the savings depend on the workload and how much of it can run on Spot.

KEDA

Kubernetes

Kubernetes Event-driven Autoscaling. Extends the HPA to scale Pods based on external event sources such as queue depth in SQS, consumer lag in Kafka, Prometheus queries, or dozens of other sources, and can scale a workload down to zero when there is no work. Useful for batch jobs and event-driven architectures where CPU is not a meaningful scaling signal.
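
A simplified sketch of queue-based scaling in the spirit of KEDA, not KEDA's actual code. The function and parameter names are hypothetical; in a real setup KEDA feeds the queue metric to an HPA, which applies its own formula, and this collapses that into one step.

```python
import math


def queue_based_replicas(queue_depth: int, messages_per_replica: int,
                         min_replicas: int = 0, max_replicas: int = 20) -> int:
    """Hypothetical: one replica per `messages_per_replica` queued messages."""
    if queue_depth == 0:
        # No work waiting: event-driven scaling can drop to zero replicas.
        return min_replicas
    desired = math.ceil(queue_depth / messages_per_replica)
    return max(min_replicas, min(max_replicas, desired))
```

With 12 queued messages and a target of 5 per replica this yields 3 replicas, a decision CPU utilisation alone could not make for an idle-looking consumer with a growing backlog.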

Kustomize

Kubernetes

A Kubernetes configuration management tool built into kubectl. Unlike Helm, Kustomize does not use templates — it uses overlays and patches on top of base YAML files. Useful for maintaining per-environment differences (staging vs production) without duplicating the entire manifest set.
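
The overlay idea can be illustrated with a recursive merge of a patch over a base. This is a simplified sketch closer to a JSON merge patch than to Kustomize's strategic merge, which also merges lists (such as containers) by key; the manifest shape is trimmed down for illustration.

```python
import copy


def apply_overlay(base: dict, patch: dict) -> dict:
    """Recursively merge `patch` over `base` without mutating either."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge field by field.
            result[key] = apply_overlay(result[key], value)
        else:
            # Scalars and lists in the patch replace the base value outright.
            result[key] = copy.deepcopy(value)
    return result
```

A production overlay such as `{"spec": {"replicas": 5}}` changes only the replica count and inherits everything else from the shared base.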

Lead time for changes

DevOps

One of the four DORA metrics. Measures the time from a change being committed to that change running in production. Elite performers achieve under one hour. Teams without CI/CD automation are often in the one-day to one-week range. Shorter lead times usually go hand in hand with smaller, more frequent changes, which reduces the blast radius of each bug and speeds up feedback loops.
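
A minimal sketch of measuring lead time, assuming you can pair each change's commit timestamp with the time it reached production (the input shape is hypothetical).

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


def median_lead_time(changes: List[Tuple[datetime, datetime]]) -> timedelta:
    """Median of (deployed_at - committed_at) over (committed_at, deployed_at) pairs."""
    return median(deployed - committed for committed, deployed in changes)
```

The median is used because lead times are usually skewed: a few changes stuck in review for days would inflate an average far more than they affect the typical change.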

mTLS (Mutual TLS)

Security

An authentication mode where both sides of a connection verify each other's identity using TLS certificates. In Kubernetes, a service mesh like Istio can enforce mTLS between all services, ensuring that even internal cluster traffic is authenticated and encrypted — a requirement for some compliance frameworks.
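
At the TLS library level, what makes TLS "mutual" is the server requiring and verifying a client certificate. A sketch using Python's standard `ssl` module; the certificate file names are hypothetical. In a service mesh the sidecar or node proxy does this for you, so application code does not change.

```python
import ssl


def mtls_server_context() -> ssl.SSLContext:
    # Server-side context (Purpose.CLIENT_AUTH means "we authenticate clients").
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Reject any client that does not present a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # A real server would also load its own identity and the CA that signs
    # client certificates (hypothetical paths):
    # ctx.load_cert_chain("server.crt", "server.key")
    # ctx.load_verify_locations("clients-ca.crt")
    return ctx
```

On the client side the default context already verifies the server; the client only adds `load_cert_chain(...)` with its own certificate so it can prove its identity in return.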

OPA (Open Policy Agent)

Security

A policy engine that lets you define and enforce rules across your infrastructure. In Kubernetes, OPA (often via Gatekeeper) can enforce policies like 'no Pods without resource limits', 'all images must come from our private registry', or 'no privileged containers in production'. Policies are written in Rego, OPA's query language.
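
To show what a rule like "no Pods without resource limits" checks, here is the logic sketched in Python rather than Rego. It only illustrates the rule; it is not how OPA or Gatekeeper evaluates policies. Like many policy setups, it collects every violation message instead of stopping at the first.

```python
from typing import List


def violations(pod: dict) -> List[str]:
    """Report every container in a Pod manifest that lacks a CPU or memory limit."""
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"container {container['name']!r} has no {resource} limit")
    return problems
```

An admission controller would reject the Pod whenever this list is non-empty and return the messages to the user.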

OpenTelemetry

Observability

A vendor-neutral observability framework for collecting traces, metrics, and logs. Instead of instrumenting your application for a specific backend (Datadog, Jaeger, etc.), you instrument once with OpenTelemetry and route data to any compatible backend. Avoids vendor lock-in on observability tooling.

Prometheus

Observability

An open-source metrics collection and alerting system. Services expose metrics at a /metrics endpoint in a plain-text format; Prometheus scrapes these endpoints on a configurable interval and stores the time-series data. Queries use PromQL. Prometheus evaluates alerting rules itself and sends firing alerts to Alertmanager, which groups, deduplicates, and routes notifications; Grafana is commonly added for dashboards.
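
A sketch of what a scraped /metrics response looks like for a single counter, rendered by hand in the text exposition format. The official client libraries generate this for you; this version skips label-value escaping (backslashes, quotes, newlines), which real output must handle.

```python
from typing import Dict, Tuple


def render_counter(name: str, help_text: str,
                   samples: Dict[Tuple[Tuple[str, str], ...], float]) -> str:
    """Render one counter; `samples` maps (label, value) pairs to a count."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        # Omit the braces entirely when a sample has no labels.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"
```

Each line such as `http_requests_total{method="get",code="200"} 1027` becomes one time series in Prometheus, identified by its metric name plus label set.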

Rolling deployment

CI/CD

The default Kubernetes deployment strategy. Pods running the old version are gradually terminated and replaced with Pods running the new version, a few at a time, as controlled by the Deployment's maxSurge and maxUnavailable settings. Traffic is served from both versions during the rollout. Zero downtime if the application handles concurrent versions correctly. The rollout can be paused (kubectl rollout pause) or rolled back (kubectl rollout undo) at any point.
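
How many Pods change at once follows from maxSurge and maxUnavailable, both 25% by default. When given as percentages, Kubernetes rounds maxSurge up and maxUnavailable down. A small sketch of the resulting bounds:

```python
import math


def rollout_bounds(replicas: int, max_surge_pct: float = 25,
                   max_unavailable_pct: float = 25) -> dict:
    """Pod-count bounds during a rolling update, for percentage-based settings."""
    surge = math.ceil(replicas * max_surge_pct / 100)                # rounded up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }
```

With 10 replicas and the defaults, the rollout may run up to 13 Pods at once while keeping at least 8 available.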

Service mesh

Kubernetes

A dedicated infrastructure layer for handling service-to-service communication. A service mesh (Istio, Linkerd, Cilium) handles routing, load balancing, mTLS, retries, circuit breaking, and telemetry at the network level — without changing application code. Adds operational complexity; evaluate carefully before adopting.

Terraform

Infrastructure as Code

The most widely used infrastructure-as-code tool. You describe your cloud resources in HCL (HashiCorp Configuration Language), run terraform plan to preview changes, and terraform apply to provision them. State is stored in a backend (S3, Terraform Cloud) and tracks what Terraform has created. Provider support covers AWS, GCP, Azure, and hundreds of other services.

Vertical Pod Autoscaler (VPA)

Kubernetes

A Kubernetes controller that adjusts the CPU and memory requests (and, proportionally, limits) of Pods based on observed usage. Unlike HPA, which changes the number of replicas, VPA changes the size of individual Pods. Useful in recommendation-only mode (updateMode: "Off") to identify correct resource settings without changing running Pods.

Zero-downtime deployment

CI/CD

A deployment that does not interrupt service to users. Achieved through rolling deployments (gradual replacement of old Pods), blue-green switching (atomic traffic cut-over), or load balancer connection draining (traffic stops reaching the old version before it is terminated). Requires old instances to finish in-flight requests before shutting down, and both versions to work correctly while they run side by side.