Free Resources · No Sign-up Required

DevOps Checklists for Engineering Teams

Practical checklists covering CI/CD, Kubernetes, cloud security, and observability. Use them in your next sprint, share them with your team, or run them before your next audit.

CI/CD Pipeline Readiness Checklist

Before you deploy to production, verify your pipeline covers these fundamentals.

  • All code changes go through a pull request - no direct pushes to main
  • At least one required reviewer approves before merge
  • Automated tests run on every pull request and block merge on failure
  • Build produces a versioned, immutable artifact (Docker image, binary)
  • Secrets are injected at runtime - not hardcoded or baked into images
  • Staging environment mirrors production (same infra, similar data volumes)
  • Deployment to production requires manual approval or a gate
  • Rollback procedure is documented and tested at least quarterly
  • Each deploy is logged with who triggered it, what version, and when
  • Failed deploys alert the team within 2 minutes
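
The deploy-logging item above can be sketched as one small audit record per deploy. This is a minimal illustration, not a prescribed format: the `DeployRecord` class, its field names, and the `make_deploy_record` and `to_log_line` helpers are all hypothetical, and a real pipeline would probably take these values from its CI system's own variables.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeployRecord:
    """One audit entry per deploy: who triggered it, what version, and when."""
    triggered_by: str
    version: str
    environment: str
    timestamp: str  # ISO 8601, UTC


def make_deploy_record(triggered_by, version, environment, now=None):
    """Build a record, refusing deploys with no identifiable owner or version."""
    if not triggered_by or not version:
        raise ValueError("triggered_by and version are required")
    # `now` is injectable so the function can be tested deterministically.
    now = now or datetime.now(timezone.utc)
    return DeployRecord(triggered_by, version, environment, now.isoformat())


def to_log_line(record):
    """Serialise the record as a single JSON line for a central log store."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = make_deploy_record("alice", "1.4.2", "production")
    print(to_log_line(record))
```

Writing the record as one JSON line keeps it searchable alongside other structured logs. Making the record immutable (`frozen=True`) is a design choice for this sketch, on the assumption that an audit entry should not change after it is created.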

Kubernetes Production Readiness Checklist

Running Kubernetes in production without these in place is how incidents happen at 2am.

  • Resource requests and limits set on every container
  • Liveness and readiness probes configured on every long-running container
  • Horizontal Pod Autoscaler (HPA) configured for stateless services
  • Pod Disruption Budgets (PDB) set for critical services
  • No containers running as root
  • Network policies restrict pod-to-pod traffic by default
  • Secrets stored in a secrets manager - not plain Kubernetes Secrets
  • RBAC is configured - no wildcard permissions for service accounts
  • Image vulnerability scanning in the CI pipeline (Trivy, Snyk)
  • Node auto-scaling configured and tested
  • Cluster version is no more than one minor release behind the latest stable release (N-1)
  • etcd is backed up daily and restore has been tested
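
A few of the items above (resource requests and limits, probes, non-root containers) can be checked mechanically against a pod spec. The following is a rough sketch that assumes the manifest has already been parsed into a plain dictionary; `check_container` and `check_pod_spec` are hypothetical helper names, and the check covers only `containers`, not `initContainers`.

```python
def check_container(container, pod_security_context=None):
    """Return a list of checklist violations for one container spec (a dict)."""
    problems = []
    name = container.get("name", "<unnamed>")

    # Resource requests and limits set on every container.
    resources = container.get("resources", {})
    for kind in ("requests", "limits"):
        values = resources.get(kind, {})
        for resource in ("cpu", "memory"):
            if resource not in values:
                problems.append(f"{name}: missing resources.{kind}.{resource}")

    # Liveness and readiness probes present (this does not judge their tuning).
    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            problems.append(f"{name}: missing {probe}")

    # runAsNonRoot may be set on the container or inherited from the pod;
    # the container-level value takes precedence when both are present.
    container_ctx = container.get("securityContext", {})
    pod_ctx = pod_security_context or {}
    run_as_non_root = container_ctx.get(
        "runAsNonRoot", pod_ctx.get("runAsNonRoot", False)
    )
    if not run_as_non_root:
        problems.append(f"{name}: runAsNonRoot is not set to true")
    return problems


def check_pod_spec(pod_spec):
    """Run the container checks over every entry in spec.containers."""
    problems = []
    for container in pod_spec.get("containers", []):
        problems.extend(
            check_container(container, pod_spec.get("securityContext"))
        )
    return problems
```

A check like this could run in CI against rendered manifests. It only confirms that the fields exist, so whether the probe settings and limit values are sensible still needs human review.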

Cloud Security Baseline Checklist

These are the controls that come up in every SOC 2, HIPAA, and ISO 27001 audit. Fix them before the auditor asks.

  • Root / admin accounts have MFA enabled
  • IAM users and roles follow least privilege - no wildcard policies in production
  • No long-lived API keys or access keys in code or CI environment variables
  • All secrets rotated through a secrets manager (AWS Secrets Manager, Vault)
  • CloudTrail / audit logging enabled across all accounts and regions
  • S3 buckets are not public unless explicitly intended to be
  • Encryption at rest enabled for all databases and storage volumes
  • TLS enforced on all external endpoints - no HTTP in production
  • Security group rules restrict inbound access - no 0.0.0.0/0 on SSH/RDP
  • Dependency vulnerability scanning runs in CI (npm audit, pip-audit, etc.)
  • Incident response plan exists and has been rehearsed
  • Access is reviewed and revoked for offboarded employees within 24 hours
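
The security group item above lends itself to an automated check. The sketch below assumes inbound rules have already been exported into simple dictionaries with `from_port`, `to_port`, and `cidr` keys. That shape is made up for illustration and is not any cloud provider's API format. Treating the IPv6 range `::/0` as equally open is also an assumption that goes beyond the checklist wording.

```python
# Admin ports named in the checklist item.
ADMIN_PORTS = {22: "SSH", 3389: "RDP"}

# 0.0.0.0/0 comes from the checklist; ::/0 is added here as the IPv6 equivalent.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_admin_rules(rules):
    """Return one finding per admin port that is reachable from anywhere.

    Each rule is a dict with hypothetical keys: from_port, to_port, cidr.
    """
    findings = []
    for rule in rules:
        if rule["cidr"] not in OPEN_CIDRS:
            continue
        for port, label in ADMIN_PORTS.items():
            # A port range such as 0-65535 also exposes SSH and RDP.
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append(f"{label} (port {port}) open to {rule['cidr']}")
    return findings


if __name__ == "__main__":
    example_rules = [
        {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
        {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
    ]
    for finding in find_open_admin_rules(example_rules):
        print(finding)
```

Checking port ranges rather than exact matches matters here, because a single wide-open range rule is an easy way to expose SSH or RDP without naming either port.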

Observability Readiness Checklist

If you cannot answer 'what broke and why' within 5 minutes of an incident, your observability is not production-ready.

  • All services emit structured logs (JSON, not plaintext)
  • Logs are centralised - not trapped on individual instances
  • Application performance metrics exported (latency, error rate, throughput)
  • Infrastructure metrics collected (CPU, memory, disk, network per service)
  • Alerts fire on symptoms (high error rate, high latency), not just causes (high CPU)
  • Every alert has a documented runbook
  • On-call rotation is documented with a clear escalation path
  • Dashboards show a 30-day baseline - not just live data
  • Distributed tracing enabled for requests that span multiple services
  • Synthetic uptime monitoring checks all external-facing endpoints
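
The structured-logging item above could look roughly like this using only Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the `service` and `request_id` field names are illustrative assumptions; many teams would use an existing JSON logging library instead.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical context fields that callers may pass via `extra=`.
    CONTEXT_FIELDS = ("service", "request_id")

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate plaintext output via the root logger
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("checkout", buffer)
    log.info("payment failed", extra={"service": "checkout", "request_id": "abc123"})
    print(buffer.getvalue(), end="")
```

In a real service the stream would typically be standard output, with a log shipper forwarding the lines to a central store. Carrying a request identifier on each line is one simple way to connect logs to traces for requests that span multiple services.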

Found gaps in your checklist?

Book a free 30-minute audit and we will walk through exactly what needs fixing and in what order.
