Free Resources · No Sign-up Required

DevOps Checklists for Engineering Teams

Practical checklists covering CI/CD, Kubernetes, cloud security, and observability. Use them in your next sprint, share them with your team, or run them before your next audit.

CI/CD Pipeline Readiness Checklist

Before you deploy to production, verify your pipeline covers these fundamentals.

All code changes go through a pull request - no direct pushes to main
At least one required reviewer approves before merge
Automated tests run on every pull request and block merge on failure
Build produces a versioned, immutable artifact (Docker image, binary)
Secrets are injected at runtime - not hardcoded or baked into images
Staging environment mirrors production (same infra, similar data volumes)
Deployment to production requires manual approval or an automated quality gate
Rollback procedure is documented and tested at least quarterly
Each deploy is logged with who triggered it, what version, and when
Failed deploys alert the team within 2 minutes

10 checkpoints
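The deploy-logging and alerting items above can be spot-checked in code. The following is a minimal sketch, assuming you can export deploy records from your CD tool; the `DeployRecord` shape and the `audit_deploy` helper are hypothetical names invented for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# The checklist asks for failed-deploy alerts within 2 minutes.
ALERT_DEADLINE = timedelta(minutes=2)


@dataclass
class DeployRecord:
    """Hypothetical record shape; map these fields to your CD tool's export."""
    triggered_by: str
    version: str
    started_at: Optional[datetime]
    succeeded: bool
    failed_at: Optional[datetime] = None
    alerted_at: Optional[datetime] = None


def audit_deploy(record: DeployRecord) -> list[str]:
    """Return the checklist gaps found in a single deploy record."""
    problems = []
    if not record.triggered_by:
        problems.append("missing who triggered the deploy")
    if not record.version:
        problems.append("missing deployed version")
    if record.started_at is None:
        problems.append("missing deploy timestamp")
    if not record.succeeded:
        if record.failed_at is None or record.alerted_at is None:
            problems.append("failed deploy has no alert on record")
        elif record.alerted_at - record.failed_at > ALERT_DEADLINE:
            problems.append("alert arrived more than 2 minutes after failure")
    return problems


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    slow = DeployRecord("bob", "1.4.3", t0, False,
                        failed_at=t0, alerted_at=t0 + timedelta(minutes=5))
    print(audit_deploy(slow))
```

Running a check like this over the last few weeks of deploys is one way to turn the last two checklist items into something measurable rather than assumed.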

Get a free audit of your setup

Kubernetes Production Readiness Checklist

Running Kubernetes in production without these in place is how incidents happen at 2am.

Resource requests and limits set on every container
Liveness and readiness probes configured on every container (readiness gates traffic, liveness restarts hung processes)
Horizontal Pod Autoscaler (HPA) configured for stateless services
Pod Disruption Budgets (PDB) set for critical services
No containers running as root
Network policies restrict pod-to-pod traffic by default
Secrets stored in an external secrets manager - not as plain Kubernetes Secrets, which are only base64-encoded by default
RBAC is configured - no wildcard permissions for service accounts
Image vulnerability scanning in the CI pipeline (Trivy, Snyk)
Node auto-scaling configured and tested
Cluster version is no more than one minor release behind the latest stable release (N-1)
etcd is backed up daily and restore has been tested

12 checkpoints
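Several of the items above (resource requests and limits, probes, non-root containers) can be checked mechanically against a manifest. The sketch below assumes the pod spec has already been parsed into a Python dict, for example from YAML; `audit_pod_spec` is a hypothetical helper, and it deliberately treats a missing `runAsNonRoot` as a gap, which is stricter than Kubernetes itself.

```python
def audit_pod_spec(pod_spec: dict) -> list[str]:
    """Return checklist gaps for each container in a parsed pod spec."""
    problems = []
    pod_sc = pod_spec.get("securityContext") or {}

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Resource requests and limits on every container.
        resources = container.get("resources") or {}
        for key in ("requests", "limits"):
            if not resources.get(key):
                problems.append(f"{name}: no resource {key}")

        # Liveness and readiness probes present.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                problems.append(f"{name}: no {probe}")

        # Container-level securityContext takes precedence over pod-level.
        container_sc = container.get("securityContext") or {}
        run_as_non_root = container_sc.get(
            "runAsNonRoot", pod_sc.get("runAsNonRoot")
        )
        if run_as_non_root is not True:
            problems.append(f"{name}: runAsNonRoot is not set to true")

    return problems


if __name__ == "__main__":
    spec = {"containers": [{"name": "web"}]}
    for problem in audit_pod_spec(spec):
        print(problem)
```

A check of this kind only confirms that the fields exist; whether the probe endpoints and resource values are sensible still needs human review or load testing.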

Cloud Security Baseline Checklist

These are the controls that come up in every SOC 2, HIPAA, and ISO 27001 audit. Fix them before the auditor asks.

Root / admin accounts have MFA enabled
IAM users and roles follow least privilege - no wildcard policies in production
No long-lived API keys or access keys in code or CI environment variables
All secrets are stored in and rotated through a secrets manager (AWS Secrets Manager, Vault)
CloudTrail / audit logging enabled across all accounts and regions
S3 buckets are not public unless explicitly intended to be
Encryption at rest enabled for all databases and storage volumes
TLS enforced on all external endpoints - no HTTP in production
Security group rules restrict inbound access - no 0.0.0.0/0 on SSH (port 22) or RDP (port 3389)
Dependency vulnerability scanning runs in CI (npm audit, pip-audit, etc.)
Incident response plan exists and has been rehearsed
Access is reviewed and revoked for offboarded employees within 24 hours

12 checkpoints
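Two of the controls above, wildcard IAM policies and open SSH/RDP ingress, lend themselves to a scripted check. This is a rough sketch under stated assumptions: the policy dict follows the standard IAM JSON policy layout (`Statement`, `Effect`, `Action`), while the flattened ingress rule shape (`cidr`, `from_port`, `to_port`) and both function names are hypothetical and would need adapting to whatever your cloud provider's API returns. It only inspects `Action` wildcards, not `Resource` wildcards.

```python
SENSITIVE_PORTS = {22, 3389}  # SSH and RDP
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def _as_list(value):
    """IAM allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant '*' or a whole service ('svc:*')."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            flagged.append(statement)
    return flagged


def open_admin_rules(rules: list[dict]) -> list[dict]:
    """Return ingress rules that expose SSH or RDP to the whole internet.

    Each rule is a hypothetical simplified dict:
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22}
    """
    return [
        rule for rule in rules
        if rule["cidr"] in OPEN_CIDRS
        and any(rule["from_port"] <= port <= rule["to_port"]
                for port in SENSITIVE_PORTS)
    ]


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["iam:*"], "Resource": "*"},
        ],
    }
    print(wildcard_statements(policy))
```

Note that a port range such as 0 to 65535 is flagged too, since it includes both sensitive ports even though neither is named explicitly.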

Observability Readiness Checklist

If you cannot answer 'what broke and why' within 5 minutes of an incident, your observability is not production-ready.

All services emit structured logs (JSON, not plaintext)
Logs are centralised - not trapped on individual instances
Application performance metrics exported (latency, error rate, throughput)
Infrastructure metrics collected (CPU, memory, disk, network per service)
Alerts fire on symptoms (high error rate, high latency), not just causes (high CPU)
Every alert has a documented runbook
On-call rotation is documented with a clear escalation path
Dashboards show a 30-day baseline alongside live data - not just the current values
Distributed tracing enabled for requests that span multiple services
Synthetic uptime monitoring checks all external-facing endpoints

10 checkpoints
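As an illustration of the structured-logging item above, here is a minimal sketch using only Python's standard `logging` and `json` modules. The `JsonFormatter` and `build_logger` names, the output field names, and the `context` attribute passed through `extra` are all assumptions chosen for this example; a log shipper or an existing JSON logging library may already cover this in your stack.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical attribute supplied via extra={...}.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.error("payment failed for order %s", "A17",
              extra={"context": {"order_id": "A17"}})
```

Emitting one JSON object per line keeps the logs easy to parse once they are centralised, which is what makes the error-rate and latency alerts in the list practical to build.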

Found gaps in your checklist?

Book a free 30-minute audit and we will walk through exactly what needs fixing and in what order.

Book Free Pipeline Audit