Author - Siddharth Singh
Rkssh - DevOps as a Service
September 17, 2024
Top 47 Kubernetes Interview Questions and Answers
Simple Answers to Help You in Any Kubernetes Interview.
Cluster Architecture
Can you explain the different components of the Kubernetes control plane and their roles?
The Kubernetes control plane consists of the API Server, etcd, Controller Manager, and Scheduler.
API Server: Acts as the cluster's gateway, handling all REST commands to manage resources. It's like the receptionist in a building; any interaction with the cluster goes through this point. The API Server validates and processes requests to create, update, or delete Kubernetes objects.
etcd: This is a consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. It's crucial because it acts as the source of truth, maintaining the current state and desired state of the cluster.
Controller Manager: A collection of control loops that monitor the cluster state, ensuring the desired state matches the actual state. It manages tasks like pod replication, node status, and endpoint management. For example, if a node fails, the Controller Manager ensures that new pods are scheduled on other nodes.
Scheduler: The Scheduler watches for newly created pods without assigned nodes and selects a node for them to run on. It makes decisions based on resource availability, quality of service, and other user-defined constraints. Essentially, it ensures the most efficient distribution of workloads.
How does the etcd datastore work within a Kubernetes cluster, and why is it crucial?
etcd is a distributed key-value store that Kubernetes uses to store all its cluster data. Think of it as the cluster's memory—every time you create, delete, or update a resource, the change is recorded in etcd. Its importance lies in its ability to provide a consistent and highly available source of truth for the cluster. If etcd fails or becomes corrupted without backups, the cluster's state can become inconsistent, leading to potential data loss or cluster failure. This is why etcd is often secured and backed up regularly.
Networking
How does the Kubernetes networking model work, especially the concepts of Pods, Services, and Ingress?
Kubernetes networking is designed to ensure that each pod can communicate with other pods, regardless of the node they’re on.
Pods: Each pod gets its unique IP address, and within the pod, containers share the same network namespace. This allows them to communicate easily using localhost.
Services: Since pods are ephemeral, a Service provides a stable endpoint for a set of pods, abstracting their IP addresses. It’s like a phone number that stays the same even if the people (pods) change.
Ingress: It acts as an entry point to the cluster for external traffic. Ingress can route traffic based on rules, handle SSL/TLS termination, and offer more complex routing than simple Services.
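As an illustration, a minimal Ingress manifest that terminates TLS and routes by host and path might look like this (the hostname, Secret, and Service names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  tls:
  - hosts: [app.example.com]
    secretName: app-tls        # TLS terminated at the Ingress
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc      # backing Service inside the cluster
            port:
              number: 80
```

An Ingress controller (e.g., ingress-nginx) must be running in the cluster for these rules to take effect.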
Can you explain the difference between ClusterIP, NodePort, and LoadBalancer services?
These are the different ways Kubernetes exposes services:
ClusterIP: The default service type. It exposes the service within the cluster using a virtual IP, making it accessible only within the cluster. It's like an internal hotline for inter-pod communication.
NodePort: Exposes the service on each node’s IP at a specified port, making it accessible from outside the cluster. Think of it as opening a specific door on every node for outside traffic.
LoadBalancer: Works with cloud providers to create an external load balancer that routes traffic to the service. It’s like setting up a managed toll booth at the entrance of your cluster, distributing incoming traffic across nodes.
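For example, a sketch of a NodePort Service (the app label and ports are hypothetical); changing `type` to `ClusterIP` or `LoadBalancer` switches between the three exposure modes:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  type: NodePort        # ClusterIP is the default; LoadBalancer needs a cloud provider
  selector:
    app: web
  ports:
  - port: 80            # cluster-internal Service port
    targetPort: 8080    # container port on the backing pods
    nodePort: 30080     # opened on every node (default range 30000-32767)
```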
Your cluster nodes have two NICs connected to different networks. How do you bootstrap the cluster, and what issues could you face?
When bootstrapping, specify the correct network interface to ensure that node-to-node communication uses the intended network. Misconfiguring network interfaces can lead to issues like incorrect routing, IP address overlap, or pods not being reachable. For example, if Kubernetes uses the wrong interface for inter-node communication, the overlay network might break, causing pods to fail to communicate.
Please explain the journey of a packet from one pod to another.
When a packet travels from one pod to another in Kubernetes, the journey depends on the network setup:
Same node: The packet is routed through the node's internal networking (typically a bridge and veth pairs), directly using the pods' IP addresses.
Different nodes: The packet goes through the network overlay managed by the Container Network Interface (CNI) plugin (e.g., Flannel, Calico). The source node encapsulates the packet (often using VXLAN) and routes it to the destination node, which decapsulates it and delivers it to the target pod.
Throughout this journey, kube-proxy rules may rewrite the destination address, especially when the traffic is addressed to a Service rather than directly to a pod.
Pod Lifecycle
What are the different phases in the lifecycle of a Pod, and what happens during each phase?
Pending: The pod has been accepted by the Kubernetes system, but one or more of its containers are not yet running. This may happen because the scheduler hasn't assigned it to a node yet or because there are insufficient resources.
Running: The pod has been bound to a node, and all containers have been created. At least one container is still running, or is in the process of starting or restarting.
Succeeded: All containers in the pod have successfully terminated, and they will not be restarted.
Failed: All containers in the pod have terminated, and at least one container terminated in failure (exited with a non-zero status or was killed by the system).
Unknown: The state of the pod cannot be obtained, usually because the communication with the node has failed.
How do you handle Pod scheduling, and what strategies can you use to ensure Pods are efficiently scheduled?
Pod scheduling is handled by the Scheduler component, which assigns pods to nodes based on available resources and constraints. You can influence scheduling using:
Node Selectors: Simple key-value pairs attached to nodes. Pods with matching selectors are scheduled onto those nodes.
Affinity and Anti-affinity: More expressive rules that define preferred or required node characteristics or co-location with other pods.
Taints and Tolerations: Nodes can have taints to repel certain pods. Pods can tolerate those taints, allowing them to be scheduled on those nodes.
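The three mechanisms above can be combined in a single pod spec. A sketch, assuming hypothetical node labels (`disktype=ssd`) and a hypothetical `gpu=true:NoSchedule` taint:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduled-app
  labels:
    app: scheduled-app
spec:
  nodeSelector:
    disktype: ssd                # only nodes labeled disktype=ssd qualify
  tolerations:
  - key: "gpu"                   # allowed onto nodes tainted gpu=true:NoSchedule
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    podAntiAffinity:             # prefer spreading replicas across nodes
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: scheduled-app
          topologyKey: kubernetes.io/hostname
  containers:
  - name: main
    image: nginx
```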
What happens if a container doesn’t pass the ReadinessProbe?
If a container fails the ReadinessProbe, the pod is marked as "Not Ready." Kubernetes will stop sending traffic to this pod through Services, ensuring that no requests are routed to it until it passes the probe. This helps in scenarios where the container may need additional time to warm up before it can serve requests.
What is the difference between a Deployment and a StatefulSet?
Deployment: Manages stateless applications. It ensures the specified number of replicas are running and can perform rolling updates. Deployments are best for scenarios where pods are interchangeable.
StatefulSet: Manages stateful applications, ensuring each pod has a unique, stable identity and consistent storage. It's useful for applications requiring persistent storage, like databases, where the order and uniqueness of pod deployment are crucial.
What is a Headless Service?
A headless service is a Service without a ClusterIP. It doesn’t provide load balancing or a stable IP but still enables service discovery. When you create a headless service, Kubernetes returns the actual IPs of the associated pods, allowing clients to connect directly to pod endpoints. It's commonly used with StatefulSets where direct pod access is required.
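A headless Service is defined by setting `clusterIP: None`. A minimal sketch, assuming a hypothetical Postgres StatefulSet labeled `app: postgres`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None       # no virtual IP: this makes the Service headless
  selector:
    app: postgres
  ports:
  - port: 5432
```

A DNS lookup of `db-headless` then returns the pod IPs directly, and StatefulSet pods get stable per-pod names such as `postgres-0.db-headless`.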
How can we run Static Pods?
Static pods are directly managed by the kubelet on a node, not the API server. To run them, you place pod definition files in a specific directory on the node (e.g., /etc/kubernetes/manifests). The kubelet monitors this directory and automatically starts any pods defined there. They are typically used for essential system pods or bootstrapping a cluster.
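A static pod manifest is an ordinary Pod definition; what makes it static is where it lives. A sketch, assuming the default kubeadm staticPodPath:

```yaml
# Saved as /etc/kubernetes/manifests/static-web.yaml on the node;
# the kubelet picks it up automatically, no API call required.
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
```

The API server shows a read-only "mirror pod" for it, named with the node name as a suffix; deleting the mirror pod has no effect, since only removing the file stops it.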
What is a Pod Sandbox?
A pod sandbox provides an isolated environment where pods run. It includes the network namespace and other shared resources used by the containers within the pod. When Kubernetes starts a pod, it first creates a sandbox (often a lightweight container like a "pause" container), setting up the networking and storage before the main application containers start.
Storage
How does Kubernetes manage persistent storage, and what are the differences between Persistent Volumes (PVs) and Persistent Volume Claims (PVCs)?
Kubernetes abstracts storage with Persistent Volumes (PVs) and Persistent Volume Claims (PVCs).
PVs: Represent physical storage in the cluster, defined by an admin. They exist independently of any particular pod.
PVCs: Requests for storage by users. When a pod needs storage, it creates a PVC specifying size and access mode. Kubernetes binds the PVC to a matching PV, allowing the pod to use that storage. This separation allows dynamic provisioning and storage independence.
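The PV/PVC pairing can be sketched as follows (a hostPath PV is shown only for single-node testing; real clusters use network or CSI-backed storage):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 10Gi
  accessModes: [ReadWriteOnce]
  hostPath:                      # test-only backend; not for production
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi              # binds to a matching PV of at least 10Gi
```

A pod then mounts the claim by name via `persistentVolumeClaim.claimName: app-data`.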
Can you explain the concept of StorageClasses and how they are used in dynamic provisioning?
StorageClasses define different types of storage (e.g., SSD, HDD) and the provisioners that create them. When a PVC requests storage with a specific StorageClass, Kubernetes automatically provisions a PV that meets the requirements. This dynamic provisioning eliminates the need to manually create PVs for every request, streamlining storage management.
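A sketch of dynamic provisioning, assuming an AWS cluster with the EBS CSI driver installed (the provisioner and parameters vary by cloud):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com       # cloud-specific CSI driver
parameters:
  type: gp3                        # SSD-backed volume type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # provision only once a pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  storageClassName: fast-ssd       # triggers dynamic provisioning of a PV
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 50Gi
```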
What happens if you have a PodDisruptionBudget (PDB) with a maxUnavailable of 2, and you want to drain a node where a pod of the deployment is in a CrashLoop?
A PDB limits how many pods of an application may be unavailable during voluntary disruptions such as a node drain. With maxUnavailable: 2, at most two matching pods can be unavailable at once. A pod stuck in CrashLoopBackOff is not Ready, so it already counts against that budget; once the unavailable count reaches the limit, the eviction API refuses further disruptions and kubectl drain hangs or retries indefinitely. To proceed, fix the CrashLoop issue first, temporarily relax the PDB, or, as a last resort, force-delete the broken pod, accepting that this bypasses the availability guarantee.
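A sketch of such a PDB, assuming a hypothetical deployment labeled `app: web`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 2      # at most 2 pods may be down during voluntary disruptions
  selector:
    matchLabels:
      app: web
```

`kubectl get pdb web-pdb` shows the current `ALLOWED DISRUPTIONS`; a crash-looping pod lowers that number.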
How do you fix an issue where a Postgres pod crashes due to a configuration error on the PVC?
First, inspect the PVC and pod logs to identify the configuration error. Update the PVC configuration if possible, or create a new PVC with the correct settings. If data integrity is not compromised, restart the pod. Otherwise, restore the data from a backup.
Security
How does Kubernetes manage access control, and what are the key components of RBAC (Role-Based Access Control)?
Kubernetes uses RBAC to control who can perform what actions within the cluster. RBAC is built around four key components:
Roles and ClusterRoles: Define sets of permissions at the namespace (Role) or cluster level (ClusterRole).
RoleBindings and ClusterRoleBindings: Associate users or service accounts with specific roles. This mapping dictates what actions entities can perform on resources, like creating pods or accessing secrets.
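Putting the pieces together, a namespaced Role granting read-only pod access and its binding might look like this (the namespace and user name are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]                  # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: alice                      # could also be a ServiceAccount or Group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```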
What are Network Policies, and how do they enhance security within a Kubernetes cluster?
Network Policies are like firewalls for your pods. They define rules that specify which pods can communicate with each other and with external resources. By default, pods can communicate with any other pod. Network Policies restrict this communication, enhancing security by enforcing isolation between different parts of your application or environment.
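For example, a policy restricting a hypothetical Postgres pod so that only backend pods may reach it on port 5432 could be sketched as (note that enforcement requires a CNI plugin that supports NetworkPolicy, such as Calico or Cilium):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-backend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: postgres          # the pods this policy protects
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend       # only pods with this label may connect
    ports:
    - protocol: TCP
      port: 5432
```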
Which RBAC permissions can lead to Privilege Escalation within the cluster and why?
Permissions like create or update on Roles, ClusterRoles, and their bindings can lead to privilege escalation, as can the special escalate, bind, and impersonate verbs. For example, if a user can create a new ClusterRole with admin privileges and bind it to their own account, they can grant themselves full control over the cluster. Permission to create pods can also be abused, since a pod can mount a powerful service account's token.
Configuration Management
How do ConfigMaps and Secrets differ, and when would you use each?
ConfigMaps store non-sensitive configuration data like environment variables or configuration files. Secrets store sensitive data like passwords or API keys; note that by default Secrets are only base64-encoded, not encrypted, so encryption at rest and strict RBAC should be enabled to protect them. Use ConfigMaps for general configuration and Secrets for anything confidential.
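A minimal sketch of both objects (all keys and values here are hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"          # plain-text, non-sensitive settings
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                  # stringData lets you skip manual base64 encoding
  DB_PASSWORD: "changeme"
```

Pods consume either one via `envFrom`, individual `valueFrom` references, or volume mounts.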
What are the best practices for managing environment-specific configurations in a Kubernetes cluster?
Use namespaces to separate environments (e.g., dev, staging, prod). Utilize ConfigMaps and Secrets to store environment-specific configurations, ensuring they are version-controlled and secure. Tools like Helm allow you to template configurations, making it easier to manage variations across environments.
How do you cancel deletion for a resource (e.g., Ingress) which has a finalizer attached to it?
A finalizer is a mechanism that prevents a resource from actually being removed until certain cleanup tasks are completed. Strictly speaking, once a resource has a deletionTimestamp the deletion cannot be cancelled; what you can do is unblock a deletion that is stuck on a finalizer by editing the resource and removing the finalizer entry from metadata.finalizers (e.g., with kubectl edit or kubectl patch). Only do this when you are sure the cleanup the finalizer guards is no longer needed, since skipping it can leave orphaned external resources.
Scaling and Performance
How do you implement horizontal and vertical scaling in Kubernetes?
Horizontal scaling: Involves adding more pod replicas to handle increased load. You can do this manually using kubectl scale or automatically using the Horizontal Pod Autoscaler (HPA), which scales pods based on metrics like CPU usage.
Vertical scaling: Adjusts the resource limits (CPU, memory) for existing pods. If a pod requires more resources, you update the resource requests/limits. However, vertical scaling requires pod recreation to apply changes.
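Horizontal autoscaling can be declared with an HPA manifest; a sketch targeting a hypothetical `web` Deployment (requires the Metrics Server for CPU metrics):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU exceeds 70%
```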
What tools and metrics do you use to monitor and optimize the performance of a Kubernetes cluster?
Use tools like Prometheus and Grafana to collect and visualize metrics. Monitor CPU, memory usage, network traffic, and pod restarts to identify performance bottlenecks. Additionally, tools like the Kubernetes Metrics Server provide real-time resource metrics, which can be used for autoscaling.
CI/CD Integration
How would you integrate Kubernetes with a CI/CD pipeline?
Integrate Kubernetes with CI/CD tools like Jenkins, GitLab CI, or Argo CD. The pipeline can automate the process of building container images, pushing them to a registry, and deploying them to the cluster. For example, after a successful build, the pipeline triggers a kubectl apply or Helm chart update to deploy the latest changes.
What are the benefits and challenges of using tools like Helm and Kustomize in a CI/CD process?
Benefits: Helm and Kustomize simplify deployment by providing templating and version control for Kubernetes manifests. Helm offers easy rollbacks and release management, while Kustomize enables overlay configurations without duplicating YAML files.
Challenges: Helm charts can become complex, and maintaining consistency across environments can be challenging. Kustomize can require managing multiple overlays, which can become cumbersome.
Consider you have a Kubernetes cluster that is integrated with Argo as its CD pipeline. How do you break out of an infinite loop where Argo keeps triggering a Kubernetes Job?
Adjust the Argo workflow configuration to include conditional logic or use synchronization mechanisms like annotations or labels to control job triggering. For instance, you can use a semaphore pattern to prevent re-triggering the job.
Advanced Topics
Can you explain the concept of Operators and how they extend Kubernetes functionality?
Operators are Kubernetes controllers that encode domain-specific knowledge for managing complex applications. They extend Kubernetes functionality by automating tasks like deployment, scaling, backup, and failover for stateful applications. An Operator watches custom resources (CRDs) and takes actions to maintain the desired state.
What are Custom Resource Definitions (CRDs), and how do they allow for the creation of custom resources within Kubernetes?
CRDs extend the Kubernetes API to allow users to define their own custom resources. Once a CRD is created, you can manage custom objects just like built-in Kubernetes resources. For example, you can define a Database CRD and create, update, or delete Database objects using kubectl.
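A sketch of that Database example as a CRD plus a custom object (the group, kind, and fields are all hypothetical):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com      # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:             # validation schema for the custom resource
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              replicas:
                type: integer
---
apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  engine: postgres
  replicas: 3
```

After the CRD is applied, `kubectl get databases` works like any built-in resource; an Operator would watch these objects and reconcile them.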
When should you use (or customize) an operator?
Use an Operator when managing complex stateful applications that require domain-specific logic, such as databases. Customizing an Operator is necessary when the out-of-the-box functionalities don't meet your application's needs, allowing you to encode custom behaviors.
For Helm templates, is there any standard practice on how to group your app, e.g., by backend/frontend?
Yes, a standard practice is to use Helm subcharts or different values files to group components like backend and frontend. This allows you to maintain a modular structure, where each component can be managed and deployed independently while ensuring the overall application is cohesive.
Troubleshooting
How do you debug a failing Pod in a Kubernetes cluster?
Start with kubectl describe pod <pod-name> to inspect events and statuses. Check container logs using kubectl logs <pod-name> to identify errors. If further investigation is needed, use kubectl exec -it <pod-name> -- /bin/sh to get a shell into the running container for a closer look.
What steps would you take if you notice a node is not joining the cluster?
Check the node's kubelet logs for errors (journalctl -u kubelet). Ensure that the correct token is used for joining the node to the cluster. Verify network connectivity between the node and the control plane, and confirm that the node's firewall or security group settings allow communication on required ports.
How do you troubleshoot a node that suddenly stops resolving DNS queries, and the service management tool in use is systemd?
Start by checking the CoreDNS logs (kubectl logs -n kube-system <coredns-pod>) for errors. Verify the node's resolv.conf and ensure it's configured to use CoreDNS. If systemd-resolved is in use, confirm it's correctly forwarding DNS queries. Also, check if any recent changes to network policies or firewalls might be blocking DNS traffic.
How do you bootstrap the cluster, what issues could you run into, and how do you solve them?
Use kubeadm to bootstrap a Kubernetes cluster. Common issues include network configuration errors, certificate generation failures, or time synchronization problems. Solve them by inspecting the kubelet logs (journalctl -u kubelet) and ensuring the network plugin (CNI) is correctly installed.
Tell me everything that happens from the point you execute kubectl create -f pod.yaml until the pod is running.
When you execute kubectl create -f pod.yaml, the request is sent to the API Server, which validates and stores the pod specification in etcd. The Scheduler detects the new pod and assigns it to a suitable node. The kubelet on that node pulls the required container images and creates the pod sandbox. It sets up networking and storage, then starts the containers. The kubelet continuously monitors the pod's status and reports back to the API Server.
Where can you look to see if a required mutating webhook is failing?
Check the kube-apiserver logs for errors related to the webhook. You can also inspect the MutatingWebhookConfiguration resource using kubectl get mutatingwebhookconfiguration to ensure it's correctly configured and reachable.
How do you handle a scenario where a Postgres pod crashes due to a misconfiguration on the PVC?
Inspect the PVC and pod logs to identify the configuration error. Correct the PVC configuration if possible. If the PVC is corrupted, you may need to create a new PVC and restore data from a backup. Restart the pod once the issue is resolved.
Miscellaneous
What is the difference between a readiness probe, liveness probe, and startup probe, and when would you use each?
Readiness Probe: Checks if the pod is ready to serve traffic. Use it when the application takes time to initialize.
Liveness Probe: Checks if the pod is healthy and should be restarted if it fails. Use it to detect application crashes or deadlocks.
Startup Probe: Checks if the application has started correctly. It’s used for slow-starting containers to avoid failing liveness probes during startup.
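All three probes can coexist in one container spec. A sketch of a Pod-spec fragment (the image, paths, and thresholds are hypothetical):

```yaml
containers:
- name: app
  image: my-app:1.0
  startupProbe:              # up to 30 x 10s for slow starters; other probes wait
    httpGet: {path: /healthz, port: 8080}
    failureThreshold: 30
    periodSeconds: 10
  readinessProbe:            # failure removes the pod from Service endpoints
    httpGet: {path: /ready, port: 8080}
    periodSeconds: 5
  livenessProbe:             # failure restarts the container
    httpGet: {path: /healthz, port: 8080}
    periodSeconds: 10
```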
What are endpoints, and how are they related to services?
Endpoints are objects that track the IP addresses of the pods backing a service. They define which pods the service can send traffic to. When a service routes a request, it uses its associated endpoints to reach the correct pods.
Define Kubernetes using an analogy.
Kubernetes is like the conductor of an orchestra, coordinating various instruments (containers) to create harmonious music (applications). It ensures each instrument plays at the right time and can handle changes, like adding new musicians (scaling) or replacing ones who stop playing (self-healing).
What problem does Kubernetes solve?
Kubernetes solves the complexity of deploying, scaling, and managing containerized applications. It provides automation, self-healing, load balancing, and resource management, allowing developers to focus on building applications rather than managing infrastructure.
Choose a metrics and logs plugin, and explain how you’re going to export container logs and metrics out of Kubernetes.
Use Prometheus for metrics and Fluentd for logs. Prometheus scrapes metrics from nodes and pods, storing them for monitoring and alerting. Fluentd collects logs from the nodes and containers, forwarding them to an external storage system like Elasticsearch or a cloud logging service for analysis and troubleshooting.
How does kube-proxy load balance services?
kube-proxy programs network rules (iptables or IPVS) on each node to direct Service traffic to backend pods. In iptables mode, backends are chosen effectively at random; IPVS mode supports true scheduling algorithms such as round-robin and least-connections. Either way, requests are distributed across the available pods, ensuring balanced load and high availability.
What is a pause container?
The pause container is the foundation of the pod's network stack. It serves as the parent container for the pod, holding the network namespace. This allows the containers within the pod to share the same network and storage resources.
How are DaemonSet Pods scheduled on Nodes?
A DaemonSet ensures that a copy of a pod runs on every (or selected) node in the cluster. When a new node is added to the cluster, the DaemonSet automatically schedules a pod on that node without needing any manual intervention. This is useful for deploying node-level services like log collection agents.
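A sketch of a DaemonSet for a node-level log agent (the image is an example; control-plane tolerations are needed only if you want the agent on those nodes too):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      tolerations:                           # allow scheduling on tainted control-plane nodes
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: agent
        image: fluent/fluentd:v1.16          # example log-collection agent
```

Note that since Kubernetes 1.12, DaemonSet pods are placed by the default scheduler using node affinity rather than directly by the DaemonSet controller.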