# Troubleshoot Alarms

For information on accessing your Kubernetes cluster running in Massdriver, please check out this guide.

## Troubleshooting Kubernetes

Below are the alarms you may encounter while working with Kubernetes in Massdriver, and how to diagnose the root cause of each issue:
## Pods not ready

### Alarm description

`Pods not ready` alarms when one of the pods in your cluster is not ready.

### Diagnosis

To diagnose the issue, you can run the following commands:

```shell
kubectl get pods -A                              # Lists all pods in all namespaces
kubectl describe pod <pod-name> -n <namespace>   # Describes the pod in detail
```
In the describe output, look at the following sections for clues:

- Status: Check if the pod is in the `Running`, `Pending`, `CrashLoopBackOff`, or other state.
- Conditions: Look for any conditions that are not `True`, such as `Ready`, `Initialized`, `ContainersReady`, etc.
- Containers: Check if each container is in the `Running`, `Terminated`, or `Waiting` state. Look at the `Restart Count` and `Last State` for more information.
- Events: Look at the events section for any error messages or warnings.
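As a quick first pass, the `READY` column of `kubectl get pods -A` can be scanned for pods with fewer ready containers than desired. A minimal sketch (the namespaces, pod names, and sample output below are hypothetical; on a live cluster the same `awk` filter works on real `kubectl` output):

```shell
# Stand-in for live 'kubectl get pods -A' output; on a real cluster,
# pipe kubectl's output into the same awk filter instead.
sample='NAMESPACE   NAME    READY   STATUS             RESTARTS   AGE
default     web-1   1/1     Running            0          2d
default     web-2   0/1     CrashLoopBackOff   12         2d
kube-dns    dns-1   0/1     Pending            0          5m'

# Print pods whose READY column (ready/total) shows unready containers
echo "$sample" | awk 'NR > 1 { split($3, r, "/"); if (r[1] != r[2]) print $1 "/" $2 ": " $4 }'
```

On a real cluster, replace `echo "$sample"` with `kubectl get pods -A`.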
## Pods crash looping

### Alarm description

`Pods crash looping` alarms when a pod enters a `CrashLoopBackOff` state. This state indicates that the pod is crashing and restarting repeatedly.

### Diagnosis

To diagnose the issue, you can run the following commands:

```shell
kubectl get pods -A                          # Lists all pods in all namespaces
kubectl logs <pod-name> -n <namespace>       # Displays the logs of the pod
```
In the logs, look for any error messages or warnings that might indicate why the pod is crashing.
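Because a crash looping container keeps restarting, `kubectl logs` may show the current, freshly restarted instance before it has failed. The standard `--previous` flag retrieves logs from the last terminated instance, which is usually where the crash output lives:

```shell
# Logs from the previously terminated container instance
kubectl logs <pod-name> -n <namespace> --previous

# For multi-container pods, target a specific container
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
```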
## Deployment rollout unsuccessful

### Alarm description

`Deployment rollout unsuccessful` alarms when a deployment has not been successfully rolled out.

### Diagnosis

To diagnose the issue, you can run the following commands:

```shell
kubectl get deployments -A                                    # Lists all deployments in all namespaces
kubectl describe deployment <deployment-name> -n <namespace>  # Describes the deployment in detail
```
In the describe output, look at the following sections for clues:

- Replicas: Check here to ensure that the numbers for `desired`, `updated`, `total`, and `available` match. If any of them do not match, the rollout is incomplete or there were issues with the deployment. If you see any `unavailable`, the desired replicas are not running or accessible, possibly due to pod failures or scheduling problems.
- Conditions: Look here to verify that `Progressing` is `True` with reason `NewReplicaSetAvailable`, and that `Available` is `True` with reason `MinimumReplicasAvailable`. If these conditions are not met, the deployment is stuck or failing to roll out properly.
- StrategyType: Verify the deployment strategy to ensure proper configuration. Misconfigurations here could cause delays or failures in the rollout process.
- NewReplicaSet: Check here to ensure it's creating the proper number of replicas.
- Events: Review this section for any error messages or warnings.
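Beyond `describe`, kubectl's standard `rollout` subcommands can watch a stuck rollout in real time and roll it back if a bad revision is the cause:

```shell
# Watch the rollout; blocks until it succeeds, fails, or times out
kubectl rollout status deployment/<deployment-name> -n <namespace>

# Inspect past revisions and roll back to the previous working one
kubectl rollout history deployment/<deployment-name> -n <namespace>
kubectl rollout undo deployment/<deployment-name> -n <namespace>
```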
## DaemonSet rollout unsuccessful

### Alarm description

`DaemonSet rollout unsuccessful` alarms when a daemonset has not been successfully rolled out.

### Diagnosis

To diagnose the issue, you can run the following commands:

```shell
kubectl get daemonset -A                                    # Lists all daemonsets in all namespaces
kubectl describe daemonset <daemonset-name> -n <namespace>  # Describes the daemonset in detail
```
In the describe output, look at the following sections for clues:

- Desired Number of Nodes and Current Number of Nodes: Verify these fields match. If they don't match, the daemonset is not scheduled on all nodes.
- Number of Nodes Scheduled with Up-to-date Pods and Desired Number of Nodes Scheduled: Verify these fields match. If they don't match, some nodes are running outdated pods.
- Pod Status: Check here for pod states: `# Running / # Waiting / # Succeeded / # Failed`. Non-zero waiting or failed counts indicate potential issues with the rollout.
- Events: Review this section for any error messages or warnings.
- Node-Selectors and Tolerations: Review this section for node placement configurations. Misconfigurations here could prevent the daemonset from being scheduled on certain nodes.
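Since a daemonset must place a pod on every eligible node, node taints are a common cause of partial rollouts. The commands below (standard kubectl; the jsonpath query is one way to format the output) watch the rollout and list each node's taints so they can be checked against the daemonset's tolerations:

```shell
# Watch the daemonset rollout progress
kubectl rollout status daemonset/<daemonset-name> -n <namespace>

# List taints per node; the daemonset needs a matching toleration for each
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```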
## Autoscaler unscheduled pods

### Alarm description

`Autoscaler unscheduled pods` alarms when pods cannot be scheduled (placed on a node) and the cluster autoscaler is unable to resolve the issue.

### Diagnosis

To diagnose the issue, you can run the following commands:

```shell
kubectl get pods -A                              # Lists all pods in all namespaces
kubectl describe pod <pod-name> -n <namespace>   # Describes the pod in detail
```
In the describe output, look at the following sections for clues:

- Status: Ensure the pod is in a `Running` state. If it's in a `Pending` state, the pod is not scheduled.
- Conditions: Look for any conditions that are not `True`, such as `Ready`, `Initialized`, `ContainersReady`, etc.
- Node-Selectors and Tolerations: Check if the pod has node selectors or tolerations that prevent it from being scheduled on any node.
- Requests: Check if the pod's resource requests are too high for the available resources in the cluster.
- Events: Look at the events section for any error messages or warnings.
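Scheduling failures are also recorded as cluster events with reason `FailedScheduling`, and the scheduler's message usually states the exact blocker (e.g. insufficient CPU, or untolerated taints). A sketch using standard kubectl field selectors:

```shell
# Show recent scheduling failures across all namespaces
kubectl get events -A --field-selector reason=FailedScheduling

# Compare the pod's requests against what each node has already allocated
kubectl describe nodes | grep -A 7 "Allocated resources"
```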