Posted on 6 January 2020, updated on 21 September 2023.
Debugging a Kubernetes environment can get tedious if you do not know the right commands to quickly deploy the tools you need at the right time. The purpose of this article is to provide some commands, tools, and practices that can help you debug faster in a Kubernetes context.
Looking at Kubernetes error messages
You can start debugging by looking for error messages. In Kubernetes, they often take the form of resources called events, which you can list with `kubectl get events` (add `-n <namespace>` for a single namespace or `-A` for all of them). This gives you a centralized view of what is happening in the cluster, such as pods entering the `CrashLoopBackOff` state or containers being `OOMKilled`.
```
$ kubectl get events -n kube-system
LAST SEEN   TYPE      REASON                    OBJECT                                                        MESSAGE
31s         Warning   FailedGetResourceMetric   horizontalpodautoscaler/nginx-ingress-controller-controller   missing request for memory
```
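On a busy cluster the event list gets long, so it helps to keep only warnings. Events are namespaced and, by default, the API server keeps them for only one hour, so look at them soon after the problem occurs. Below is a minimal bash sketch; `warning_events` is a hypothetical helper name, while `--field-selector type=Warning` and `--sort-by` are standard `kubectl get` flags.

```shell
# Hypothetical helper: list only Warning events, ordered by the creation time
# of each event object, so the most recent ones end up at the bottom.
# Usage: warning_events [namespace]   (defaults to "default")
#        warning_events -A            (all namespaces)
warning_events() {
  local scope=(-n "${1:-default}")
  if [ "${1:-}" = "-A" ]; then
    scope=(-A)
  fi
  kubectl get events "${scope[@]}" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}
```

For example, `warning_events kube-system` would show only warnings such as the `FailedGetResourceMetric` event above.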
Looking at Kubernetes pod logs
Next, you can check a pod's logs with `kubectl logs <pod-name> -f`. This requires the exact pod name, though: a label selector (`-l <label>=<value>`) lets it read several pods at once, but there is no way to match pod names with a pattern.
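`kubectl logs` still has a few flags worth knowing when you only need one or two pods: `--previous` shows the logs of the last terminated instance of a container, which is usually what you want for a pod in `CrashLoopBackOff`, and `--prefix` labels each line with its pod and container when reading several pods through a label selector. When following more pods than `--max-log-requests` allows (5 by default), kubectl refuses to start, so the sketch raises it. The helper names are hypothetical; the flags are standard `kubectl logs` options.

```shell
# Hypothetical helper: logs of the previous (terminated) instance of a
# container, e.g. for a pod in CrashLoopBackOff.
# Usage: previous_logs <pod> [namespace] [container]
previous_logs() {
  local pod="$1" ns="${2:-default}"
  local args=(logs "$pod" -n "$ns" --previous)
  if [ -n "${3:-}" ]; then
    args+=(-c "$3")   # pick a container when the pod has several
  fi
  kubectl "${args[@]}"
}

# Hypothetical helper: follow all containers of all pods matching a label
# selector, prefixing each line with its pod and container name.
# Usage: follow_by_label <selector> [namespace]
follow_by_label() {
  local selector="$1" ns="${2:-default}"
  kubectl logs -f -l "$selector" -n "$ns" \
    --all-containers --prefix --max-log-requests=10
}
```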
Stern aggregates the logs of every pod whose name matches a regular expression: `stern <expression>`. It also starts tailing matching pods that are created after the command starts.
```
$ stern prox -n kube-system
+ kube-proxy-28dqn › kube-proxy
+ kube-proxy-hrs9j › kube-proxy
+ kube-proxy-dd9nj › kube-proxy
kube-proxy-hrs9j kube-proxy I0101 11:46:23.761175 1 service.go:357] Removing service port "kubia/kubia:http"
kube-proxy-hrs9j kube-proxy I0101 11:46:23.761207 1 service.go:357] Removing service port "kubia/kubia:https"
kube-proxy-28dqn kube-proxy I0101 11:46:23.761308 1 service.go:357] Removing service port "kubia/kubia:http"
kube-proxy-28dqn kube-proxy I0101 11:46:23.761341 1 service.go:357] Removing service port "kubia/kubia:https"
kube-proxy-dd9nj kube-proxy I0101 11:46:23.761299 1 service.go:357] Removing service port "kubia/kubia:http"
kube-proxy-dd9nj kube-proxy I0101 11:46:23.761330 1 service.go:357] Removing service port "kubia/kubia:https"
```
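Stern also accepts flags to narrow or widen what it tails: `--all-namespaces`, `--container` (a regular expression on container names, `.*` by default), `--since` (how far back to start) and `--tail` (how many existing lines to show per container). The wrapper below is a hypothetical sketch; check `stern --help` for the flags available in your version.

```shell
# Hypothetical helper around stern: tail pods whose name matches a regular
# expression in every namespace, starting 10 minutes back with at most 50
# existing lines per container, optionally filtering containers by regex.
# Usage: stern_recent <pod-regex> [container-regex]
stern_recent() {
  local query="$1" container="${2:-.*}"
  stern "$query" --all-namespaces --container "$container" --since 10m --tail 50
}
```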
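Checking that network flows are open
You might want to quickly check whether a route is open. To do so, start a pod running BusyBox, a minimal image bundling many Unix utilities, and open an interactive shell in it with a single command: `kubectl run -i --tty busybox --image=busybox --restart=Never -- sh`. Add `--rm` to delete the pod when you leave the shell. Older kubectl versions also needed `--generator=run-pod/v1`; that flag is deprecated, and recent versions create a bare pod by default. The image contains several useful debugging tools, such as `telnet` and `wget`. In the following session, port 443 of 8.8.8.8 is reachable, while the attempt on port 80 hangs and has to be interrupted with Ctrl+C.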
```
$ kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # telnet 8.8.8.8:80
^C
/ # telnet 8.8.8.8:443
Connected to 8.8.8.8:443
^C
```
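An interactive shell is handy for exploring, but for a quick yes/no check you can also run a single command in a throwaway pod and let kubectl delete it when the command exits (`--rm`, which requires staying attached with `-i`). `cluster_exec` below is a hypothetical bash helper; the random suffix in the pod name keeps several checks from colliding.

```shell
# Hypothetical helper: run one command in a throwaway pod, stay attached to
# its output (-i), and delete the pod when the command exits (--rm).
# Usage: cluster_exec <image> <command> [args...]
cluster_exec() {
  local image="$1"
  shift
  kubectl run "debug-${RANDOM}" --rm -i --restart=Never \
    --image="$image" -- "$@"
}
```

For example, `cluster_exec busybox nslookup kubernetes.default.svc.cluster.local` checks in-cluster DNS resolution, assuming the cluster uses the default `cluster.local` domain.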
Debugging service connections
For HTTP calls, the same approach works with an image that ships curl: `kubectl run -i --tty curl --image=radial/busyboxplus:curl --restart=Never -- sh`.
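For a one-shot HTTP check against a Service, you can also call curl directly instead of opening a shell. This sketch assumes the public `curlimages/curl` image is allowed in your cluster and that the cluster uses the default `cluster.local` DNS domain; `curl_service` is a hypothetical helper name. `--command` makes kubectl use the arguments as the container's command, overriding the image's entrypoint.

```shell
# Hypothetical helper: send one HTTP request to a Service from inside the
# cluster and print only the status code (5-second timeout).
# Assumes the default cluster domain "cluster.local".
# Usage: curl_service <service> <namespace> <port> [path]
curl_service() {
  local service="$1" namespace="$2" port="$3" path="${4:-/}"
  kubectl run "curl-${RANDOM}" --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -sS -m 5 -o /dev/null -w '%{http_code}\n' \
    "http://${service}.${namespace}.svc.cluster.local:${port}${path}"
}
```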
Debugging database connections
You might want to quickly test the connection to a database. Each database has its own client, and each client works differently, but you can always run one quickly inside the cluster. For instance, start a throwaway MySQL client with the following command: `kubectl run mysql -it --rm --image=mysql --restart=Never -- mysql -h <ip> -P <port> -u <user> -p`. With nothing after `-p`, the client prompts for the password, which keeps it out of the pod specification and your shell history.
The same approach works for PostgreSQL, SQL Server, Oracle, and other databases, using their own client images.
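As an illustration for PostgreSQL, here is a hypothetical bash helper that starts an interactive `psql` session from a throwaway pod. It assumes the official `postgres` image; pinning a tag that matches your server's major version is usually a good idea. `psql` prompts for the password when the server asks for one, so it never appears in the pod spec.

```shell
# Hypothetical helper: interactive psql session from a throwaway pod.
# -it gives psql a terminal so it can prompt for the password.
# Usage: psql_client <host> [port] [user] [database]
psql_client() {
  local host="$1" port="${2:-5432}" user="${3:-postgres}" db="${4:-postgres}"
  kubectl run "psql-${RANDOM}" -it --rm --restart=Never --image=postgres -- \
    psql -h "$host" -p "$port" -U "$user" -d "$db"
}
```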
Low-level debugging with kubectl-debug
Finally, to debug lower-level Linux issues in a Kubernetes pod, you can use the kubectl-debug plugin. It injects a container that shares the PID, network, user, and IPC namespaces with the other containers of the pod. Adding the `--fork` option also lets you fork pods that are in `CrashLoopBackOff`, so you can still debug them. It is pretty neat!
```
$ kubectl debug --agentless --port-forward api-55b6f8fdbb-s7k44
Agent Pod info: [Name:debug-agent-pod-bb775c60-2e05-11ea-9512-600308a02696, Namespace:default, Image:aylei/debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-bb775c60-2e05-11ea-9512-600308a02696 to run...
[...]
starting debug container...
container created, open tty...
bash-5.0# top
Mem: 5530216K used, 2636468K free, 2488K shrd, 9940K buff, 3749664K cached
CPU:   0% usr   0% sys   0% nic  95% idle   0% io   0% irq   0% sirq
Load average: 0.57 0.30 0.18 3/1016 39
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S     250m   3%   1   0% npm
   23     1 root     S     225m   3%   1   0% node dist/main
   34     0 root     S     2404   0%   1   0% bash
   39    34 root     R     1568   0%   1   0% top
bash-5.0# ps
PID   USER     TIME  COMMAND
    1 root      0:00 npm
   23 root      0:11 node dist/main
   34 root      0:00 bash
   40 root      0:00 ps
bash-5.0# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address                                 State
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38798   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:39290   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38768   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:39018   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38684   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38884   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38654   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:39172   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:38920   TIME_WAIT
tcp        0      0 api-55b6f8fdbb-s7k44:3000   ip-10-32-0-1.eu-west-1.compute.internal:39320   TIME_WAIT
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node Path
```
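Recent kubectl versions also ship a built-in `kubectl debug` command. Because kubectl plugins cannot override built-in commands, on those versions the kubectl-debug plugin has to be invoked directly as `kubectl-debug` rather than `kubectl debug`. The built-in command covers similar ground: `--target` attaches an ephemeral container that shares a given container's process namespace (where the cluster and container runtime support it), and `--copy-to` creates a modified copy of a pod, a role close to the plugin's `--fork` for pods in `CrashLoopBackOff`. A bash sketch with hypothetical helper names:

```shell
# Hypothetical helper: attach an interactive busybox ephemeral container to a
# running pod, sharing the process namespace of the container named by
# --target. Requires ephemeral container support in the cluster.
# Usage: debug_ephemeral <pod> <target-container>
debug_ephemeral() {
  local pod="$1" target="$2"
  kubectl debug -it "$pod" --image=busybox --target="$target"
}

# Hypothetical helper: debug a crashing pod by creating a copy named
# <pod>-debug in which the given container runs sh instead of its usual
# command (its image must contain sh), with processes shared across containers.
# Usage: debug_copy <pod> <container>
debug_copy() {
  local pod="$1" container="$2"
  kubectl debug "$pod" -it --copy-to="${pod}-debug" \
    --container="$container" --share-processes -- sh
}
```

The copied pod is not removed automatically, so delete it with `kubectl delete pod <pod>-debug` once you are done.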
These are the tools and commands we use to debug in a Kubernetes environment. Tell us about yours in the comments section! Also check out our other articles on Kubernetes.