To play around with Kubernetes, rather than building a cluster from scratch and having to fix a billion different things that could go wrong, I decided to start off with a completely working one. So, from https://medium.com/@raj10x/multi-node-kubernetes-cluster-with-vagrant-virtualbox-and-kubeadm-9d3eaac28b98, I took the Vagrantfile gist at the end, ran vagrant up, and found myself, after thousands of unintelligible lines of output had scrolled past, back at the command line. Had things worked?
Did I need to be worried about the reams of gibberish output like:
k8s-head: service/calico-typha created
k8s-head: Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
k8s-head: customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
k8s-head: serviceaccount/calico-node created
k8s-head: unable to recognize "https://raw.githubusercontent.com/ecomm-integration-ballerina/kubernetes-cluster/master/calico/calico.yaml": no matches for kind "Deployment" in version "apps/v1beta1"
k8s-head: unable to recognize "https://raw.githubusercontent.com/ecomm-integration-ballerina/kubernetes-cluster/master/calico/calico.yaml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
k8s-head: W0921 14:01:39.054627    6427 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
k8s-node-2: Processing triggers for systemd (229-4ubuntu21.29) ...
k8s-node-2: sed: can't read /etc/default/kubelet: No such file or directory
k8s-node-2: dpkg-preconfigure: unable to re-open stdin: No such file or directory
k8s-node-2: Warning: Permanently added '192.168.205.10' (ECDSA) to the list of known hosts.
k8s-node-2: [preflight] Running pre-flight checks
k8s-node-2: [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Who knows?! I had no idea what the state of the cluster was. Had anything worked? Had something worked? Had nothing worked?
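One small mitigation for next time (my own habit, nothing from the article): capture the provisioning output to a file so it can at least be searched afterwards, rather than scrolling back through the terminal.

vagrant up 2>&1 | tee vagrant-up.log              # keep a copy of everything vagrant printed
grep -iE "error|warn|unable|fail" vagrant-up.log  # then fish out the suspicious lines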
So, some debugging:
- State of VMs
vagrant status
Current machine states:

k8s-head                  running (virtualbox)
k8s-node-1                running (virtualbox)
k8s-node-2                running (virtualbox)
So, at least the VMs are running. That would have been a genuinely useful thing for vagrant up to tell me!
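That check is easy enough to script, so a minimal post-vagrant-up sanity check might look like this (a sketch, assuming the machine names from the Vagrantfile and that kubectl is set up for the vagrant user on the head node, as it is here):

# Are the VMs up?
vagrant status
# Is the cluster itself healthy?
vagrant ssh k8s-head -c "kubectl get nodes && kubectl get pods --all-namespaces"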
2. State of nodes
(^^^^^ I try not to go flying off on a tangent with technology – it’s so easy to end up down a rabbit hole of realising this is wrong, and then that’s wrong. However, one thing that seems consistently broken across many editors is this 1. 2. numbered-list indenting: the first indent is almost impossible to delete and the second is almost impossible to insert.)
vagrant ssh k8s-head

kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   23h

kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-head     NotReady   master   23h   v1.19.2
k8s-node-1   NotReady   <none>   23h   v1.19.2
k8s-node-2   NotReady   <none>   23h   v1.19.2
So, the nodes don’t seem to be ready.
vagrant@k8s-head:~$ alias k=kubectl
vagrant@k8s-head:~$ k get po -o wide
No resources found in default namespace.
vagrant@k8s-head:~$ k get po -o wide --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
kube-system   coredns-f9fd979d6-4gmwp   0/1     Pending   0          23h   <none>   <none>   <none>           <none>
kube-system   coredns-f9fd979d6-pgdtv   0/1     Pending   0          23h   <none>   <none>   <none>           <none>
So, something seems wrong with coredns.
Having all these namespaces just adds to the confusion and complexity. For example:
vagrant@k8s-head:~$ k logs coredns-f9fd979d6-4gmwp
Error from server (NotFound): pods "coredns-f9fd979d6-4gmwp" not found
vagrant@k8s-head:~$ k logs coredns-f9fd979d6-4gmwp --all-namespaces
Error: unknown flag: --all-namespaces
See 'kubectl logs --help' for usage.
Getting the logs of something shouldn’t be this difficult. You shouldn’t have to pick up a book on Kubernetes or do the CKA or search StackOverflow to find out simple stuff like this.
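For the record, the shape kubectl wants is namespace first, then the pod name, or a label selector if you don’t want to copy generated pod names around. A sketch (the k8s-app=kube-dns label is the one that shows up in the describe output further down):

# Find out which namespace the pod actually lives in
kubectl get pods --all-namespaces | grep coredns
# Then ask for logs in that namespace, by name...
kubectl -n kube-system logs coredns-f9fd979d6-4gmwp
# ...or by label, avoiding the generated suffix entirely
kubectl -n kube-system logs -l k8s-app=kube-dns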
OK, this page sounds promising: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
This page shows how to debug Pods and ReplicationControllers.
Let’s try their first example:
vagrant@k8s-head:~$ kubectl describe pods coredns
Error from server (NotFound): pods "coredns" not found
vagrant@k8s-head:~$ kubectl describe pods coredns-f9fd979d6-4gmwp
Error from server (NotFound): pods "coredns-f9fd979d6-4gmwp" not found
It then says:
Look at the output of the kubectl describe … command above. There should be messages from the scheduler about why it can not schedule your pod
There aren’t any messages from the scheduler, so that’s another dead end.
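In hindsight, the scheduler’s complaints also land in the event stream, which can be queried directly – something I only found out later. A sketch (the field-selector syntax is standard kubectl; the pod name is the one from above):

# Events are namespaced too, so ask in kube-system
kubectl -n kube-system get events --field-selector involvedObject.name=coredns-f9fd979d6-4gmwp
# or just dump everything and grep
kubectl get events --all-namespaces | grep -i coredns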
So, asking in the Kubernetes Office Hours Slack, someone suggests:
kubectl -n kube-system logs coredns-f9fd979d6-4gmwp
which usefully outputs nothing at all. Literally nothing. Not even a This Pod is Pending! (With hindsight, a Pending pod has never started a container, so there are no container logs to show, but kubectl could at least say so.)
It finally turned out the magic invocation was:
kubectl -n kube-system describe pod coredns-f9fd979d6-4gmwp
Name:                 coredns-f9fd979d6-4gmwp
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=f9fd979d6
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-f9fd979d6
Containers:
  coredns:
    Image:       k8s.gcr.io/coredns:1.7.0
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-gv9wj (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-gv9wj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-gv9wj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  76m   default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
and after a whole lot more gibberish it finally gets to:
Warning FailedScheduling 76m default-scheduler 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate. |
and
kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-head     NotReady   master   24h   v1.19.2
k8s-node-1   NotReady   <none>   24h   v1.19.2
k8s-node-2   NotReady   <none>   24h   v1.19.2
So, coredns did not deploy because the nodes were tainted with not-ready, rather than the nodes being in a NotReady status because coredns was pranged. I.e. the problem is with the Nodes.
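In hindsight there is a shorter way to surface both halves of that (the taints, and the reason each node is NotReady) than the full describe that follows. A sketch; the jsonpath expressions are mine, the fields are standard Node spec/status fields:

# Each node's taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
# The message behind each node's Ready condition
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].message}{"\n"}{end}'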
Checking the head node:
kubectl describe node k8s-head
Name:               k8s-head
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-head
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 21 Sep 2020 14:01:34 +0000
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-head
  AcquireTime:     <unset>
  RenewTime:       Tue, 22 Sep 2020 14:24:35 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  10.0.2.15
  Hostname:    k8s-head
Capacity:
  cpu:                2
  ephemeral-storage:  10098468Ki
  hugepages-2Mi:      0
  memory:             2047912Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9306748094
  hugepages-2Mi:      0
  memory:             1945512Ki
  pods:               110
System Info:
  Machine ID:                 e7bd138c751d41f4a33dd882c048ced4
  System UUID:                33E00DBD-58B6-E844-97AF-749AD4E96AB6
  Boot ID:                    d2d28c19-c5ef-443a-9c1d-20ebeaeb1082
  Kernel Version:             4.4.0-189-generic
  OS Image:                   Ubuntu 16.04.7 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.3
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      172.16.0.0/24
PodCIDRs:                     172.16.0.0/24
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                ------------  ----------  ---------------  -------------  ---
  kube-system                 etcd-k8s-head                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         24h
  kube-system                 kube-apiserver-k8s-head             250m (12%)    0 (0%)      0 (0%)           0 (0%)         24h
  kube-system                 kube-controller-manager-k8s-head    200m (10%)    0 (0%)      0 (0%)           0 (0%)         24h
  kube-system                 kube-proxy-k2vw9                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         24h
  kube-system                 kube-scheduler-k8s-head             100m (5%)     0 (0%)      0 (0%)           0 (0%)         24h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                550m (27%)  0 (0%)
  memory             0 (0%)      0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
So, right in the middle of all that gibberish:
Ready False Tue, 22 Sep 2020 14:22:39 +0000 Mon, 21 Sep 2020 14:01:30 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
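That ties straight back to the very first wall of output: the Calico manifest baked into the Vagrantfile failed to apply ("no matches for kind Deployment in version apps/v1beta1", which v1.19 no longer serves), so no CNI config ever got written and the kubelet never reported Ready. One possible fix, which I haven’t verified on this cluster, would be to apply a current Calico manifest on the head node and watch whether the nodes come Ready (the URL is Calico’s own install manifest at the time of writing, not the one from the Vagrantfile):

# Re-apply a Calico release that uses current API versions
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Then watch the nodes flip to Ready once the calico-node pods start
kubectl get nodes -w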
So, maybe just try a different Vagrantfile, now that I’ve lost a day debugging this one.