Kubernetes: an odyssey of over-complexity

To play around with Kubernetes – rather than building a cluster from scratch and having to fix a billion different things that could go wrong – I decided to start off with a completely working cluster. So, from https://medium.com/@raj10x/multi-node-kubernetes-cluster-with-vagrant-virtualbox-and-kubeadm-9d3eaac28b98, I used the Vagrantfile gist at the end and ran vagrant up

and found myself, after thousands of unintelligible lines of output had scrolled past, back at the command line. Had things worked?

Did I need to be worried about the reams of gibberish output like:

k8s-head: service/calico-typha created
k8s-head: Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
k8s-head: customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
k8s-head: customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
k8s-head: serviceaccount/calico-node created
k8s-head: unable to recognize "https://raw.githubusercontent.com/ecomm-integration-ballerina/kubernetes-cluster/master/calico/calico.yaml": no matches for kind "Deployment" in version "apps/v1beta1"
k8s-head: unable to recognize "https://raw.githubusercontent.com/ecomm-integration-ballerina/kubernetes-cluster/master/calico/calico.yaml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
k8s-head: W0921 14:01:39.054627 6427 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
 k8s-node-2: Processing triggers for systemd (229-4ubuntu21.29) ...
 k8s-node-2: sed: can't read /etc/default/kubelet: No such file or directory
 k8s-node-2: dpkg-preconfigure: unable to re-open stdin: No such file or directory
 k8s-node-2: Warning: Permanently added '192.168.205.10' (ECDSA) to the list of known hosts.
 k8s-node-2: [preflight] Running pre-flight checks
 k8s-node-2: [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

Who knows?! I had no idea what the state of the cluster was. Had anything worked? Had something worked? Had nothing worked?

So, some debugging:

1. State of VMs

vagrant status
Current machine states:

k8s-head                  running (virtualbox)
k8s-node-1                running (virtualbox)
k8s-node-2                running (virtualbox)

So, at least the VMs are running. That would have been a useful thing for vagrant up to print!
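
In fact the Vagrantfile could perfectly well run that sanity check itself at the end of provisioning – something along these lines (hypothetical; it isn’t in the gist):

# hypothetical post-provision health check the Vagrantfile could print
vagrant status
vagrant ssh k8s-head -c "kubectl get nodes"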

2. State of nodes

(^^^^^ I try not to go flying off on a tangent with technology – it’s so easy to end up down a rabbit hole of realising this is wrong, and then that’s wrong – but one thing that seems consistently broken across many editors is this 1. 2. numbered-list indenting: the first indent is almost impossible to delete and the second is almost impossible to insert)

vagrant ssh k8s-head

kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   23h

kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-head     NotReady   master   23h   v1.19.2
k8s-node-1   NotReady   <none>   23h   v1.19.2
k8s-node-2   NotReady   <none>   23h   v1.19.2

So, the nodes don’t seem to be ready.

vagrant@k8s-head:~$ alias k=kubectl
vagrant@k8s-head:~$ k get po -o wide
No resources found in default namespace.
vagrant@k8s-head:~$ k get po -o wide --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
kube-system   coredns-f9fd979d6-4gmwp   0/1     Pending   0          23h   <none>   <none>   <none>           <none>
kube-system   coredns-f9fd979d6-pgdtv   0/1     Pending   0          23h   <none>   <none>   <none>           <none>

So, something seems wrong with coredns.

Having all these namespaces just adds to the confusion/complexity. E.g.

vagrant@k8s-head:~$ k logs coredns-f9fd979d6-4gmwp
Error from server (NotFound): pods "coredns-f9fd979d6-4gmwp" not found
vagrant@k8s-head:~$ k logs coredns-f9fd979d6-4gmwp --all-namespaces
Error: unknown flag: --all-namespaces
See 'kubectl logs --help' for usage.

Getting the logs of something shouldn’t be this difficult. You shouldn’t have to pick up a book on Kubernetes or do the CKA or search StackOverflow to find out simple stuff like this.
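
(With hindsight, the pattern that eventually works is: find the pod’s namespace first, then qualify every subsequent command with it. A minimal sketch, using only standard kubectl flags:

# list pods across every namespace (-A is short for --all-namespaces)
kubectl get pods -A
# then pass the namespace from that output to every per-pod command
kubectl describe pod -n kube-system coredns-f9fd979d6-4gmwp

None of which is discoverable from the error messages above.)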

OK, this page sounds promising: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/

This page shows how to debug Pods and ReplicationControllers.

Let’s try their first example:

vagrant@k8s-head:~$ kubectl describe pods coredns
Error from server (NotFound): pods "coredns" not found
vagrant@k8s-head:~$ kubectl describe pods coredns-f9fd979d6-4gmwp
Error from server (NotFound): pods "coredns-f9fd979d6-4gmwp" not found

It then says:

Look at the output of the kubectl describe … command above. There should be messages from the scheduler about why it can not schedule your pod

There aren’t any messages from the scheduler, so that’s another dead end.


So, over to the Kubernetes Office Hours Slack, where someone suggests:

kubectl -n kube-system logs coredns-f9fd979d6-4gmwp

which usefully outputs nothing at all. Literally nothing – a big fat nothing. Not even a "this Pod is Pending"!
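
In fairness – though nothing tells you this – a Pending pod has never started a container, so there are literally no logs to fetch; the scheduler’s complaints live in events instead. Something like this would have surfaced them:

# events carry the scheduling failures that logs never will
kubectl -n kube-system get events --field-selector involvedObject.name=coredns-f9fd979d6-4gmwp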

It finally turned out the magic invocation was:

kubectl -n kube-system describe pod coredns-f9fd979d6-4gmwp
Name:                 coredns-f9fd979d6-4gmwp
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=f9fd979d6
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-f9fd979d6
Containers:
  coredns:
    Image:       k8s.gcr.io/coredns:1.7.0
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-gv9wj (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-gv9wj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-gv9wj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---- ----               -------
  Warning  FailedScheduling  76m  default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

and after a whole lot more gibberish it finally gets to:

Warning  FailedScheduling  76m  default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

and

kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-head     NotReady   master   24h   v1.19.2
k8s-node-1   NotReady   <none>   24h   v1.19.2
k8s-node-2   NotReady   <none>   24h   v1.19.2

So, coredns did not deploy because the nodes carry a not-ready taint – it isn’t that the nodes are NotReady because coredns is pranged. I.e. the problem is with the Nodes, not with coredns.
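
If you just want to see the taints without wading through a full describe, a couple of standard kubectl incantations do it:

# one line per node, listing its taint keys
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
# or, cruder but readable
kubectl describe nodes | grep -A2 'Taints:'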

Checking head:

kubectl describe node k8s-head
Name:               k8s-head
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-head
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 21 Sep 2020 14:01:34 +0000
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-head
  AcquireTime:     <unset>
  RenewTime:       Tue, 22 Sep 2020 14:24:35 +0000
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  MemoryPressure  False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletNotReady             runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  10.0.2.15
  Hostname:    k8s-head
Capacity:
  cpu:                2
  ephemeral-storage:  10098468Ki
  hugepages-2Mi:      0
  memory:             2047912Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9306748094
  hugepages-2Mi:      0
  memory:             1945512Ki
  pods:               110
System Info:
  Machine ID:                 e7bd138c751d41f4a33dd882c048ced4
  System UUID:                33E00DBD-58B6-E844-97AF-749AD4E96AB6
  Boot ID:                    d2d28c19-c5ef-443a-9c1d-20ebeaeb1082
  Kernel Version:             4.4.0-189-generic
  OS Image:                   Ubuntu 16.04.7 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.3
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      172.16.0.0/24
PodCIDRs:                     172.16.0.0/24
Non-terminated Pods:          (5 in total)
  Namespace    Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                              ------------  ----------  ---------------  -------------  ---
  kube-system  etcd-k8s-head                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         24h
  kube-system  kube-apiserver-k8s-head           250m (12%)    0 (0%)      0 (0%)           0 (0%)         24h
  kube-system  kube-controller-manager-k8s-head  200m (10%)    0 (0%)      0 (0%)           0 (0%)         24h
  kube-system  kube-proxy-k2vw9                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         24h
  kube-system  kube-scheduler-k8s-head           100m (5%)     0 (0%)      0 (0%)           0 (0%)         24h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                550m (27%)  0 (0%)
  memory             0 (0%)      0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

So, right in the middle of all that gibberish:

Ready           False   Tue, 22 Sep 2020 14:22:39 +0000   Mon, 21 Sep 2020 14:01:30 +0000   KubeletNotReady             runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
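
And that, presumably, loops all the way back to the unable to recognize … no matches for kind "Deployment" in version "apps/v1beta1" errors in the original vagrant up output: the calico.yaml the Vagrantfile applies uses API versions that were removed in Kubernetes 1.16, so the CNI plugin never got installed, so the nodes never went Ready, so coredns never scheduled. One plausible fix (untested against this Vagrantfile) would be to apply a current Calico manifest and watch the nodes:

# untested guess: apply an up-to-date Calico manifest in place of the stale one
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# then watch whether the nodes finally go Ready
kubectl get nodes -w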

Or maybe just try a different Vagrantfile, now that I’ve lost a day debugging this one.


git: Git Flow vs Trunk Based Development

Git Flow:

  • problems:
    • long-lived feature branches and merge hell (i.e. merge conflicts with other people’s code): https://stxnext.com/blog/2018/02/28/escape-merge-hell-why-i-prefer-trunk-based-development-over-feature-branching-and-gitflow/
    • database migrations – e.g. a db migration living in a long-lived feature branch drifts out of sync with the schema everyone else is building on

Trunk Based Development: release from tagged branches off master (see the git sketch after this list)

  • short-lived branches, merged quickly – one core rule: land a new commit on trunk every day
  • revert commit (if something ends up in production that you don’t want)
  • to avoid delivering code of an unfinished feature: branch by abstraction, feature flag
  • one button rollbacks (i.e. in CI/CD pipeline)
  • automated smoke tests in production that automatically roll back code if it fails a test
  • commit directly to master
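
A minimal sketch of that workflow in plain git (branch, tag and commit names made up):

# everyone lands small commits on master, at least daily
git checkout master && git pull --rebase
git commit -am "add price rounding behind a feature flag"
git push origin master

# release by tagging a known-good commit
git tag -a v1.4.0 -m "release 1.4.0"
git push origin v1.4.0

# something bad reaches production? revert the offending commit
git revert abc1234
git push origin master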


git: remove secrets from history

Let’s say you’ve accidentally committed a password to git. Here’s how you remove it:

1. if you have only committed locally

a. and it’s your last commit

  • just edit the file and run `git commit -a --amend`

b. and it’s a previous commit

  • do an interactive rebase with git rebase -i origin/master
  • change the pick to edit where you want to edit
  • amend the commit with git commit --amend
  • and continue with git rebase --continue
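
As a concrete run-through of those steps (file name and commit are made up):

git rebase -i origin/master
# in the editor that opens, change "pick" to "edit" on the commit containing the password
# git stops at that commit; scrub the password from the file, then:
git add config/settings.py
git commit --amend
git rebase --continue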

https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history

2. if you have committed and pushed to GitHub

https://blog.ostermiller.org/git-remove-from-history

https://blog.tinned-software.net/remove-files-from-git-history/
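
The gist of what both links describe is a history rewrite plus a force push – roughly this (the path is made up; read the links for the caveats, and rotate the secret anyway, since anyone may already have pulled it):

# rewrite every branch and tag to drop the file from history
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch config/secrets.env' \
  --prune-empty --tag-name-filter cat -- --all
# then force-push the rewritten history
git push origin --force --all
git push origin --force --tags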

Passive-aggressive handovers

Possibly one of the worst handovers I’ve ever witnessed – I overheard it recently when a consultant was handing over to a colleague. Some classics!

  • It’s all self-documenting code. It’s obvious
  • Oh, you need to run it in a separate profile to get it to work. Choose the Debug Profile
  • Come on, this is obvious. Don’t embarrass yourself by asking
  • You’ve asked three times already. Don’t embarrass yourself by asking
  • One moment, we’ll get there (in response to a question – and then never returning to the question)
  • Dunno if I’ve the energy
  • Will my batteries on my laptop survive? (when they were at 100%)

I’m not making this up. These were all real phrases!

And then there were all the passive-aggressive phrases like “Yes” or “No” in response to questions instead of elaborating and explaining.

And the git commit history was funny: line after line of commit messages saying “WIP”! And all the merges to master were done by the consultant himself – no peer reviews.

You can’t make this stuff up!


git and Sublime Text integration

I’ve tried GitSavvy and struggled with it.

I like Sublime Merge.

Install via the Command Palette (Ctrl+Shift+P): choose Install Package, then Sublimemerge 3.

Then use it via the Command Palette. E.g. typing blame brings up Sublime Merge: Blame File, which launches a new window showing tons of detail.

Also, if you want blame integration, take a look at Blame Explorer. Once it’s installed, just hover over a line number in a file to see that line’s change history.
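
And if you only want the same information at the command line, plain git blame already does it (file name made up):

# annotate lines 10-20 of a file with commit, author and date
git blame -L 10,20 -- src/main.py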