Kubectl: create pod

Let’s try creating a pod.

  1. create a pod.yml with your favourite image:
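A minimal pod.yml might look like the following. This is a sketch: the apiVersion/kind boilerplate and the container name hello are assumptions; hello-pod and hello-world:latest match the names used in the rest of this walkthrough.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod              # the name kubectl reports below
spec:
  containers:
    - name: hello              # container name is arbitrary
      image: hello-world:latest
```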


kubectl create -f pod.yml


pod/hello-pod created

but we need to check what’s going on behind the scenes with:

kubectl get pods

“kubectl get pods” warnings

kubectl get pods

NAME        READY   STATUS              RESTARTS   AGE
hello-pod   0/1     ContainerCreating   0          1m
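As an aside, a quick way to spot unhealthy pods is to filter the STATUS column. A sketch with sample output inlined so it's self-contained; against a real cluster you would pipe `kubectl get pods` into the awk command instead:

```shell
# Sample `kubectl get pods` output (inlined for illustration)
sample='NAME        READY   STATUS              RESTARTS   AGE
hello-pod   0/1     ContainerCreating   0          1m
web-pod     1/1     Running             0          5m'

# Skip the header row (NR > 1) and print pods whose STATUS isn't Running
echo "$sample" | awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
```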

kubectl describe pods


Reason: ContainerCreating

simply means it's still pulling the image and creating the container.

kubectl get pods

NAME        READY   STATUS             RESTARTS   AGE
hello-pod   0/1     ImagePullBackOff   0          10m


kubectl describe pods

Reason: ImagePullBackOff

means Kubernetes can't pull the image: wrong name, wrong tag, or no access to the registry.

Check with:

docker inspect --type=image <name of image>:latest

I was getting:

Error: No such image:

So, let's try another image, i.e. in pod.yml let's use:

image: hello-world:latest

Now create hits an AlreadyExists error, because hello-pod from the first attempt still exists.

Let's try deleting this pod with:

kubectl delete pods

which complains that no name, label selector, or --all flag was specified.

Let's try --all as mentioned. Solution is:

kubectl delete pods --all

Let's try create again. The pod is created this time, but hello-world prints its message and exits immediately, so Kubernetes keeps restarting it and the pod ends up in CrashLoopBackOff. More on that:

What is a CrashLoopBackOff? How to alert, debug / troubleshoot, and fix Kubernetes CrashLoopBackOff events.



Other errors

Unable to connect to the server

Unable to connect to the server: dial tcp i/o timeout

  • check minikube is started (minikube status, then minikube start if needed)

kubectl describe pods




curl: silent, using variables in bash

-s => silent, so no progress bar (and no error messages either)

-S => show errors, even when -s is used



Using a variable in a curl call within Bash – remember to double-escape quotes. E.g.
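A sketch of what that looks like (the index name and payload here are made up): the -d argument has to be double-quoted so the variable expands, which means the JSON's own quotes need escaping with \".

```shell
# Hypothetical index name held in a shell variable
INDEX="logs-2019.01.01"

# Double quotes let ${INDEX} expand; the JSON's quotes are escaped with \"
PAYLOAD="{ \"index\" : \"${INDEX}\" }"
echo "$PAYLOAD"

# The payload would then be sent with something like:
#   curl -sS -XGET "localhost:9200/_search" -H 'Content-Type: application/json' -d "$PAYLOAD"
```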




Elasticsearch: ElasticHQ

What I’m hoping for is that it will make tasks simpler:

1. to see Unassigned shards I currently:

ssh into the node then:

curl -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASS

The output isn’t easy to understand. E.g.
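One thing that helps a little is aligning the fields with column -t. A sketch with made-up sample lines; you would pipe the real curl output instead:

```shell
# Made-up sample of _cat/shards output for unassigned shards
sample='my-index 3 p UNASSIGNED NODE_LEFT
my-index 17 r UNASSIGNED NODE_LEFT'

# column -t aligns whitespace-separated fields into readable columns
echo "$sample" | column -t
```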



ElasticHQ does mention that it will:

  • Monitor Nodes, Indices, Shards, and general cluster metrics.

However, if it’s via the Diagnostics pane then that takes ages – an indefinite amount of time. I’m currently at around 15 minutes and still seeing json scroll past in my console without seeing anything in the GUI.


2. delete an index

3. stop shard allocation



More info:

Elasticsearch: administration

See also ElasticHQ and Elasticsearch Concepts


Yellow => all primary shards are allocated but some replicas aren't, e.g. because a node is down

Red => at least one primary shard is unassigned, i.e. data is missing


Do a rolling restart

  1. stop a Master node
  2. upgrade Master (don’t overwrite config)
  3. restart Master
  4. repeat for Data nodes
    1. note: when a data node is taken offline the shards will reshuffle and the cluster takes a big performance hit
    2. to avoid this, set cluster.routing.allocation.enable to none to fully disable shard rebalancing on the cluster. Useful if you know you're going to take a node offline for just a few minutes. Use:
    3. curl -XPUT 0:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
    4. when the node is back online, set it to all again to re-enable allocation

Restore indices from a snapshot

POST /_snapshot/my_backup/snapshot_1/_restore

but remember to close the index before (POST /<index>/_close) and open it after (POST /<index>/_open).


Delete indices

Use Curator.

curator_cli --host <host> show_indices

curator Issues

Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)

See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/version-compatibility.html

Install Curator 3.5.0

pip install elasticsearch-curator==3.5.0



Annoyingly, the syntax between versions is completely different. E.g.

Curator 3: curator, show indices

Curator 5: curator_cli, show_indices

And for this error:

ERROR. At least one filter must be supplied.

in Curator 3, you need something like this:

curator --host <host> show indices --regex '.'


gateway.recover_after_nodes: 8

E.g. 8, so recovery doesn't start until enough nodes have joined, avoiding thrashing after a full cluster restart.

See also:

gateway.recover_after_time: 5m

gateway.expected_nodes: 2

discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 2

Data nodes:

node.master: false
node.data: true

http.enabled: false

Master nodes:

node.master: true
node.data: false


Don’t change garbage collection or thread pools.

Do change heap size. Use half of available memory. e.g. 32GB.

  • 32GB is enough
  • Lucene needs the rest
  • Heap size > 32GB is inefficient


/usr/bin/java -Xms30g -Xmx30g
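Those heap flags can be derived from total RAM. A sketch, assuming Linux (where /proc/meminfo reports MemTotal in kB); the cap at 31 GB is to stay under the 32 GB threshold mentioned above:

```shell
# Pick the ES heap as half of total RAM, capped below 32 GB
total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
heap_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
if [ "$heap_gb" -lt 1 ]; then heap_gb=1; fi

# Emit the flags to pass to the JVM
echo "-Xms${heap_gb}g -Xmx${heap_gb}g"
```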


Swap can harm performance. Disable it by removing any swap entries from /etc/fstab.

Alternatively, set bootstrap.mlockall: true in elasticsearch.yml to lock memory and prevent it being swapped.

File Descriptors / MMap

File Descriptors: 64,000

MMap: unlimited

See also https://docs.oracle.com/cd/E23389_01/doc.11116/e21036/perf002.htm

and sysctl vm.max_map_count

Maximum number of mmap()’ed ranges and how to set it on Linux?
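To check the current values on a node (Linux; the /proc path below holds the same value sysctl vm.max_map_count reports):

```shell
# Open file descriptor limit for this shell; raise to 64000+ for ES
ulimit -n

# Maximum number of mmap()'ed ranges
cat /proc/sys/vm/max_map_count
```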



Elasticsearch: another snake's nest

I ran into a problem with UNASSIGNED shards in an Elasticsearch cluster after experiencing a degraded EC2 instance.

I did the usual steps to solve:

See Elasticsearch: degraded instance

and ran into a heap of problems.

I installed curator_cli using

pip install elasticsearch-curator

but found that, despite reams of pages using the command delete_indices, curator kept saying No such command.



Someone else had experienced this:


The solution seemed to be to downgrade some library called Click.



So I downgraded Click with:

pip install click==6.7 elasticsearch-curator==5.5.4

Now I’m running into

elasticsearch.exceptions.ElasticsearchException: Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)

It seems there’s a compatibility issue.



Elasticsearch: degraded instance

AWS will warn you when you have an instance running on degraded hardware. You will get a scheduled retirement date and, on this date, your instance will be switched off.

However, your hardware is likely already compromised. If you’re running Elasticsearch on this hardware you need to decommission the node and create another node.

The steps to do this are:

1. get current status

From outside the box:

curl <host>:9200/_cluster/health?pretty

2. disable shard allocation (set cluster.routing.allocation.enable to none, as above)

3. stop service

sudo service elasticsearch stop

4. terminate the instance via console

5. create a new instance – e.g. with Terraform

6. re-enable shard allocation (set cluster.routing.allocation.enable back to all)

7. validate cluster health goes back to green