curl: tips – silent, using variables in bash


-s => silent – no progress bar or other noise

-S => show errors even when silent (use together with -s; on its own -s also hides error messages)

bash variables

Using a variable in a curl call within Bash – remember to escape the double quotes inside the JSON. E.g.

curl -X POST -H 'Content-type: application/json' --data "{\"text\": \"${message}\"}"
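A minimal sketch of what the escaping buys you (the Slack-style payload and the message value here are just made-up examples):

```shell
# Hypothetical message value – the escaped inner quotes keep the JSON valid
message="deploy complete"
payload="{\"text\": \"${message}\"}"
echo "$payload"    # prints: {"text": "deploy complete"}
```

Single quotes around the --data argument would stop ${message} expanding at all, hence the double quotes with escaped inner quotes.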

processing curl output

e.g. I wanted to count the yellow indices returned by a GET against Elasticsearch’s _cat/indices endpoint. This seemed the obvious approach:

curl 0:9200/_cat/indices?v&health=yellow | wc -l

but there are a few things wrong here.

1. it output everything to the terminal, including all the green indices, and then seemed to hang until I hit Enter – not what I was expecting.

That’s because there’s an unquoted ampersand character in the URL: the shell treats & as “run this in the background”, so health=yellow never reaches curl and the command carries on in the background.

2. there’s also progress output from curl

Fix both by quoting the URL and silencing curl with -s:

curl -s "0:9200/_cat/indices?v&health=yellow" | wc -l

3. you still get the header row from Elasticsearch, i.e.

health status index uuid pri rep docs.count docs.deleted store.size
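One way to drop that header row is to pipe through `tail -n +2`. A sketch using canned output rather than a live cluster (the column layout here is an assumption):

```shell
# Simulated _cat/indices output: one header row plus two yellow indices;
# tail -n +2 starts printing from line 2, i.e. skips the header
printf 'health status index\nyellow open logs-1\nyellow open logs-2\n' \
  | tail -n +2 \
  | wc -l
```

Against a real cluster that becomes `curl -s "0:9200/_cat/indices?v&health=yellow" | tail -n +2 | wc -l` – or just drop the v parameter, since that is what adds the header in the _cat APIs.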



Elasticsearch: ElasticHQ

What I’m hoping for is that it will make tasks simpler:

1. to see unassigned shards I currently ssh into a node and run:

curl -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASS

The output isn’t easy to understand.
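One way to make it more digestible is to tally unassigned shards per reason with awk. A sketch with canned lines (the index names and reasons are made up; the columns match the h= list in the curl above):

```shell
# Columns: index shard prirep state unassigned.reason
printf '%s\n' \
  'logs-1 0 p UNASSIGNED NODE_LEFT' \
  'logs-1 0 r UNASSIGNED NODE_LEFT' \
  'logs-2 1 r UNASSIGNED INDEX_CREATED' |
  awk '$4 == "UNASSIGNED" { n[$5]++ } END { for (r in n) print n[r], r }' |
  sort -rn    # most common reason first
```

With a live cluster you would pipe the curl output above straight into the awk stage instead of printf.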



ElasticHQ does mention that it will:

  • Monitor Nodes, Indices, Shards, and general cluster metrics.

However, if it’s via the Diagnostics pane then that takes ages – an indefinite amount of time. I’m currently at around 15 minutes and still seeing json scroll past in my console without seeing anything in the GUI.


2. delete an index

3. stop shard allocation



More info:

Elasticsearch: administration

See also ElasticHQ and Elasticsearch Concepts


Yellow => some replica shards are unassigned (e.g. a node is down), but all primaries are available

Red => at least one primary shard is unassigned – data is missing


Do a rolling restart

  1. stop a Master node
  2. upgrade Master (don’t overwrite config)
  3. restart Master
  4. repeat for Data nodes
    1. note: when a data node is taken offline the shards will reshuffle and the cluster takes a big performance hit
    2. to avoid this, set cluster.routing.allocation.enable to none to fully disable shard rebalancing on the cluster. Useful if you know you’re going to take a node offline for just a few minutes. Use:
    3. curl -XPUT 0:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
    4. once the node is back, re-enable with the same call but the value "all"

Restore indices from a snapshot

POST /_snapshot/my_backup/snapshot_1/_restore

but remember to close the index before the restore and open it again afterwards.

Delete indices

Use Curator.

curator_cli --host <host> show_indices

curator Issues

Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)


Install Curator 3.5.0

pip install elasticsearch-curator==3.5.0


Annoyingly, the syntax between versions is completely different. E.g.

Curator 3: curator, show indices

Curator 5: curator_cli, show_indices

And for this error:

ERROR. At least one filter must be supplied.

in Curator 3, you need something like this:

curator --host <host> show indices --regex '.'


gateway.recover_after_nodes: 8

E.g. 8, to avoid thrashing after initial cluster restart.

Related settings:

gateway.recover_after_time: 5m

gateway.expected_nodes: 2
discovery.zen.minimum_master_nodes: 2

Data nodes:

node.master: false
node.data: true
http.enabled: false

Master nodes:

node.master: true
node.data: false


Don’t change garbage collection or thread pools.

Do change heap size. Use half of available memory, e.g. ~30GB on a 64GB machine.

  • 32GB is enough
  • Lucene needs the rest
  • Heap size > 32GB is inefficient – the JVM loses compressed object pointers


/usr/bin/java -Xms30g -Xmx30g


Swap

Swap can harm performance. Disable it by removing the swap entry from /etc/fstab.

Alternatively, set bootstrap.mlockall: true in elasticsearch.yml to lock memory and prevent it being swapped.

File Descriptors / MMap

File Descriptors: 64,000

MMap: unlimited

See also `sysctl vm.max_map_count` and “Maximum number of mmap()’ed ranges and how to set it on Linux?”
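These are usually applied via limits.conf and sysctl. A sketch – the user name, file paths and the 262144 figure are assumptions (262144 is the commonly recommended minimum for vm.max_map_count); only the 64,000 descriptor count comes from the note above:

```
# /etc/security/limits.conf – raise the open-file limit for the elasticsearch user
elasticsearch  -  nofile  64000

# /etc/sysctl.d/99-elasticsearch.conf (or: sysctl -w vm.max_map_count=262144)
vm.max_map_count = 262144
```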



Elasticsearch: another snake’s nest

I ran into a problem with UNASSIGNED shards in an Elasticsearch cluster after experiencing a degraded EC2 instance.

I did the usual steps to solve:

See Elasticsearch: degraded instance

and ran into a heap of problems.

I installed curator_cli using

pip install elasticsearch-curator

but found that, despite reams of pages using the command delete_indices, curator kept saying No such command.


Someone else had experienced this:

The solution seemed to be to downgrade some library called Click.


So I downgraded Click with:

pip install click==6.7 elasticsearch-curator==5.5.4

Now I’m running into

elasticsearch.exceptions.ElasticsearchException: Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)

It seems there’s a compatibility issue.


Elasticsearch Concepts

See also Elasticsearch: administration

Indices: store the JSON documents; each index is held in one or more shards

Shards: each shard is a complete Lucene database


An index can be spread across multiple shards, and if there are multiple nodes, shards will migrate between nodes – aka rebalancing.


Replicas: an exact duplicate of a shard (except designated as a Replica). Can configure as many replicas per shard as you like.

E.g. here you have 2 shards plus 2 replicas, each containing the one index (I01).

Replicas are read-only and can serve read requests, thereby increasing read throughput.


Node Roles

You can run all roles on a single node, but it’s more efficient to separate them out.


Data nodes:

  • most hard-working
  • contain all the shards
  • don’t typically receive search queries directly
  • tend to be the beefiest machines


Client nodes:

  • gateway to the cluster
  • big increase in performance
  • handle all query requests and redirect them to the data nodes


Master nodes:

  • brains of the cluster
  • maintain cluster state
  • all nodes have a copy of the state, but only the elected master can update it

Capacity Planning

Data Nodes

To test, run a load test against a single node until it is completely saturated.

E.g. if 1M documents on 1 node gives a 4.0-second response time, you probably need 4 nodes to get to a 1.0-second response time (assuming roughly linear scaling).
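The arithmetic behind that estimate, assuming perfectly linear scaling (real clusters will fall somewhat short of this):

```shell
# nodes needed ≈ measured response time / target response time (milliseconds)
measured_ms=4000
target_ms=1000
echo "$(( measured_ms / target_ms )) nodes"    # prints: 4 nodes
```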

Master Nodes

To avoid a split-brain scenario: set minimum_master_nodes to (number of master-eligible nodes / 2) + 1

Should have at least 3.
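The quorum arithmetic above uses integer division, so it works neatly for odd counts:

```shell
# minimum_master_nodes = (master-eligible nodes / 2) + 1
for n in 3 5 7; do
  echo "$n master-eligible nodes -> minimum_master_nodes: $(( n / 2 + 1 ))"
done
```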

Client Nodes

Could exist behind a load balancer.

E.g. in summary, a setup could be:

4 data nodes, 3 master nodes, 2 client nodes – i.e. a total of 9 nodes.

Server Requirements

  • CPU: the more cores the better (favour cores over clock speed) – i.e. better to run more processes concurrently than to run them faster
  • RAM: 64GB for data nodes is ideal (e.g. in AWS an `i3.2xlarge`)
  • Disks: fastest disks possible. RAID 0 is safe enough for extra speed – it isn’t fault tolerant, but Elasticsearch has replica shards. Avoid NAS, as performance will drop drastically
  • Networking: keep the cluster within the same data centre, as shard rebalancing needs fast networking
  • VMs: don’t use them for data nodes in production


Running on AWS

Data: i3.2xlarge


