See also ElasticHQ and Elasticsearch Concepts
Colours
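Green => all primary and replica shards are allocated
Yellow => all primary shards are allocated but one or more replicas are not (e.g. a node is down); data is available but redundancy is reduced
Red => at least one primary shard is unassigned, so some data is missing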
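A minimal Python sketch of reading the colour from the cluster health API (GET /_cluster/health). The host URL and the helper names are assumptions for illustration; `status`, `unassigned_shards` and `number_of_nodes` are fields of the health response.

```python
import json
import urllib.request

# Assumed host; point this at any node's HTTP port.
ES_URL = "http://localhost:9200"


def fetch_health(base_url=ES_URL):
    """Return the parsed JSON from GET /_cluster/health."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


def explain_health(health):
    """Turn a _cluster/health response (a dict) into a one-line summary."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    nodes = health.get("number_of_nodes", "?")
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        # In yellow, every unassigned shard is a replica.
        return (f"yellow: {unassigned} replica shard(s) unassigned "
                f"({nodes} nodes up); all data is still searchable")
    if status == "red":
        return (f"red: {unassigned} shard(s) unassigned, including at least "
                f"one primary; some data is unavailable")
    raise ValueError(f"unexpected status: {status!r}")
```

Usage: `print(explain_health(fetch_health()))` against a running node.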
Upgrades
Do a rolling restart
- stop a Master node
- upgrade Master (don’t overwrite config)
- restart Master
- repeat for Data nodes
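- note: when a data node is taken offline, its shards are reallocated to the remaining nodes and the cluster takes a big performance hit
- to avoid this, set cluster.routing.allocation.enable to none before stopping the node. This disables shard allocation on the cluster, which is useful if you know the node will only be offline for a few minutes:
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" }}'
- once the node has rejoined, set it back to all:
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" }}'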
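The same allocation toggle as a Python sketch, handy when the restart is scripted. The base URL and function names are assumptions; the four accepted values are the documented options for cluster.routing.allocation.enable.

```python
import json
import urllib.request

# Documented values for cluster.routing.allocation.enable.
ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_body(mode):
    """Build the _cluster/settings body that sets shard allocation to `mode`."""
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def set_allocation(mode, base_url="http://localhost:9200"):
    """PUT the transient setting, like the curl command."""
    req = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(allocation_body(mode)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: call `set_allocation("none")` before stopping a node and `set_allocation("all")` once it has rejoined.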
Restore indices from a snapshot
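POST /_snapshot/my_backup/snapshot_1/_restore
An existing index must be closed (POST /<index>/_close) before it can be restored over. The restore reopens restored indices automatically, but check the index is open afterwards (POST /<index>/_open).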
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
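A sketch of the close-then-restore sequence in Python, assuming a node on localhost and the example repository and snapshot names from the request above. `indices`, `ignore_unavailable` and `include_global_state` are restore-request options; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node


def _post(path, body=None, base_url=BASE_URL):
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    req = urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def restore_body(indices):
    """Restore only the named indices, leaving cluster-wide state alone."""
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
    }


def restore_indices(indices, repo="my_backup", snapshot="snapshot_1"):
    # An existing index must be closed before it can be restored over.
    for name in indices:
        try:
            _post(f"/{name}/_close")
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404: index absent; the restore will create it
                raise
    return _post(f"/_snapshot/{repo}/{snapshot}/_restore", restore_body(indices))
```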
Delete indices
Use Curator.
curator_cli --host <host> show_indices
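Deleting by age is the usual Curator job. To preview which indices an age filter based on the index name would match, here is a small re-implementation for illustration only (not Curator's code), assuming daily indices named like logstash-YYYY.MM.DD.

```python
import re
from datetime import date, datetime, timedelta

# Assumed naming scheme: <prefix>-YYYY.MM.DD, e.g. logstash-2018.03.01.
DATED_INDEX = re.compile(r"^(?P<prefix>.+)-(?P<day>\d{4}\.\d{2}\.\d{2})$")


def indices_older_than(names, days, today=None):
    """Return index names whose embedded date is more than `days` days ago."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        m = DATED_INDEX.match(name)
        if not m:
            continue  # skip indices that don't follow the naming scheme
        day = datetime.strptime(m.group("day"), "%Y.%m.%d").date()
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

Feeding it the names from show_indices gives a list to sanity-check before letting Curator do the actual deletion.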
Curator issues
Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)
See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/version-compatibility.html
Install Curator 3.5.0
pip install elasticsearch-curator==3.5.0
https://www.elastic.co/guide/en/elasticsearch/client/curator/3.5/installation.html
Annoyingly, the syntax differs completely between versions. E.g.
Curator 3: curator, show indices
Curator 5: curator_cli, show_indices
And for this error in Curator 3:
ERROR. At least one filter must be supplied.
you need to supply a filter, e.g. a regex that matches every index:
curator --host 10.33.12.203 show indices --regex '.'
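Because the two syntaxes differ, a helper can pick the command from the cluster version. This sketch only covers the two cases in these notes (Curator 3 for Elasticsearch 1.x, Curator 5 for 5.x); anything else is rejected rather than guessed, so check the compatibility page. The function name is hypothetical.

```python
def show_indices_cmd(es_version, host):
    """Return the argv that lists all indices with the matching Curator."""
    major = int(es_version.split(".")[0])
    if major == 1:
        # Curator 3.x: `show indices` needs at least one filter.
        return ["curator", "--host", host, "show", "indices", "--regex", "."]
    if major == 5:
        # Curator 5.x: singleton commands live in curator_cli.
        return ["curator_cli", "--host", host, "show_indices"]
    raise ValueError(f"check the Curator compatibility matrix for ES {es_version}")
```

Usage: `subprocess.run(show_indices_cmd("1.7.5", "10.33.12.203"), check=True)`.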
Configuration
gateway.recover_after_nodes: 8
Recovery does not start until at least this many nodes (e.g. 8) have joined, which avoids shards thrashing around while the cluster is restarting.
See also:
gateway.recover_after_time: 5m
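gateway.expected_nodes: 2
(once recover_after_nodes is met, recovery starts as soon as expected_nodes have joined, otherwise after recover_after_time; expected_nodes is normally the total number of nodes in the cluster)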
discovery.zen.ping.multicast.enabled: false
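discovery.zen.minimum_master_nodes: 2
(a quorum of master-eligible nodes, i.e. (master-eligible nodes / 2) + 1, to prevent split brain; 2 suits three master-eligible nodes)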
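The quorum rule and the gateway settings are easy to get out of step as a cluster grows. A sketch of a sanity check; the function names and warning texts are hypothetical.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def gateway_warnings(total_nodes, recover_after_nodes, expected_nodes):
    """Flag gateway settings that don't fit the cluster size."""
    warnings = []
    if recover_after_nodes > total_nodes:
        warnings.append("recover_after_nodes exceeds the cluster size; recovery never starts")
    if expected_nodes != total_nodes:
        warnings.append("expected_nodes is usually the total number of nodes")
    return warnings
```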
Data nodes:
node.master: false
node.data: true
http.enabled: false
Master nodes:
node.master: true
node.data: false
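If the per-role elasticsearch.yml fragments are generated by a provisioning script, keeping the role settings in one table avoids drift between nodes. A sketch; the dictionary mirrors the two lists above and the helper name is hypothetical.

```python
# Role settings from the lists above (node.master / node.data style flags).
ROLE_SETTINGS = {
    "data": {"node.master": False, "node.data": True, "http.enabled": False},
    "master": {"node.master": True, "node.data": False},
}


def role_yaml(role):
    """Render the elasticsearch.yml lines for a node role."""
    settings = ROLE_SETTINGS[role]
    lines = (f"{key}: {str(value).lower()}" for key, value in settings.items())
    return "\n".join(lines) + "\n"
```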
JVM
Don’t change garbage collection or thread pools.
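Do change the heap size: use half of the available memory, but keep it below 32GB, e.g. 30GB.
- around 30GB is enough
- Lucene needs the rest (via the OS filesystem cache)
- a heap of 32GB or more is inefficient, because the JVM can no longer use compressed object pointers
E.g. ES_HEAP_SIZE=30g, which sets both the minimum and maximum heap: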
/usr/bin/java -Xms30g -Xmx30g
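The sizing rule as arithmetic: half of RAM, capped at 30GB to stay under the compressed-pointer cutoff. A sketch assuming RAM is given in whole GB; the function names are hypothetical.

```python
# Cap below the ~32GB compressed-oops cutoff; 30GB matches the example above.
MAX_HEAP_GB = 30


def heap_size_gb(total_ram_gb):
    """Half of RAM, capped so the JVM keeps compressed object pointers."""
    return max(1, min(total_ram_gb // 2, MAX_HEAP_GB))


def heap_flags(total_ram_gb):
    size = heap_size_gb(total_ram_gb)
    # Equal min and max avoid heap resizing at runtime.
    return [f"-Xms{size}g", f"-Xmx{size}g"]
```

On a 64GB machine half would be 32GB, so the cap brings it down to 30GB.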
Swap
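Swapping can badly hurt performance. Disable it by removing the swap entries from /etc/fstab (swapoff -a turns it off without a reboot).
Alternatively, set bootstrap.mlockall: true in elasticsearch.yml to lock the process memory and prevent it being swapped (the setting is called bootstrap.memory_lock from Elasticsearch 5.0).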
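The fstab edit as a pure text transform, so it can be reviewed before writing the file back as root. A sketch with a hypothetical helper name; it only rewrites text and does not turn off swap that is already active.

```python
def comment_out_swap(fstab_text):
    """Return fstab text with every swap entry commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        is_swap = (len(fields) >= 3
                   and not line.lstrip().startswith("#")
                   and fields[2] == "swap")
        out.append("#" + line if is_swap else line)
    return "\n".join(out) + "\n"
```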
File Descriptors / MMap
File descriptors (ulimit -n): at least 64,000
MMap: unlimited
See also https://docs.oracle.com/cd/E23389_01/doc.11116/e21036/perf002.htm
and sysctl vm.max_map_count ("Maximum number of mmap()'ed ranges and how to set it on Linux?").
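A Linux-only sketch that checks both limits from inside the process. The 64,000 threshold is the figure above (Elastic's current docs recommend 65,536); 262144 for vm.max_map_count is the minimum Elasticsearch 5.x and later enforce at startup. The function names are hypothetical.

```python
import resource  # Unix only

MIN_OPEN_FILES = 64000   # figure from these notes
MIN_MAP_COUNT = 262144   # Elasticsearch bootstrap-check minimum


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    with open(path) as f:
        return int(f.read().strip())


def limit_problems(open_files, max_map_count,
                   min_open_files=MIN_OPEN_FILES, min_map_count=MIN_MAP_COUNT):
    """Return a list of limits that are too low (empty if all is well)."""
    problems = []
    if open_files != resource.RLIM_INFINITY and open_files < min_open_files:
        problems.append(f"open files limit {open_files} < {min_open_files}")
    if max_map_count < min_map_count:
        problems.append(f"vm.max_map_count {max_map_count} < {min_map_count}")
    return problems


def current_problems():
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return limit_problems(soft, read_max_map_count())
```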