The easiest way to check on a cluster is with a GUI such as ElasticHQ (see Elasticsearch: ElasticHQ) or Grafana.
From the command line, though:
Check the cluster and list nodes:
Summary: curl "0:9200/_cluster/health?pretty"
Stats: curl "0:9200/_cat/nodes?v"
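For scripting (e.g. a cron check), here is a minimal sketch that pulls just the status column from _cat/health and maps it to an exit code. ES_HOST, cluster_status and status_code are hypothetical names. The green/yellow/red to 0/1/2 mapping is an assumption borrowed from the Nagios plugin convention, not something Elasticsearch defines.

```shell
# Hypothetical helpers; ES_HOST defaults to the 0:9200 shorthand used above.
ES_HOST=${ES_HOST:-0:9200}

# Fetch only the status column; _cat output has no header unless ?v is given.
cluster_status() {
  curl -sS "$ES_HOST/_cat/health?h=status" | tr -d '[:space:]'
}

# Map a status word to an exit code (assumed Nagios-style convention):
# green -> 0, yellow -> 1 (warning), red -> 2 (critical), anything else -> 3.
status_code() {
  case "$1" in
    green) return 0 ;;
    yellow) return 1 ;;
    red) return 2 ;;
    *) return 3 ;;
  esac
}
```

Usage: `status=$(cluster_status); status_code "$status"; echo "$status: exit $?"`.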
-s
=> silent, so no progress bar
-S
=> show errors even when -s is used (so -sS hides the progress bar but still reports failures)
https://stackoverflow.com/questions/7373752/how-do-i-get-curl-to-not-show-the-progress-bar
Using a variable in a curl call within Bash: wrap the JSON in double quotes so the variable expands, and remember to escape the inner double quotes. E.g.
curl -X POST -H 'Content-type: application/json' --data "{\"text\": \"${message}\"}" <url>
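The hand-escaped version breaks as soon as ${message} itself contains a double quote or a backslash. Here is a minimal sketch of a hypothetical helper that escapes those two characters with sed before building the body. It does not handle newlines or other control characters. If jq is installed, `jq -n --arg text "$message" '{text: $text}'` is the safer route.

```shell
# Hypothetical helper: build a {"text": ...} JSON body from a shell string.
# Escapes backslashes first, then double quotes; control characters
# (newlines, tabs) are NOT escaped.
json_text_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}' "$escaped"
}

# Usage (url is a placeholder for your endpoint):
#   curl -sS -X POST -H 'Content-type: application/json' \
#     --data "$(json_text_payload "$message")" "$url"
```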
E.g. I wanted to count the yellow indices returned by a GET to an Elasticsearch node. This seemed the obvious way:
curl 0:9200/_cat/indices?v&health=yellow | wc -l
But there are a few things wrong here.
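1. It output everything to the terminal, including all the green indices, and then seemed to hang until I hit Enter. Not what I was expecting.
Solution:
there's an unquoted ampersand in the URL. The shell treats & as "run in the background", so curl fetched every index (without the health filter) and printed it to the terminal, while health=yellow | wc -l ran as a separate command. Quote the URL.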
2. There's also progress output from curl.
Solution:
silence it with -s:
curl -s "0:9200/_cat/indices?v&health=yellow" | wc -l
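3. You still get the header line from Elasticsearch, i.e.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
Solution:
drop the v parameter, which is what turns the header on:
curl -s "0:9200/_cat/indices?health=yellow" | wc -l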
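If you want to keep ?v for eyeballing but still count rows, here is a minimal sketch of a hypothetical helper. It skips the first line only when told there's a header, and it ignores blank lines.

```shell
# Hypothetical helper: count data rows in _cat output read from stdin.
# Pass "header" if the request used ?v so the first line is skipped.
count_cat_rows() {
  if [ "${1:-}" = "header" ]; then
    tail -n +2
  else
    cat
  fi | awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Usage:
#   curl -s "0:9200/_cat/indices?v&health=yellow" | count_cat_rows header
```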
What I'm hoping is that ElasticHQ will make these tasks simpler:
1. To see unassigned shards, I currently ssh into the node, then run:
curl -s -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASS
The output isn’t easy to understand. E.g.
index.1 0 p UNASSIGNED NODE_LEFT
index.2 8 p UNASSIGNED INDEX_CREATED
ElasticHQ does mention that it will:
However, if that's via the Diagnostics pane, it takes ages: an indefinite amount of time. I'm currently at around 15 minutes and still seeing JSON scroll past in my console without anything appearing in the GUI.
2. delete an index
3. stop shard allocation
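To make the _cat/shards output from item 1 easier to read, here is a minimal sketch that tallies UNASSIGNED shards by reason. It assumes the column order from the h= list above (index, shard, prirep, state, unassigned.reason). For reference, NODE_LEFT means the node hosting the shard left the cluster, and INDEX_CREATED means the shard has been unassigned since its index was created.

```shell
# Hypothetical helper: reads _cat/shards output
# (h=index,shard,prirep,state,unassigned.reason) on stdin and prints
# "REASON COUNT" for each unassigned reason, sorted by reason.
summarise_unassigned() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print r, count[r] }' | sort
}

# Usage:
#   curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#     | summarise_unassigned
```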
More info:
See also ElasticHQ and Elasticsearch Concepts
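Yellow => all primary shards are allocated, but one or more replicas are not (often because a node is down)
Red => at least one primary shard is unallocated, so some data is missing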
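Do a rolling restart:
Set cluster.routing.allocation.enable to none to disable shard allocation on the cluster. Useful if you know you're going to take a node offline for just a few minutes, as it stops the cluster rebuilding that node's shards elsewhere in the meantime. Use:
curl -XPUT 0:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" }}'
and once the node has rejoined, set it back to all:
curl -XPUT 0:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" }}'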
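The two calls above can be wrapped as a minimal sketch of hypothetical helpers. The function names and ES_HOST default are assumptions. The accepted values (all, primaries, new_primaries, none) are the documented options for cluster.routing.allocation.enable.

```shell
ES_HOST=${ES_HOST:-0:9200}

# Build the transient settings body; reject values the setting doesn't accept.
allocation_body() {
  case "$1" in
    all|primaries|new_primaries|none) ;;
    *) echo "unknown allocation mode: $1" >&2; return 1 ;;
  esac
  printf '{ "transient" : { "cluster.routing.allocation.enable" : "%s" }}' "$1"
}

# Send it to the cluster.
set_allocation() {
  body=$(allocation_body "$1") || return 1
  curl -sS -XPUT "$ES_HOST/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$body"
}

# Rolling restart, one node at a time:
#   set_allocation none   # then restart the node and wait for it to rejoin
#   set_allocation all    # then wait for green before the next node
```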
To restore a snapshot:
POST /_snapshot/my_backup/snapshot_1/_restore
but remember to close the index first, since an existing index can only be restored while it's closed (the restore then reopens it).
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
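Here is a minimal sketch of the close-then-restore sequence as a hypothetical shell function. my_backup and snapshot_1 are just the example names from above. The indices field in the body restricts the restore to one index. DRY_RUN is a made-up switch that prints the curl commands instead of running them.

```shell
ES_HOST=${ES_HOST:-0:9200}

# Print the command instead of running it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_index REPO SNAPSHOT INDEX
restore_index() {
  repo=$1 snapshot=$2 index=$3
  # An existing index can only be restored while it is closed.
  run curl -sS -XPOST "$ES_HOST/$index/_close" || return 1
  # Restore only this index from the snapshot and block until it finishes.
  run curl -sS -XPOST -H 'Content-Type: application/json' \
    "$ES_HOST/_snapshot/$repo/$snapshot/_restore?wait_for_completion=true" \
    -d "{ \"indices\": \"$index\" }"
}

# Usage: DRY_RUN=1 restore_index my_backup snapshot_1 index.1
```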
Use Curator.
curator_cli --host <host> show_indices
Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)
See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/version-compatibility.html
Install Curator 3.5.0
pip install elasticsearch-curator==3.5.0
https://www.elastic.co/guide/en/elasticsearch/client/curator/3.5/installation.html
Annoyingly, the syntax between versions is completely different. E.g.
Curator 3: curator, show indices
Curator 5: curator_cli, show_indices
And for this error:
ERROR. At least one filter must be supplied.
in Curator 3, you need something like this:
curator --host 10.33.12.203 show indices --regex '.'
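Since the two syntaxes differ, here is a minimal sketch of a hypothetical helper that prints the list-indices command for your installed Curator major version. It only encodes the two forms shown in this post (Curator 3 and 5). It doesn't check compatibility against your Elasticsearch version.

```shell
# curator_show_cmd CURATOR_MAJOR HOST
# Prints the "list indices" command using the syntaxes from this post.
curator_show_cmd() {
  case "$1" in
    3) printf "curator --host %s show indices --regex '.'\n" "$2" ;;
    5) printf 'curator_cli --host %s show_indices\n' "$2" ;;
    *) echo "unsupported Curator major version: $1" >&2; return 1 ;;
  esac
}
```

E.g. `curator_show_cmd 3 localhost` prints the Curator 3 form with the --regex filter that avoids the "At least one filter must be supplied" error.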
gateway.recover_after_nodes: 8
E.g. 8: don't start recovery until 8 nodes have joined, to avoid thrashing after an initial cluster restart.
See also:
gateway.recover_after_time: 5m
gateway.expected_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 2
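The minimum_master_nodes value is a quorum of master-eligible nodes, i.e. floor(n / 2) + 1. Here is a minimal sketch of that arithmetic as a hypothetical shell function. Shell integer division already rounds down.

```shell
# min_master_nodes N: quorum for N master-eligible nodes.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# e.g. emit the elasticsearch.yml line for a 3-master cluster:
#   printf 'discovery.zen.minimum_master_nodes: %s\n' "$(min_master_nodes 3)"
```

With 3 master-eligible nodes the quorum is 2, so the cluster survives losing one. With 2 the quorum is also 2, so losing either one stops master election.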
Data nodes:
node.master: false
node.data: true
http.enabled: false
Master nodes:
node.master: true
node.data: false
JVM
Don’t change garbage collection or thread pools.
Do change heap size. Use half of available memory, but keep it below 32GB so the JVM can still use compressed object pointers. E.g. 30GB:
ES_HEAP_SIZE=30g
which results in /usr/bin/java -Xms30g -Xmx30g
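Here is a minimal sketch of the heap rule as a hypothetical function. It takes half of RAM in whole GB, capped at 30 to match the -Xmx30g above. The exact cap is an assumption; the point is to stay under 32GB.

```shell
# heap_gb TOTAL_RAM_GB: half the RAM, capped at 30 (whole GB).
heap_gb() {
  half=$(( $1 / 2 ))
  if [ "$half" -gt 30 ]; then
    half=30
  fi
  echo "$half"
}

# e.g. on a 64GB box:  export ES_HEAP_SIZE="$(heap_gb 64)g"   # -> 30g
```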
Swap can harm performance. Disable it by removing the swap entries from /etc/fstab (and run swapoff -a to turn it off straight away).
Alternatively, set bootstrap.mlockall: true in elasticsearch.yml to lock memory and prevent it being swapped.
File Descriptors: 64,000
MMap: unlimited
See also https://docs.oracle.com/cd/E23389_01/doc.11116/e21036/perf002.htm
and sysctl vm.max_map_count (see the Stack Overflow question "Maximum number of mmap()'ed ranges and how to set it on Linux?")
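Here is a minimal sketch for checking these limits on a node, using a hypothetical helper called check_min. The 64000 figure is the one from this post. The 262144 figure for vm.max_map_count is the minimum Elastic's documentation recommends, not a number from this post.

```shell
# check_min LABEL ACTUAL REQUIRED
# Prints OK or LOW for one limit; "unlimited" always counts as OK.
# Returns 1 when the limit is too low.
check_min() {
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
    echo "$1: OK ($2)"
  else
    echo "$1: LOW ($2 < $3)"
    return 1
  fi
}

# Run as the user Elasticsearch runs as:
#   check_min "open files" "$(ulimit -n)" 64000
#   check_min "max map count" "$(sysctl -n vm.max_map_count)" 262144
```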
To simplify the ecosystem.
See also https://www.elastic.co/guide/en/elasticsearch/client/curator/current/version-compatibility.html
I ran into a problem with UNASSIGNED shards in an Elasticsearch cluster after experiencing a degraded EC2 instance.
I did the usual steps to solve it (see Elasticsearch: degraded instance) and ran into a heap of problems.
I installed curator_cli using
pip install elasticsearch-curator
but found that, despite reams of pages using the command delete_indices, curator kept saying No such command.
Someone else had experienced this:
https://discuss.elastic.co/t/not-able-to-use-delete-indices-for-curator-cli/151490
The solution seemed to be to downgrade a library called Click.
https://github.com/elastic/curator/issues/1279
So I downgraded Click with:
pip install click==6.7 elasticsearch-curator==5.5.4
Now I’m running into
elasticsearch.exceptions.ElasticsearchException: Unable to create client connection to Elasticsearch. Error: Elasticsearch version 1.7.5 incompatible with this version of Curator (5.5.4)
It seems there’s a compatibility issue.
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/version-compatibility.html
See also Elasticsearch: administration
Indices: store all the JSON documents, and are themselves stored in shards.
Shards: each one is a complete Lucene index.
An index can be split across multiple shards, and if we have multiple nodes then shards will migrate across nodes (aka rebalancing).
Replicas: an exact duplicate of a shard (except designated as a Replica). Can configure as many replicas per shard as you like.
E.g. here you have 2 shards plus 2 replicas, all belonging to the one index (I01).
[Screenshot: 2 shards plus 2 replicas of index I01]
Replicas can serve read requests, thereby increasing read throughput. Writes always go to the primary shard first and are then copied to its replicas.
https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html
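The shard arithmetic as a sketch (both function names are hypothetical). Each primary plus its replicas gives the total shard copies to place. Elasticsearch never allocates two copies of the same shard on one node, so a cluster with N data nodes can only assign up to N - 1 replicas per shard. Any beyond that stay unassigned and the cluster goes yellow.

```shell
# total_shards PRIMARIES REPLICAS: shard copies the cluster has to place.
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

# max_assigned_replicas DATA_NODES: replicas per shard that can actually be
# allocated, since copies of the same shard never share a node.
max_assigned_replicas() {
  if [ "$1" -lt 1 ]; then
    echo 0
  else
    echo $(( $1 - 1 ))
  fi
}
```

E.g. the I01 picture above is `total_shards 2 1`, i.e. 4 shard copies.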
You can run all node types on a single node, but it's more efficient if you separate them out.
Data:
To test, set up a load test on one node until the node is completely saturated.
E.g. if 1M documents on 1 node takes 4.0 seconds, then you probably need 4 nodes to get to a 1.0 second response time.
Master:
To avoid a split brain scenario, set minimum_master_nodes to (number of master-eligible nodes / 2) + 1, rounding the division down.
Should have at least 3.
Client:
Could exist behind a load balancer.
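The load-test arithmetic above assumes response time scales linearly with the number of data nodes, which is an assumption rather than a guarantee. Under that assumption, here is a minimal sketch (hypothetical function) that is just a ceiling division.

```shell
# nodes_needed SECONDS_ON_ONE_NODE TARGET_SECONDS
# Assumes linear scaling; rounds up to a whole node.
nodes_needed() {
  awk -v cur="$1" -v target="$2" 'BEGIN {
    n = cur / target
    whole = int(n)
    if (whole < n) whole++
    print whole
  }'
}

# e.g. nodes_needed 4.0 1.0   # -> 4
```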
E.g. in summary, a setup could be:
4 data nodes, 3 master nodes, 2 client nodes – i.e. a total of 9 nodes.
Data: i3.2xlarge (see https://aws.amazon.com/ec2/instance-types/i3/)
Master:
Client:
See also https://www.elastic.co/guide/en/elasticsearch/plugins/master/cloud-aws-best-practices.html
and https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
Curator is an Elasticsearch Python API which helps manage indices and snapshots.
Curator: https://curator.readthedocs.io/en/latest/index.html#
Rotating data in Amazon Elasticsearch with Curator: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/curator.html