Elasticsearch Concepts

See also Elasticsearch: administration

Indexes (indices): logical collections of JSON documents; an index's documents are stored in one or more shards

Shards: each shard is a complete, self-contained Lucene database


Indices can be stored across multiple shards.

If the cluster has multiple nodes, shards will migrate between nodes to even out the load. This is known as rebalancing.


Replicas: an exact duplicate of a shard (except that it is designated as a replica). You can configure as many replicas per shard as you like.

E.g. here you have 2 shards plus 2 replicas, each containing the one index (I01).

[Image: 2 shards and 2 replicas, each containing index I01 (http://www.snowcrash.eu/wp-content/uploads/2018/10/Screen-Shot-2018-10-16-at-11.05.46-AM-768x449.png)]

Replicas are read-only and can serve read requests, thereby increasing scale.
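Shard and replica counts are set when an index is created. A minimal sketch (the index name `i01` is hypothetical): 2 primary shards with 1 replica of each, which gives the 2 shards plus 2 replica shards pictured above.

```
PUT /i01
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
```

Note that `number_of_replicas` counts copies per primary shard, not total replica shards.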



Node Roles

All the roles can run on a single node, but the cluster is more efficient if you separate them out.


Data Nodes

  • the most hard-working nodes
  • contain all the shards
  • don’t typically receive queries directly
  • tend to be the beefiest machines


Client Nodes

  • the gateway to the cluster
  • give a big increase in performance
  • handle all query requests and redirect them to the data nodes


Master Nodes

  • the brains of the cluster
  • maintain the cluster state
  • all nodes have a copy of the state, but only the elected master can update it
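The roles above are configured per node in `elasticsearch.yml`. A sketch using the settings as they were in the Elasticsearch 6.x era (a dedicated client, i.e. coordinating-only, node is simply one with both roles disabled):

```yaml
# Dedicated master node: eligible to be elected master, holds no shards
node.master: true
node.data: false

# Dedicated data node: holds shards, never elected master
node.master: false
node.data: true

# Client (coordinating-only) node: routes requests, holds nothing
node.master: false
node.data: false
```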

Capacity Planning

Data Nodes

To test capacity, run a load test against a single node until the node is completely saturated.

E.g. if 1M documents on 1 node gives a 4.0-second response time, you probably need 4 nodes to get to a 1.0-second response time.
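The estimate above assumes response time improves roughly linearly with the number of data nodes, which is only a first approximation. As a back-of-envelope calculation (the variable names are illustrative):

```shell
# Rough sizing estimate: nodes needed = saturated single-node time / target time.
# Assumes near-linear scaling, so treat the result as a starting point, not a guarantee.
BASELINE_SECONDS=4   # measured: 1M documents on 1 saturated node took 4.0 s
TARGET_SECONDS=1     # desired response time
NODES_NEEDED=$(( BASELINE_SECONDS / TARGET_SECONDS ))
echo "Estimated data nodes needed: $NODES_NEEDED"
```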

Master Nodes

To avoid a split-brain scenario, set minimum_master_nodes to (number of master-eligible nodes / 2) + 1, using integer division.

You should have at least 3 master-eligible nodes.
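The quorum formula is simple integer arithmetic; with 3 master-eligible nodes it yields 2, meaning a master can only be elected while a majority of the master-eligible nodes are reachable:

```shell
# discovery.zen.minimum_master_nodes = (master-eligible nodes / 2) + 1,
# using integer division (the shell's $(( )) truncates, as intended here)
MASTER_ELIGIBLE=3
MINIMUM_MASTER_NODES=$(( MASTER_ELIGIBLE / 2 + 1 ))
echo "discovery.zen.minimum_master_nodes: $MINIMUM_MASTER_NODES"
```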

Client Nodes

Could exist behind a load balancer.

E.g. in summary, a setup could be:

4 data nodes, 3 master nodes, 2 client nodes – i.e. a total of 9 nodes.

Server Requirements

  • CPU: more cores the better (favour over clock speed) – i.e. better to run more processes concurrently rather than run them faster
  • RAM: 64GB for Data nodes is ideal (e.g. in AWS an `i3.2xlarge` – https://aws.amazon.com/ec2/instance-types/i3/ )
  • Disks: use the fastest disks possible. RAID 0 is safe for extra speed; it is not fault tolerant, but Elasticsearch’s replica shards provide the redundancy. Avoid NAS, as performance will drop drastically
  • Networking: keep clustering within same data centre as shard rebalancing requires fast networking
  • VMs: don’t use for data nodes in production


Running on AWS

Data: i3.2xlarge



See also https://www.elastic.co/guide/en/elasticsearch/plugins/master/cloud-aws-best-practices.html

and https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

