How To Run Elasticsearch 6.2 Cluster In Docker Using Spot Instances In AWS
Dockerising Elasticsearch 6.2
Elasticsearch is a very useful database capable of handling enormous amounts of data. Its capacity and performance are great, but they come at a cost. If you are holding TBs of data in an Elasticsearch cluster, you might end up with around 10 data nodes, 3 master nodes and 1-3 client nodes. A setup like that will cost you roughly $7K per month if your data nodes have around 30GB of RAM each.
What if you can reduce this cost to less than $2K per month? Yes, you can save a lot if you use AWS spot instances.
Here we will setup a cluster having 3 data nodes of 16 GB RAM each and a master node having 8GB of RAM. We are keeping number of nodes limited just for the demo but you can grow this as per your requirement or use-case. So, let's get started...
Steps to be followed:
- Create security group elasticsearch-sg
- Create Elasticsearch Config (elasticsearch.yml and jvm.options)
- Create S3 bucket to hold Elasticsearch config files
- Create IAM Policy to access bucket
- Create IAM Role and assign bucket access policy to it
- Create Base Image (AMI) from which we will be spawning master node and data node
- Create On-demand Master Node
- Create Data nodes on Spot Fleet
#1 Create security group "elasticsearch-sg"
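The original post does not spell out the rules, but an Elasticsearch cluster typically needs TCP 9200 (HTTP API) and 9300 (node-to-node transport) open within the VPC, plus 22 for SSH. Assuming the AWS CLI is configured, a sketch of creating such a group (the VPC ID and CIDR below are placeholders; substitute your own):

```bash
# vpc-xxxxxxxx and 10.0.0.0/16 are placeholders for your VPC and its CIDR
GROUP_ID=$(aws ec2 create-security-group \
    --group-name elasticsearch-sg \
    --description "Elasticsearch cluster" \
    --vpc-id vpc-xxxxxxxx \
    --query GroupId --output text)

# 9200: HTTP API, 9300: transport -- restrict both to the VPC CIDR
aws ec2 authorize-security-group-ingress --group-id "$GROUP_ID" \
    --protocol tcp --port 9200-9300 --cidr 10.0.0.0/16

# 22: SSH access
aws ec2 authorize-security-group-ingress --group-id "$GROUP_ID" \
    --protocol tcp --port 22 --cidr 10.0.0.0/16
```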
#2 Elasticsearch Configuration files
elasticsearch_master.yml for master node:

```yaml
cluster.name: spotes
path.data: /usr/share/elasticsearch/data
network.host: 0.0.0.0
http.port: 9200
node.master: true
node.data: false
node.name: "nodename"
transport.tcp.compress: true
bootstrap.memory_lock: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: ['es1.xyz.vpc']
thread_pool.bulk.queue_size: 500
```
elasticsearch_data.yml file for data node:
```yaml
cluster.name: spotes
path.data: /usr/share/elasticsearch/data
network.host: 0.0.0.0
http.port: 9200
node.master: false
node.data: true
node.name: "nodename"
bootstrap.memory_lock: true
transport.tcp.compress: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: ['es1.xyz.vpc']
thread_pool.bulk.queue_size: 500
```
master.jvm.options file for master node:

```
-Xms4g
-Xmx4g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
```
data.jvm.options file for data node:

```
-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
```
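The heap sizes follow the usual Elasticsearch guidance of setting -Xms/-Xmx equal and no larger than half the machine's RAM: a 4GB heap on the 8GB master, an 8GB heap on the 16GB data nodes. If you adapt this tutorial to other instance sizes, the same rule can be computed on any Linux box:

```bash
# Half of total RAM in GB, per the "heap <= 50% of RAM" guideline
half_ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024 / 2)}' /proc/meminfo)
echo "-Xms${half_ram_gb}g -Xmx${half_ram_gb}g"
```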
#3 Create S3 bucket to hold Elasticsearch config files
Create a bucket named es-configurations and upload all the configuration files we created above.
#4 Create IAM Policy to access bucket
Create the following IAM policy (es-configurations-bucket-access):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::es-configurations/*"
    }
  ]
}
```
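If you prefer the CLI over the console, the same policy can be created like this (a sketch; it assumes you saved the JSON above as policy.json):

```bash
# Create the bucket-access policy from the JSON document above
aws iam create-policy \
    --policy-name es-configurations-bucket-access \
    --policy-document file://policy.json
```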
#5 Create IAM Role and assign bucket access policy to it
Create an IAM role "elasticsearch-role" and attach the above policy to it.
#6 Create Base Image (AMI) from which we will be spawning master node and data node
First we will launch an instance in which we will install Docker and the AWS CLI, and pull the Elasticsearch Docker image. After installing these basics we will create an AMI from the instance, which will be used to launch the master and data nodes. Now go ahead and launch an instance, providing the following userdata to it:
```bash
#!/bin/bash
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Install awscli
apt-get update
apt install awscli -y

# Set max_map_count
echo 262144 | sudo tee /proc/sys/vm/max_map_count

# Install docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-cache policy docker-ce
apt-get install -y docker-ce
service docker restart

# Get official elasticsearch docker image
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.2.3

# Create /etc/elasticsearch directory to hold elasticsearch.yml and jvm.options
mkdir -p /etc/elasticsearch
```
When you are done running the above script, create an AMI from the current instance.
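This can be done from the console, or from the CLI as below (i-0123456789abcdef0 is a placeholder for the ID of the instance you just provisioned, and the image name is our choice):

```bash
# Snapshot the provisioned instance as a reusable base image
aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "es-base-image" \
    --description "Ubuntu with Docker, awscli and ES 6.2.3 image pre-pulled"
```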
#7 Create On-demand Master Node
Create an on-demand instance of a type having 8GB of memory (as we are giving Elasticsearch a 4GB heap) and provide the following userdata to it:
```bash
#!/bin/bash
set -x
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Fetch the master config files from S3
aws s3 cp s3://es-configurations/elasticsearch_master.yml /etc/elasticsearch/elasticsearch.yml --region ap-south-1
aws s3 cp s3://es-configurations/master.jvm.options /etc/elasticsearch/jvm.options --region ap-south-1

# Use the instance hostname as the node name
sed -i -e "s/nodename/${HOSTNAME}/g" /etc/elasticsearch/elasticsearch.yml

# Data directory, owned by uid 1000 (the elasticsearch user in the container)
mkdir -p /vol/es
chown -R 1000:1000 /vol
chown -R 1000:1000 /etc/elasticsearch
sysctl -w vm.max_map_count=262144

# Start docker container
docker run --net=host -d -p 9200:9200 \
    -e "xpack.security.enabled=false" \
    --restart unless-stopped \
    -v /vol/es:/usr/share/elasticsearch/data \
    -v /etc/elasticsearch/jvm.options:/usr/share/elasticsearch/config/jvm.options \
    -v /etc/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
    --ulimit nofile=65536:65536 --ulimit memlock=-1:-1 \
    docker.elastic.co/elasticsearch/elasticsearch:6.2.3
```
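The sed line in the userdata swaps the "nodename" placeholder from elasticsearch.yml for the instance's hostname, so every node registers in the cluster under a unique name. A quick local illustration of what it does (the hostname value here is made up):

```bash
# Simulate the substitution the userdata performs on elasticsearch.yml
echo 'node.name: "nodename"' > /tmp/es-test.yml
HOSTNAME=ip-10-0-1-17   # stand-in for the real instance hostname
sed -i -e "s/nodename/${HOSTNAME}/g" /tmp/es-test.yml
cat /tmp/es-test.yml    # node.name: "ip-10-0-1-17"
```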
After launching the master node, make a Route53 entry for es1.xyz.vpc pointing to its private IP, or use any domain you want for your master node.
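If you want to script the Route53 entry instead of using the console, an UPSERT looks roughly like this (the hosted zone ID and private IP below are placeholders):

```bash
# Z0123456789 and 10.0.1.17 are placeholders for your private hosted zone
# and the master node's private IP
aws route53 change-resource-record-sets \
    --hosted-zone-id Z0123456789 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "es1.xyz.vpc",
          "Type": "A",
          "TTL": 300,
          "ResourceRecords": [{"Value": "10.0.1.17"}]
        }
      }]
    }'
```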
#8 Create Data nodes on Spot Fleet
Now we will create a spot fleet request to launch the data nodes as spot instances. Go to "Spot Requests" in the AWS EC2 dashboard and click the "Request Spot Instances" button:
- Select "Request and Maintain" and set "Total target capacity" to 3, as we will be launching 3 data nodes.
- Select the AMI we created above. Choose any instance type having 16GB of RAM (as we are setting the heap to 8GB).
- Select the required VPC and AZs.
- Add an additional disk of size 50GB (this may differ as per your requirement).
- You can provide health check, monitoring and other options.
- Provide a security group (elasticsearch-sg in our case).
- Give a key-pair name which can be used to SSH in.
- Select "elasticsearch-role" as the "IAM Instance Profile".
- Provide the following userdata:
```bash
#!/bin/bash
set -x
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Fetch the data node config files from S3
aws s3 cp s3://es-configurations/elasticsearch_data.yml /etc/elasticsearch/elasticsearch.yml --region ap-south-1
aws s3 cp s3://es-configurations/data.jvm.options /etc/elasticsearch/jvm.options --region ap-south-1

# Use the instance hostname as the node name
sed -i -e "s/nodename/${HOSTNAME}/g" /etc/elasticsearch/elasticsearch.yml

# Format and mount the additional 50GB disk as the data directory
mkfs.xfs /dev/xvdba
mkdir -p /vol/es
mount /dev/xvdba /vol/es
chown -R 1000:1000 /vol
chown -R 1000:1000 /etc/elasticsearch
sysctl -w vm.max_map_count=262144

# Start docker container
docker run --net=host -d -p 9200:9200 \
    -e "xpack.security.enabled=false" \
    --restart unless-stopped \
    -v /vol/es:/usr/share/elasticsearch/data \
    -v /etc/elasticsearch/jvm.options:/usr/share/elasticsearch/config/jvm.options \
    -v /etc/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
    --ulimit nofile=65536:65536 --ulimit memlock=-1:-1 \
    docker.elastic.co/elasticsearch/elasticsearch:6.2.3
```
You can leave the other settings at their defaults. Click "Launch"; this will create a spot request and launch three nodes, which will eventually join the cluster.
After the nodes are ready, go to master node and make a curl request to check if nodes are in the cluster:
```bash
curl localhost:9200/_cat/nodes?v
```
This will show the list of all nodes.
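Beyond _cat/nodes, the cluster health endpoint is another handy check from the master node; with our setup it should eventually report four nodes (one master plus three data nodes):

```bash
# Returns a JSON document; "status" should be green (or yellow while replicas
# are still allocating) and "number_of_nodes" should be 4 for this cluster
curl 'localhost:9200/_cluster/health?pretty'
```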