How To Run Elasticsearch 6.2 Cluster In Docker Using Spot Instances In AWS
Dockerising Elasticsearch 6.2
Elasticsearch is a very useful database capable of handling enormous amounts of data. Its capacity and performance are great, but they come at a cost. If you are holding TBs of data in an Elasticsearch cluster, you might end up with around 10 data nodes, 3 master nodes and 1-3 client nodes. A setup like that will cost you roughly $7K per month if your data nodes have around 30GB of RAM each.
What if you can reduce this cost to less than $2K per month? Yes, you can save a lot if you use AWS spot instances.
Here we will setup a cluster having 3 data nodes of 16 GB RAM each and a master node having 8GB of RAM. We are keeping number of nodes limited just for the demo but you can grow this as per your requirement or use-case. So, let's get started...
Steps to be followed:
- Create security group elasticsearch-sg
- Create Elasticsearch Config (elasticsearch.yml and jvm.options)
- Create S3 bucket to hold Elasticsearch config files
- Create IAM Policy to access bucket
- Create IAM Role and assign bucket access policy to it
- Create Base Image (AMI) from which we will be spawning master node and data node
- Create On-demand Master Node
- Create Data nodes on Spot Fleet
#1 Create security group "elasticsearch-sg"
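The original post does not spell out the rules, but an Elasticsearch cluster typically needs TCP 9200 (HTTP API) and 9300 (node-to-node transport) open within the VPC, plus 22 for SSH. Assuming the AWS CLI is configured, a sketch of creating such a group (the VPC ID and CIDR below are placeholders; substitute your own):

```bash
# vpc-xxxxxxxx and 10.0.0.0/16 are placeholders for your VPC and its CIDR
GROUP_ID=$(aws ec2 create-security-group \
    --group-name elasticsearch-sg \
    --description "Elasticsearch cluster" \
    --vpc-id vpc-xxxxxxxx \
    --query GroupId --output text)

# 9200: HTTP API, 9300: transport -- restrict both to the VPC CIDR
aws ec2 authorize-security-group-ingress --group-id "$GROUP_ID" \
    --protocol tcp --port 9200-9300 --cidr 10.0.0.0/16

# 22: SSH access
aws ec2 authorize-security-group-ingress --group-id "$GROUP_ID" \
    --protocol tcp --port 22 --cidr 10.0.0.0/16
```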
#2 Elasticsearch Configuration files
elasticsearch_master.yml for master node:

```yaml
cluster.name: spotes
path.data: /usr/share/elasticsearch/data
network.host: 0.0.0.0
http.port: 9200
node.master: true
node.data: false
node.name: "nodename"
transport.tcp.compress: true
bootstrap.memory_lock: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: ['es1.xyz.vpc']
thread_pool.bulk.queue_size: 500
```
elasticsearch_data.yml file for data node:
```yaml
cluster.name: spotes
path.data: /usr/share/elasticsearch/data
network.host: 0.0.0.0
http.port: 9200
node.master: false
node.data: true
node.name: "nodename"
bootstrap.memory_lock: true
transport.tcp.compress: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: ['es1.xyz.vpc']
thread_pool.bulk.queue_size: 500
```
master.jvm.options file for master node:

```
-Xms4g
-Xmx4g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
```
data.jvm.options file for data node:

```
-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
```
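The heap sizes follow the usual Elasticsearch guidance of setting -Xms/-Xmx equal and no larger than half the machine's RAM: a 4GB heap on the 8GB master, an 8GB heap on the 16GB data nodes. If you adapt this tutorial to other instance sizes, the same rule can be computed on any Linux box:

```bash
# Half of total RAM in GB, per the "heap <= 50% of RAM" guideline
half_ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024 / 2)}' /proc/meminfo)
echo "-Xms${half_ram_gb}g -Xmx${half_ram_gb}g"
```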
#3 Create S3 bucket to hold Elasticsearch config files
Create a bucket named es-configurations and upload all the configuration files we created above.
#4 Create IAM Policy to access bucket
Create the following IAM policy (es-configurations-bucket-access):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::es-configurations/*"
    }
  ]
}
```
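If you prefer the CLI over the console, the same policy can be created like this (a sketch; it assumes you saved the JSON above as policy.json):

```bash
# Create the bucket-access policy from the JSON document above
aws iam create-policy \
    --policy-name es-configurations-bucket-access \
    --policy-document file://policy.json
```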
#5 Create IAM Role and assign bucket access policy to it
Create an IAM role "elasticsearch-role" and attach the above policy to it.
#6 Create Base Image (AMI) from which we will be spawning master node and data node
First we will launch an instance in which we will install Docker and the AWS CLI, and pull the Elasticsearch Docker image. After installing these basics we will create an AMI from the instance, which will be used to launch the master and data nodes. Now go ahead and launch an instance, providing the following userdata to it:
```bash
#!/bin/bash
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Install awscli
apt-get update
apt install awscli -y

# Set max_map_count
echo 262144 | sudo tee /proc/sys/vm/max_map_count

# Install docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-cache policy docker-ce
apt-get install -y docker-ce
service docker restart

# Get official elasticsearch docker image
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.2.3

# Create /etc/elasticsearch directory to hold elasticsearch.yml and jvm.options
mkdir -p /etc/elasticsearch
```
When you are done running the above script, create an AMI from the current instance.
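This can be done from the console, or from the CLI as below (i-0123456789abcdef0 is a placeholder for the ID of the instance you just provisioned, and the image name is our choice):

```bash
# Snapshot the provisioned instance as a reusable base image
aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "es-base-image" \
    --description "Ubuntu with Docker, awscli and ES 6.2.3 image pre-pulled"
```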
#7 Create On-demand Master Node
Create an on-demand instance of a type having 8GB of memory (as we are giving Elasticsearch a 4GB heap) and provide the following userdata to it:
```bash
#!/bin/bash
set -x
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Fetch the master config files from S3
aws s3 cp s3://es-configurations/elasticsearch_master.yml /etc/elasticsearch/elasticsearch.yml --region ap-south-1
aws s3 cp s3://es-configurations/master.jvm.options /etc/elasticsearch/jvm.options --region ap-south-1

# Use the instance hostname as the node name
sed -i -e "s/nodename/${HOSTNAME}/g" /etc/elasticsearch/elasticsearch.yml

# Data directory, owned by uid 1000 (the elasticsearch user in the container)
mkdir -p /vol/es
chown -R 1000:1000 /vol
chown -R 1000:1000 /etc/elasticsearch
sysctl -w vm.max_map_count=262144

# Start docker container
docker run --net=host -d -p 9200:9200 \
    -e "xpack.security.enabled=false" \
    --restart unless-stopped \
    -v /vol/es:/usr/share/elasticsearch/data \
    -v /etc/elasticsearch/jvm.options:/usr/share/elasticsearch/config/jvm.options \
    -v /etc/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
    --ulimit nofile=65536:65536 --ulimit memlock=-1:-1 \
    docker.elastic.co/elasticsearch/elasticsearch:6.2.3
```
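The sed line in the userdata swaps the "nodename" placeholder from elasticsearch.yml for the instance's hostname, so every node registers in the cluster under a unique name. A quick local illustration of what it does (the hostname value here is made up):

```bash
# Simulate the substitution the userdata performs on elasticsearch.yml
echo 'node.name: "nodename"' > /tmp/es-test.yml
HOSTNAME=ip-10-0-1-17   # stand-in for the real instance hostname
sed -i -e "s/nodename/${HOSTNAME}/g" /tmp/es-test.yml
cat /tmp/es-test.yml    # node.name: "ip-10-0-1-17"
```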
After launching the master node, make a Route53 entry for es1.xyz.vpc pointing to its private IP, or use any domain you want for your master node.
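If you want to script the Route53 entry instead of using the console, an UPSERT looks roughly like this (the hosted zone ID and private IP below are placeholders):

```bash
# Z0123456789 and 10.0.1.17 are placeholders for your private hosted zone
# and the master node's private IP
aws route53 change-resource-record-sets \
    --hosted-zone-id Z0123456789 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "es1.xyz.vpc",
          "Type": "A",
          "TTL": 300,
          "ResourceRecords": [{"Value": "10.0.1.17"}]
        }
      }]
    }'
```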
#8 Create Data nodes on Spot Fleet
Now we will create a spot fleet request to launch the data nodes as spot instances. Go to "Spot Requests" in the AWS EC2 dashboard and click the "Request Spot Instances" button:
- Select "Request and Maintain" and set "Total target capacity" to 3, as we will be launching 3 data nodes.
- Select the AMI we created above. Choose any instance type having 16GB of RAM (as we are setting the heap to 8GB).
- Select the required VPC and AZs.
- Add an additional disk of size 50GB (this may differ as per your requirement).
- You can provide health check, monitoring and other options.
- Provide a security group (elasticsearch-sg in our case).
- Give a key-pair name which can be used to SSH in.
- Select "elasticsearch-role" as the "IAM Instance Profile".
- Provide the following userdata:
```bash
#!/bin/bash
set -x
# Output log of userdata to /var/log/user-data.log
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Fetch the data node config files from S3
aws s3 cp s3://es-configurations/elasticsearch_data.yml /etc/elasticsearch/elasticsearch.yml --region ap-south-1
aws s3 cp s3://es-configurations/data.jvm.options /etc/elasticsearch/jvm.options --region ap-south-1

# Use the instance hostname as the node name
sed -i -e "s/nodename/${HOSTNAME}/g" /etc/elasticsearch/elasticsearch.yml

# Format and mount the additional 50GB disk as the data directory
mkfs.xfs /dev/xvdba
mkdir -p /vol/es
mount /dev/xvdba /vol/es
chown -R 1000:1000 /vol
chown -R 1000:1000 /etc/elasticsearch
sysctl -w vm.max_map_count=262144

# Start docker container
docker run --net=host -d -p 9200:9200 \
    -e "xpack.security.enabled=false" \
    --restart unless-stopped \
    -v /vol/es:/usr/share/elasticsearch/data \
    -v /etc/elasticsearch/jvm.options:/usr/share/elasticsearch/config/jvm.options \
    -v /etc/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
    --ulimit nofile=65536:65536 --ulimit memlock=-1:-1 \
    docker.elastic.co/elasticsearch/elasticsearch:6.2.3
```
You can leave the other settings at their defaults. Click "Launch"; this will create a spot request and launch three nodes, which will eventually join the cluster.
After the nodes are ready, go to master node and make a curl request to check if nodes are in the cluster:
```bash
curl localhost:9200/_cat/nodes?v
```
This will show the list of all nodes.
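Beyond _cat/nodes, the cluster health endpoint is another handy check from the master node; with our setup it should eventually report four nodes (one master plus three data nodes):

```bash
# Returns a JSON document; "status" should be green (or yellow while replicas
# are still allocating) and "number_of_nodes" should be 4 for this cluster
curl 'localhost:9200/_cluster/health?pretty'
```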