How To Configure Elasticsearch After Installation

Introduction

This guide walks through some of the most common configuration options in Elasticsearch after installation. Each setting described here enables, disables, or tunes a specific capability, so apply the changes carefully and only where they fit the deployment. Several of these options can noticeably improve the efficiency and stability of the service.

  • Elasticsearch ships with default settings that suit most use cases. However, some knowledge of fine-tuning and configuration is recommended in order to take advantage of Elasticsearch’s scalability.
  • By default, Elasticsearch uses a private temporary directory that the startup script creates.
  • When Elasticsearch is installed from the .deb or .rpm packages and runs under systemd, this private temporary directory is excluded from periodic cleanup.

However, if the .tar.gz distribution will run on Linux for an extended period, consider creating a dedicated temporary directory in a location where old files and directories are not cleaned up. Set its permissions so that only the user running Elasticsearch can access it, then set the $ES_TMPDIR environment variable to point to it before starting Elasticsearch.
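The steps for a dedicated temporary directory can be sketched in shell roughly as follows. The directory name elasticsearch-tmp under the home directory is an assumption chosen for illustration, not a path that Elasticsearch requires, and the sketch assumes it is run as the same user that starts Elasticsearch.

```shell
# Hypothetical location: any path that periodic cleanup jobs leave alone will do.
# Falls back to the current directory if HOME is not set.
ES_TMPDIR="${ES_TMPDIR:-${HOME:-$PWD}/elasticsearch-tmp}"

# Create the directory and restrict access to the owning user only.
mkdir -p "$ES_TMPDIR"
chmod 700 "$ES_TMPDIR"

# Export the variable so the Elasticsearch startup script can see it.
export ES_TMPDIR
echo "ES_TMPDIR is $ES_TMPDIR"
```

Start Elasticsearch from the same shell session afterwards so that the exported variable is inherited.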

  • Note: Use separate Elasticsearch client nodes for indexing and for searching. This relieves the data nodes of some load and, more importantly, lets the pipeline talk to a local client that then communicates with the rest of the cluster.

A node’s role is set with two boolean properties, node.master and node.data:

Master node:
node.master: true
node.data: false

Data node:
node.master: false
node.data: true

Client node:
node.master: false
node.data: false

Prerequisites

  • Modifying settings and configuration files requires root privileges on the server, accessed over SSH with a private key.
  • Elasticsearch must already be installed, and the service must be running.

  • From Elastic’s requirement page: There is no hard-and-fast rule because Elasticsearch is useful in a wide range of responsibilities and with a stunning array of machines. These recommendations are a good starting point based on expert experience with the production clusters. (Elastic’s Hardware Requirements)
  • To check the installed Java version, type this command in a UNIX terminal:
java -version
  • If Java and JDK 8 are installed properly, output similar to the following appears in the terminal:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
  • Elasticsearch will not start if the JDK or JVM version is incompatible. The Java installation that Elasticsearch uses can be changed through the JAVA_HOME environment variable. (Elasticsearch v6.6 Setup Docs, JVM Version)
  • See the ‘Configure JVM’ section for more details.
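As a rough illustration of the version prerequisite, the sketch below extracts the major Java version from a version string. The helper name java_major is hypothetical, and the sketch only covers the two common formats (legacy 1.x strings and newer strings that start with the major number). On a real system the string would come from the first line printed by java -version.

```shell
# Print the major Java version for a version string such as 1.8.0_65 or 11.0.2.
# Legacy strings start with "1." and carry the major version in the second field.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

# Version string taken from the sample terminal output in this section.
major=$(java_major "1.8.0_65")
if [ "$major" -ge 8 ]; then
  echo "Java $major meets the JDK 8 prerequisite"
else
  echo "Java $major is older than JDK 8"
fi
```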

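NOTE: To install JDK 8 on MacOS using Homebrew:

brew update
brew cask install java8

Newer Homebrew releases replace the brew cask subcommand with brew install --cask, and the java8 cask may no longer be available, so check the current Homebrew documentation if the command fails.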

Configuration File Location

The configuration and YAML files are located in the installation directories of each respective ELK stack product.

IMPORTANT: It’s a good idea to back up configuration files before changing them so they can be reverted quickly to their defaults without reinstalling anything.

  • In a terminal, the cp (copy) command duplicates a file under the new name (and/or file extension) given as its second parameter. For example:
sudo cp config.yml config.bak
  • This leaves a .bak duplicate of the original configuration file.
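A slightly more defensive version of the backup step might look like the sketch below. The backup_file helper is hypothetical. It refuses to overwrite an existing .bak file, on the assumption that the first backup holds the untouched defaults. The demonstration runs against a throwaway file in a temporary directory rather than a real Elasticsearch configuration.

```shell
# backup_file is a hypothetical helper: copy FILE to FILE.bak, but never
# overwrite an existing backup, so the first (default) copy is preserved.
backup_file() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "skip: $src not found" >&2
    return 1
  fi
  if [ -e "$src.bak" ]; then
    echo "skip: $src.bak already exists" >&2
    return 0
  fi
  # -p keeps the original mode and timestamps on the copy.
  cp -p "$src" "$src.bak"
}

# Demonstration on a throwaway file so the sketch is safe to run anywhere.
demo_dir=$(mktemp -d)
printf 'cluster.name: demo\n' > "$demo_dir/config.yml"
backup_file "$demo_dir/config.yml" && echo "backup written to $demo_dir/config.yml.bak"
```

For files under a root-owned configuration directory, the same function would need to be run with sufficient privileges.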

Linux Location

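The default paths for the configuration files on Linux package installations are as follows. The /etc/elasticsearch directory is used by both the Debian and RPM packages. The environment file is /etc/default/elasticsearch on Debian-based distros and /etc/sysconfig/elasticsearch on Red Hat-based distros:

/etc/elasticsearch
/etc/default/elasticsearch
/etc/sysconfig/elasticsearch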

Windows Location

  • Elasticsearch’s default install location for Windows OS:
C:\ProgramData\Elastic\Elasticsearch\

MacOS Location

  • If Elasticsearch and Logstash were installed on MacOS using Homebrew, the Logstash installation directory is at:
/usr/local/Cellar/logstash/

and Elasticsearch files are at:

/usr/local/var/lib/elasticsearch/

The following command, entered in the MacOS terminal, prints details of the Homebrew installation:

brew info elasticsearch

See Elastic’s Directory Layout documentation for more details.

Configurations

YAML file configuration

The Elasticsearch installation package comes with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.

The configuration files contain settings that are specific to each node (such as node.name and paths) and settings a node requires to be capable of joining a cluster like cluster.name and network.host.

  • If fine-tuning a cluster, or testing the effect that a certain configuration option has, the best approach would be asking for guidance in Elastic’s Community.

Configure Cluster

  • The cluster name identifies the cluster for auto-discovery.
  • If multiple clusters run on the same network, make sure each one uses a unique name. For more information, consult Elastic’s Documentation.
cluster.name: od-fts1

Configure Node

  • Node names are generated dynamically on startup, so they do not need to be configured manually. However, setting a meaningful name has the advantage that the name persists after the node restarts:
node.name: od-fts1a

The node.name can also be set to the server HOSTNAME. To find out more, consult Elastic’s Documentation.

node.name: ${HOSTNAME}

Configure Path

  • If using the .zip or .tar.gz archives, the data and logs directories are sub-folders of $ES_HOME. If they are left in their default locations, these important folders are at high risk of being deleted when Elasticsearch is upgraded.

  • In production use, change the locations of the log and data folders:

path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
  • The RPM and Debian packages already use custom paths for logs and data.
  • The path.data setting can optionally list multiple locations, in which case all paths are used for data storage (although the files of a single shard are all kept on the same path):
path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3

Configure Gateway

The gateway persists the cluster state between full cluster restarts. Every change to the state (such as adding a shard) is stored in the gateway, and when the cluster starts up it reads its state from the gateway. (from GitHub)

  • The “local” gateway is the default type and is the recommended one. It is set as follows:
gateway.type: local

The settings below control how and when the initial recovery process starts on a full cluster restart (to reuse as much local data as possible when using a shared gateway). (from GitHub)

  • To allow the recovery process to start once N nodes of the cluster are up, use the following syntax:
gateway.recover_after_nodes: 1
  • To set the timeout for initiating recovery after those N nodes are up, use the following syntax, which accepts a time value:
gateway.recover_after_time: 10m

Set the number of nodes expected in the cluster. Once that many nodes are up (and recover_after_nodes is met), recovery begins immediately without waiting for recover_after_time to expire. (from GitHub)

gateway.expected_nodes: 2
  • The following setting disables automatic index creation, so every index must be created explicitly:
action.auto_create_index: false

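The following settings protect all indices from being closed or deleted accidentally in a single operation. Individual indices can still be closed or deleted. (from GitHub) These options come from older Elasticsearch configuration files, so confirm that the installed version still supports them.

action.disable_close_all_indices: true
action.disable_delete_all_indices: true
action.disable_shutdown: true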

Configure Network

  • By default, Elasticsearch binds to the loopback address and listens on a port in the range 9200-9300 for HTTP traffic and in the range 9300-9400 for communication between nodes. (A range means that if the first port is unavailable, Elasticsearch tries the next one in the range until it finds an open port.)

In order to form a cluster with nodes on other servers, a node must bind to a non-loopback address. While there are many network settings, usually network.host is the only one that needs to be configured. To find out more, check Elastic’s Documentation.

network.host: 192.168.1.10
  • The network.host setting also accepts special values such as _local_, _site_, and _global_, and modifiers such as :ipv4 and :ipv6. Details are found under “Special values for network.host” in Elastic’s Documentation.

Configure JVM

  • To select a specific version of Java to run as the default, configure the JAVA_HOME environment variable. On Windows, right-click My Computer and select Properties. Then click the Advanced tab and select Environment Variables to edit the JAVA_HOME variable so that it points to the Java installation, for example: C:\Program Files\Java\jdk1.X.X (be sure to replace X with the proper version number for the JDK install).

NOTE: If nano opens a blank document when editing a .yml file (for example kibana.yml), the path passed to nano is probably incorrect. Press CTRL + X to close the blank document, then reopen the file using the correct path.

  • By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. It is critical to configure the heap size so that Elasticsearch has enough heap available.
  • Elasticsearch assigns the entire heap specified in jvm.options through the Xms (minimum heap size) and Xmx (maximum heap size) settings.
  • The right values depend on the amount of RAM available on the server. Follow these rules:
  1. Set the minimum heap size (Xms) and maximum heap size (Xmx) to the same value.
  2. The more heap that is available, the more memory Elasticsearch can use for caching. However, too much heap can lead to long garbage collection pauses.
  3. Set the heap size to no more than half of the physical RAM so that enough is left for kernel file system caches.
  4. Do not set the heap size above the cutoff that the JVM uses for compressed object pointers (compressed oops). The exact cutoff varies but is around 32 GB. To verify that the heap is under the limit, look for a line like this in the logs:
heap size [1.9gb], compressed ordinary object pointers [true]
  5. It is better still to stay below the threshold for zero-based compressed oops. The exact cutoff varies, but 26 GB is safe on most systems, and it can be as large as 30 GB on some. To verify that the heap is under this limit, start Elasticsearch with the JVM options -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode and look for a line like this:
heap address: 0x000000022cf00000, size: 27848 MB, zero based Compressed Oops
  which shows that zero-based compressed oops are enabled, instead of a line like this:
heap address: 0x0000000118400000, size: 28270 MB, Compressed Oops with base: 0x00000001183ff000
  • Below is an example of how to set the heap size in the jvm.options file:
-Xms2g
-Xmx2g

These lines set both the minimum and maximum heap size to 2 GB. The heap size can also be set with an environment variable. To do so, comment out the Xms and Xmx settings in the jvm.options file and pass the values through ES_JAVA_OPTS:

ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch
ES_JAVA_OPTS="-Xms4000m -Xmx4000m" ./bin/elasticsearch
  • The first command sets the minimum and maximum heap size to 2 GB.
  • The second command sets the minimum and maximum heap size to 4000 MB.

  • Configuring the heap for the Windows service differs from the procedure above. To find out more, check Elastic’s Documentation.

The initial values for the Windows service can be configured as above, but they are set differently once the service has been installed.
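The heap sizing rules in this section can be sketched as a small calculation: take half of the physical RAM, then cap the result at the 26 GB figure given for zero-based compressed oops. The function name recommend_heap_mb and the 16 GB example server are assumptions for illustration only. The right value for a given machine should still be verified with the log lines described in this section.

```shell
# Recommend a heap size in MB from the total physical RAM in MB.
# Rule: no more than half of RAM, and never above a 26 GB cap
# (the value this guide describes as safe for zero-based compressed oops).
recommend_heap_mb() {
  total_mb=$1
  cap_mb=$((26 * 1024))
  heap_mb=$((total_mb / 2))
  if [ "$heap_mb" -gt "$cap_mb" ]; then
    heap_mb=$cap_mb
  fi
  echo "$heap_mb"
}

# Example: a hypothetical server with 16 GB of RAM.
heap=$(recommend_heap_mb 16384)

# Print matching Xms and Xmx lines, since both should be set to the same value.
echo "-Xms${heap}m"
echo "-Xmx${heap}m"
```

The two printed lines have the same form as the entries in jvm.options, so they can be compared against the existing settings before any change is made.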

Configure GC Logging

By default, Elasticsearch enables GC logs. These are configured in jvm.options and are written to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space.

Conclusion

Hopefully, the information above was useful in the configuration process. This guide covered cluster, node, path, gateway, network, and JVM settings. For more detail on any of them, follow the references to Elastic’s Documentation mentioned throughout.
