How To Configure Elasticsearch After Installation
This guide walks through several of the configuration options available in Elasticsearch after installation. Each change described here enables or disables a specific capability, so follow the steps exactly to get the intended result. Some of these options can noticeably improve the efficiency of the service.
- Elasticsearch ships with default settings that suit most use cases, but some knowledge of fine-tuning and configuration is recommended in order to take advantage of its scalability.
- By default, Elasticsearch uses a private temporary directory that the startup script creates.
- When Elasticsearch is installed from the .deb or .rpm packages and run under systemd, this temporary directory is excluded from periodic cleanup.
However, if intending to run the .tar.gz distribution on Linux for any extended period, consider creating a dedicated temporary directory whose old files and directories will not be removed by periodic cleanup. Set the permissions on this directory so that only the user running Elasticsearch can access it. Next, set the $ES_TMPDIR environment variable to point to it before starting Elasticsearch.
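The steps above might look like the following sketch. The directory location is a hypothetical choice, not one named by Elastic, and the sketch assumes the commands are run as the same user that runs Elasticsearch.

```shell
# Hypothetical location for the dedicated temp directory; adjust as needed.
# Falls back to the current directory if HOME is not set.
ES_TMPDIR="${HOME:-$PWD}/elasticsearch-tmp"

# Create the directory and restrict access to the owning user only.
mkdir -p "$ES_TMPDIR"
chmod 700 "$ES_TMPDIR"

# Export the variable so that a later ./bin/elasticsearch picks it up.
export ES_TMPDIR
```

The export only lasts for the current shell session, so it needs to be repeated (or placed in a startup script) before each launch.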
- Note: Use separate client nodes for indexing and for searching. This relieves the data nodes of some load and, more importantly, lets the pipeline communicate with a local client node that in turn contacts the rest of the cluster.
A node's role is set with two boolean properties, node.master and node.data:
Master node: node.master: true, node.data: false
Data node: node.master: false, node.data: true
Client node: node.master: false, node.data: false
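As a minimal sketch, the three combinations above can be generated by a small helper and appended to a node's elasticsearch.yml. The function name node_role_settings is hypothetical and exists only for this illustration.

```shell
# Print the two role settings for a node type: master, data, or client.
node_role_settings() {
  case "$1" in
    master) printf 'node.master: true\nnode.data: false\n' ;;
    data)   printf 'node.master: false\nnode.data: true\n' ;;
    client) printf 'node.master: false\nnode.data: false\n' ;;
    *)      echo "unknown node type: $1" >&2; return 1 ;;
  esac
}

# Example: show the settings for a client node.
node_role_settings client
```

Redirecting the output with >> onto the end of a node's elasticsearch.yml applies the role, provided the file does not already define those two keys.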
- Root privileges and SSH access with a private key are required to modify settings and configuration files on the server.
- Java 8 is the recommended version for running Elasticsearch. Only Elasticsearch v6.2 or newer supports JDK 9.
- Make sure Elasticsearch is already installed and that the service is running.
- From Elastic’s requirements page: There is no hard-and-fast rule, because Elasticsearch is useful for a wide range of tasks and on a stunning array of machines. These recommendations are a good starting point based on expert experience with production clusters. (Source: Elastic’s Hardware Requirements)
- Elasticsearch is built on Java and requires a Java Development Kit (JDK) to run. The JDK can be downloaded from the vendor’s website.
- To check the version in use, type this command in a UNIX terminal:
java -version
- If Java 8 is installed properly, output similar to the following appears in the terminal:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
- Elasticsearch will not start if the version of the JDK or JVM in use is incompatible. The version of Java that is used can be changed by setting the JAVA_HOME environment variable. (Source: Elasticsearch v6.6 Setup Docs, JVM Version)
- See the ‘Configure JVM’ section for more details.
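To compare the installed Java against the requirement above from a script, the version string first has to be reduced to a major version number. The helper below is a rough sketch; the name java_major_version is hypothetical, and it assumes version strings shaped like 1.8.0_65 (Java 8 and earlier) or 9.0.4 (Java 9 and later).

```shell
# Reduce a Java version string to its major version number.
java_major_version() {
  case "$1" in
    1.*) v="${1#1.}"; echo "${v%%.*}" ;;   # legacy scheme: 1.8.0_65 -> 8
    *)   echo "${1%%[.+-]*}" ;;            # newer scheme: 9.0.4 -> 9
  esac
}

# Example with the version string from the sample output above.
java_major_version "1.8.0_65"   # prints 8

# On a machine with Java installed, the string could be extracted like this:
#   java -version 2>&1 | awk -F'"' '/version/ {print $2}'
```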
NOTE: To install JDK 8 on MacOS using Homebrew:
brew cask install java8
Configuration File Location
The configuration and YAML files are located in the installation directories of each respective ELK stack product.
IMPORTANT: It’s a good idea to make a backup of configuration files before changing them so they can quickly and easily be reverted to their defaults without reinstalling anything.
- In a terminal, a file can be copied with the cp (copy) command; the second parameter gives the new file its name (and/or file extension). For example:
sudo cp config.yml config.bak
- There should now be a .bak duplicate of the original configuration file.
The default paths for the configuration directories on Linux (both Debian and Red Hat distros) are as follows:
- Elasticsearch’s default install location for Windows OS:
- If installing Elasticsearch and Logstash on MacOS, using the Homebrew method, the Logstash installation directory is at:
This command can also be entered into the macOS terminal:
brew info elasticsearch
See Elastic’s Directory Layout documentation for more details.
YAML file configuration
Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.
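A call to the Cluster Update Settings API might be sketched as follows. The helper names and the ES_URL variable are hypothetical, the node address is assumed to be localhost:9200, and indices.recovery.max_bytes_per_sec is used only as an example of a dynamic setting.

```shell
# Build the JSON body; the scope is "transient" or "persistent".
cluster_settings_body() {
  printf '{"%s":{"%s":"%s"}}' "$1" "$2" "$3"
}

# Send the update to the cluster (requires curl and a running node).
update_cluster_setting() {
  curl -s -X PUT "${ES_URL:-http://localhost:9200}/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(cluster_settings_body "$1" "$2" "$3")"
}

# Print the body that would be sent, without contacting a cluster.
cluster_settings_body transient indices.recovery.max_bytes_per_sec 50mb
```

Transient settings are lost on a full cluster restart, while persistent ones survive it, so the scope argument matters.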
The configuration files contain settings that are specific to each node (such as node.name and paths) and settings a node requires to be capable of joining a cluster like
- If fine-tuning a cluster, or testing the effect that a certain configuration option has, the best approach would be asking for guidance in Elastic’s Community.
- The cluster name identifies the cluster for auto-discovery.
- If multiple clusters run on the same network, be sure each one uses a unique name. For more information, consult Elastic’s Documentation.
- Node names are generated dynamically on startup, so they do not need to be configured manually. Choosing a more meaningful name has the advantage that it persists after the node restarts. For example, node.name can be set to the server’s HOSTNAME. To find out more, consult Elastic’s Documentation.
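The naming settings described above could look like the fragment printed by this sketch. The cluster name logging-prod and the function name naming_settings are placeholders chosen for illustration.

```shell
# Print a sample naming fragment for elasticsearch.yml.
# The quoted heredoc keeps ${HOSTNAME} literal, so that Elasticsearch
# (not the shell) substitutes the environment variable at startup.
naming_settings() {
  cat <<'EOF'
cluster.name: logging-prod
node.name: ${HOSTNAME}
EOF
}

naming_settings
```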
If using the .zip or .tar.gz archives, the data and logs directories are sub-folders of $ES_HOME. If Elasticsearch is upgraded, these important folders are at high risk of being deleted if left in their default locations.
In production use, change the locations of the log and data folders with the path.logs and path.data settings.
- The RPM and Debian packages already use custom paths for logs and data.
path.data can optionally be set to multiple locations, in which case all of the paths are used for data storage (although the files belonging to a single shard are all kept on the same path).
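A fragment that moves the logs and spreads data over two locations might look like the output of the sketch below. All three directories are hypothetical examples, and the function name path_settings is made up for this illustration.

```shell
# Print a sample path fragment for elasticsearch.yml.
path_settings() {
  cat <<'EOF'
path:
  logs: /var/log/elasticsearch
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
EOF
}

path_settings
```

Whatever directories are chosen must exist and be writable by the user that runs Elasticsearch.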
The gateway persists the cluster state between full cluster restarts. Every status change (such as adding a shard) is stored in the gateway, and when the cluster starts up for the first time it reads its state from the gateway. (from GitHub)
- Consult the Elastic documentation for the various types of implemented gateways.
- The “local” gateway is the default type and is the recommended one.
The settings below control how and when the recovery process initially starts after a full cluster restart (to reuse local data as much as possible when using a shared gateway). (from GitHub)
- To start the recovery process once N nodes of the cluster are up, set gateway.recover_after_nodes.
- To set the timeout for initiating recovery after those N nodes are up, set gateway.recover_after_time, which accepts a time value.
Set the number of nodes expected in the cluster with gateway.expected_nodes. Once that many nodes are up (and recover_after_nodes is met), recovery starts immediately without waiting for recover_after_time to expire. (from GitHub)
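The three recovery settings can be sketched together as one fragment. The numbers are placeholders for a hypothetical five-node cluster, the function name is made up, and the setting names are the ones used in the 6.x line this guide refers to, so they may not apply to other versions.

```shell
# Print a sample recovery fragment for elasticsearch.yml.
gateway_settings() {
  cat <<'EOF'
gateway.recover_after_nodes: 3
gateway.recover_after_time: 5m
gateway.expected_nodes: 5
EOF
}

gateway_settings
```

With these values, recovery begins as soon as all five nodes have joined, or five minutes after the third node joins, whichever comes first.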
- To require explicit index creation, set action.auto_create_index to false.
Indices can also be protected from being closed or deleted all at once by accident; individual indices can still be closed or deleted. (from GitHub)
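The two protections above might be expressed as in the sketch below. The use of action.destructive_requires_name for the second one is an assumption about which setting the text refers to; older sample configuration files used differently named settings for the same purpose. The function name is hypothetical.

```shell
# Print a sample index-safety fragment for elasticsearch.yml.
index_safety_settings() {
  cat <<'EOF'
# Indices must be created explicitly before documents are indexed.
action.auto_create_index: false
# Wildcard and _all expressions are rejected for destructive operations.
action.destructive_requires_name: true
EOF
}

index_safety_settings
```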
- By default, Elasticsearch binds to the loopback address and listens on a port in the range 9200-9300 for HTTP traffic and on a port in the range 9300-9400 for communication between nodes. (A range means that if the first port is unavailable, Elasticsearch tries the next port in the range until it finds an open one.)
To form a cluster with nodes on other servers, a node must bind to a non-loopback address. While there are many network settings, usually network.host is all that needs to be configured. To find out more, check Elastic’s Documentation.
The network.host setting also understands certain special values such as _local_, _site_, and _global_, and modifiers such as :ipv4 and :ipv6. Details are in the “Special values for network.host” section of Elastic’s Documentation.
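As an illustration, a node could be bound to a non-loopback address with a fragment like the one printed below. The IP address is a made-up example, and the function name network_settings is hypothetical.

```shell
# Print a sample network fragment for elasticsearch.yml.
network_settings() {
  cat <<'EOF'
# Bind to one specific address (example value).
network.host: 192.168.1.10
# Alternative: bind to any site-local address, IPv4 only.
# network.host: _site:ipv4_
EOF
}

network_settings
```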
- To select a specific version of Java to run as the default on Windows, configure the JAVA_HOME environment variable. Right-click My Computer and select Properties. Then click the Advanced tab and select Environment Variables to edit the JAVA_HOME variable so that it points to the Java installation location, for example C:\Program Files\Java\jdk1.X.X (be sure to replace X with the proper version numbers for the installed JDK).
If nano opens a blank document, press CTRL + X to close it. An empty document most likely means that the path used to open the .yml file was incorrect.
- By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. It is critical to configure the heap size so that Elasticsearch has enough heap available.
- Elasticsearch assigns the entire heap specified in jvm.options through the Xms (minimum heap size) and Xmx (maximum heap size) settings.
- The right values depend on the amount of RAM available on the server. Several rules apply:
- Set the minimum heap size (Xms) and the maximum heap size (Xmx) equal to each other.
- The more heap available, the more memory Elasticsearch can use for caching. However, too much heap can lead to long garbage collection pauses.
- Set the heap size to no more than half of the physical RAM, so that enough is left for the kernel file system caches.
- Do not set it above the cutoff that the JVM uses for compressed object pointers (compressed oops). The exact cutoff varies but is near 32 GB. To verify that the heap is under the limit, look for a line in the logs like the following:
heap size [1.9gb], compressed ordinary object pointers [true]
- Better still, stay below the threshold for zero-based compressed oops. The exact cutoff varies, but 26 GB is safe on most systems and it can be as large as 30 GB on some. To verify that the heap is under the limit, start Elasticsearch with the JVM options -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode and look for a line like the following:
heap address: 0x000000022cf00000, size: 27848 MB, zero based Compressed Oops
- This line shows that zero-based compressed oops are enabled. When they are not, the line looks like this instead:
heap address: 0x0000000118400000, size: 28270 MB, Compressed Oops with base: 0x00000001183ff000
- Below are examples of how to set the heap size in the jvm.options file:
-Xms2g
-Xmx2g
The first line sets the minimum heap size to 2 GB; the second sets the maximum heap size to 2 GB.
The heap size can also be set with an environment variable. To do this, comment out the Xms and Xmx settings in the jvm.options file and pass the values through ES_JAVA_OPTS instead:
ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch
ES_JAVA_OPTS="-Xms4000m -Xmx4000m" ./bin/elasticsearch
The first command sets the minimum and maximum heap size to 2 GB; the second sets them to 4000 MB.
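The sizing rules above (no more than half of the physical RAM, and below roughly 26 GB to stay within zero-based compressed oops) can be sketched as a small calculation. The function name recommended_heap_mb is hypothetical, and the 26 GB ceiling is the conservative figure quoted above rather than a value measured on any particular system.

```shell
# Suggest a heap size in MB for a given amount of physical RAM in MB.
recommended_heap_mb() {
  ram_mb="$1"
  cap_mb=$((26 * 1024))      # 26 GB ceiling from the guidance above
  half_mb=$((ram_mb / 2))    # leave half of RAM for file system caches
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Example: a server with 16 GB of RAM.
heap_mb="$(recommended_heap_mb 16384)"
echo "-Xms${heap_mb}m"   # prints -Xms8192m
echo "-Xmx${heap_mb}m"   # prints -Xmx8192m
```

The two printed lines use the same value for Xms and Xmx, in line with the rule that the minimum and maximum should be equal.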
- Windows service configuration is different from the above procedure. To find out more, check Elastic’s Documentation.
The values that initially populate the Windows service can be configured as described above, but changing them after the service has been installed works differently.
Configure GC Logging
By default, Elasticsearch enables GC logs. They are configured in jvm.options and are written to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space.
Hopefully, the information above was useful in the configuration process. The walkthrough refers to Elastic’s documentation throughout; consult those resources for the full details on each setting.