How to Clean and Optimize Elasticsearch Indices

Introduction

If you want to keep your Elasticsearch deployment running at peak performance, it’s important to perform occasional maintenance on your indices. Fortunately, Elasticsearch provides some useful tools to handle these tasks. The Force Merge API can be used to optimize an Elasticsearch index, and the Refresh and Flush APIs can be used to clean indices. In this article, we’ll explain how to both clean and optimize Elasticsearch indices.

Manage how bulk requests are made

Although bulk requests are a far more efficient way to index documents than adding them one at a time, it’s important to consider both the number of documents being indexed and the size of those documents. For example, it may seem like indexing just a few hundred documents wouldn’t cause any problems, but what if the average size of those documents is 1GB or more? The bigger the request made to Elasticsearch, the less memory is available to handle other requests. Fortunately, it’s not difficult to find the “just right” bulk request size that doesn’t cause performance to drop off: try indexing document batches that gradually increase in size until you reach the point where performance begins to degrade, then back off.
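To illustrate the request format, here’s a minimal sketch that builds a small bulk payload as newline-delimited JSON; the index name “pets”, the document fields, and the temp-file path are placeholders, not part of any real deployment:

```shell
#!/bin/sh
# Build a bulk payload: each document is a pair of lines,
# an action line followed by the document source.
BULK_FILE=/tmp/bulk_payload.ndjson
: > "$BULK_FILE"
for i in 1 2 3; do
  printf '{"index":{"_index":"pets","_id":"%s"}}\n' "$i" >> "$BULK_FILE"
  printf '{"name":"pet-%s","species":"cat"}\n' "$i" >> "$BULK_FILE"
done

# The Bulk API expects NDJSON sent with --data-binary so newlines are preserved:
# curl -XPOST 'localhost:9200/_bulk' -H 'Content-Type: application/x-ndjson' \
#      --data-binary @"$BULK_FILE"

wc -l < "$BULK_FILE"    # two lines per document, so 3 documents -> 6 lines
```

To find your cluster’s sweet spot, grow the batch size between runs and watch indexing latency; stop increasing once throughput begins to fall.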

Use cURL to check on the health and status of an Elasticsearch cluster’s indices

You can use the _cat API to return a list of all the indices and get the status and health of each one:

# get the health and status of each index
curl -XGET 'localhost:9200/_cat/indices?v'

You can also use the _aliases API to get the full name and alias of every index on an Elasticsearch cluster:

# list every index along with its aliases
curl 'http://localhost:9200/_aliases?pretty'

[Screenshot: a terminal making cURL requests to an Elasticsearch cluster to get the names, aliases, status, and health of its indexes]

Using the Force Merge API to optimize an Elasticsearch index

NOTE: In older versions of Elasticsearch, this functionality was exposed through the _optimize API, which has since been renamed to _forcemerge.

One simple way to clean and optimize Elasticsearch indices is by using the Force Merge API. To understand how Force Merge operations work, it’s important to know a bit about the underlying architecture of Elasticsearch, which is built on Lucene. Documents are inserted into an index, which is mapped to one or more shards. Each shard is made up of segments, which can be thought of as mini-indices that handle searching on their particular part of a data collection. Force Merge keeps your Elasticsearch indices running at optimal performance by merging segments, which reduces the number of segments in a shard and minimizes redundant data.

You can make a POST cURL request to perform a force merge:

curl -XPOST 'http://localhost:9200/pets/_forcemerge'

You can also force merge multiple indices in a single request:

curl -XPOST 'localhost:9200/people1,people2/_forcemerge?pretty'

Force Merge API Parameters

There are several options you can use in your force merge request:

  • max_num_segments: This parameter sets the number of segments to merge down to. Set it to "1" to fully merge an index into a single segment. If you omit the parameter, Elasticsearch simply checks whether a merge is needed and, if so, executes it.

  • only_expunge_deletes: When a document is deleted in Lucene, it’s only marked as deleted; it isn’t physically removed from its segment. Setting this parameter to "true" restricts the merge to segments that contain deleted documents. The default setting is "false".

  • flush: This boolean parameter defaults to "true", which means Elasticsearch performs a flush after the merge completes.
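Putting these parameters to use, here’s a hedged sketch; the host, port, and index name are assumptions, so adjust them for your cluster:

```shell
# Reclaim space from deleted documents only, without otherwise changing segment counts:
curl -XPOST 'localhost:9200/people1/_forcemerge?only_expunge_deletes=true'

# Fully merge an index down to one segment and flush afterwards:
curl -XPOST 'localhost:9200/people1/_forcemerge?max_num_segments=1&flush=true'
```

The two merge modes address different goals (space reclamation versus segment consolidation), so pick the one that matches your intent rather than combining them in one request.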

Use the Refresh API to keep Elasticsearch indices up to date

It can also be helpful to use the _refresh API to keep your indices up to date. This forces an explicit refresh of an index, ensuring that documents are available for search immediately after indexing. Like a force merge, a refresh can be performed via a cURL request:

curl -X POST "localhost:9200/my_index/_refresh"

Refresh calls are done automatically by Elasticsearch on a regular basis, but using the Refresh API is a good way to make sure you get the very latest version of an index before cleaning it out or making changes to it.
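If you only need immediate visibility for a single write, you can also refresh as part of the indexing request itself; the index name, document ID, and field below are placeholders:

```shell
# Index a document and force a refresh so it's searchable right away:
curl -XPUT 'localhost:9200/my_index/_doc/1?refresh=true' \
     -H 'Content-Type: application/json' \
     -d '{"title": "fresh document"}'
```

This avoids refreshing the whole index on a schedule just to make one document visible, though refreshing on every write is costly at high indexing rates.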

Clean out indices with the Flush API

Another useful API for keeping your indices in good shape is the Flush API. You can use the _flush API to free up memory from an index by flushing the data:

curl -X POST "localhost:9200/some_index/_flush"

Here’s a cURL example that flushes two indices using the force option:

curl -XPOST 'localhost:9200/people1,people2/_flush?force'

Fine-tuning the Refresh Interval setting

You can change the refresh_interval setting to control how often an index is refreshed. The value is a time duration such as "30s" or "1m"; the default is "1s", which means a refresh is performed every second:

curl -XPUT 'localhost:9200/some_index/_settings' -H 'Content-Type: application/json' -d'
{
  "index" : {
    "refresh_interval" : "2s"
  }
}'
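One common pattern (a sketch, assuming an index named some_index on a local cluster) is to disable automatic refresh entirely before a large bulk load by setting the interval to -1, then restore the default once indexing finishes:

```shell
# Turn off automatic refresh during heavy indexing:
curl -XPUT 'localhost:9200/some_index/_settings' \
     -H 'Content-Type: application/json' \
     -d '{"index": {"refresh_interval": "-1"}}'

# ...run the bulk load, then restore the one-second default:
curl -XPUT 'localhost:9200/some_index/_settings' \
     -H 'Content-Type: application/json' \
     -d '{"index": {"refresh_interval": "1s"}}'
```

While refresh is disabled, newly indexed documents won’t appear in search results, so remember to restore the interval (or call _refresh explicitly) when the load completes.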

Conclusion

Performing basic maintenance on your Elasticsearch indices takes a bit of extra time, but the improvements in performance make the additional effort worthwhile. Using the Force Merge, Refresh, and Flush APIs can keep your Elasticsearch indices in peak working condition. With the step-by-step instructions provided in this article, you’ll have no trouble cleaning and optimizing your Elasticsearch indices using these handy APIs.
