How To Perform Rolling Upgrades To An Elasticsearch Cluster


Introduction:

The benefit of a rolling upgrade is that it doesn’t interrupt the Elasticsearch service. In essence, you shut down a node, upgrade it, and then restart it, repeating the process until every node has been upgraded. While a node is down, its data is still served by shards on the other nodes, so everything runs smoothly without having to stop the Elasticsearch service.

A full cluster restart upgrade is the opposite of a rolling upgrade: the entire Elasticsearch service is shut down and every node is upgraded at once. You’ll need to do that when you upgrade across a major version of the Elastic Stack. While a cluster contains more than one Elasticsearch version, the shards of an upgraded index cannot be replicated to nodes running the older version. Note that shard replication must also be disabled during a full cluster restart upgrade.

That’s the difference between a rolling upgrade and a full cluster restart upgrade. This step-by-step tutorial explains how to perform an Elasticsearch cluster rolling upgrade, the one that requires shutting down one node at a time.

Prerequisites

  • It’s always good practice to back up everything before you upgrade an Elasticsearch cluster. Before the upgrade, take an image of the cluster in case a rollback is needed; rolling back is the only way to undo some breaking changes.

  • Check for compatibility among Elastic products. The versions (for example, 7.x and so on) should be the same; a quick way to check the version each node is running is shown after the note below.

  • Do a full cluster restart upgrade if you need to upgrade from 5.5 or across any other major version. Otherwise, all you’ll need is a rolling upgrade.

NOTE: Reindexing all indices is required when updating Elasticsearch clusters running version 5.6 and earlier.
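
For a quick compatibility check, you can ask each node which version it is running. A minimal sketch, assuming Elasticsearch is listening on localhost:9200 as in the examples later in this tutorial:

# the root endpoint reports the version of the node you query
curl -X GET "localhost:9200/?pretty"

# list every node in the cluster along with the version it runs
curl -X GET "localhost:9200/_cat/nodes?v&h=name,version"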

Get ready for a rolling upgrade

Backup the cluster data

  • This is the most important step to take before you upgrade. As discussed in the Prerequisites section, back up your data if you haven’t already. A single error, such as a node failing during the upgrade, can cause the loss of irreplaceable data. Use a command-line interface tool to export the data from Elasticsearch in CSV format, or take a snapshot as sketched below.
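
If you’d rather use Elasticsearch’s built-in snapshot facility for the backup, a minimal sketch looks like the following. The repository name my_backup and the location /mnt/backups are placeholders, and the location must be listed under path.repo in elasticsearch.yml:

# register a shared-filesystem snapshot repository
# (the path below is a placeholder and must appear in path.repo)
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups"
  }
}
'

# snapshot all indices and wait for the snapshot to finish
curl -X PUT "localhost:9200/_snapshot/my_backup/pre_upgrade_snapshot?wait_for_completion=true"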

Verify the status of the cluster

  • Make sure the cluster is robust and ready for the upgrade. Perform a cURL GET request against the _cluster/health endpoint.
# get health of an Elasticsearch cluster
curl -X GET 'localhost:9200/_cluster/health?pretty'

# use the cURL request to get the index health as well:
curl -X GET "localhost:9200/_cluster/health/some_index,another_index"
  • A "yellow" value in the returned JSON object's "status" field is a sign of an unhealthy cluster. To decrease the chance of losing data, check that your shard replicas are allocated; see the _cat/shards check below. If you have to, modify the cluster so you have the number of shards and replicas you need.
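
The _cat/shards API lists every shard with its state, so unassigned replicas are easy to spot (assuming localhost:9200 as in the earlier examples):

# list every shard with its state; unassigned replicas show a state of UNASSIGNED
curl -X GET "localhost:9200/_cat/shards?v"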

The unhealthy "yellow" status turns into a healthy "green" status:

Screenshot of a cURL request to get cluster health before and after optimization

Here’s another way to check the status of a cluster (_cluster/state):

curl -XGET 'http://localhost:9200/_cluster/state?pretty'

Disable allocation of replica shards:

  • To keep nodes from automatically trying to rebalance shards, set the "cluster.routing.allocation.enable" option to "primaries".

  • That way, only primary shards are routed. To do this, use the terminal window or command line to create an HTTP request.

# Kibana Console
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
  • Alternatively, here’s how to do it in cURL:
curl -X PUT "https://{DOMAIN_NAME}:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'

Conduct a synced flush by doing a POST request

  • To streamline the process, perform a synced flush in cURL or Kibana before you restart each node. Although this step is not mandatory, you’ll expedite the recovery of the shards if you do a POST synced flush.
POST _flush/synced
  • This is how it’s done in cURL:
curl -X POST "https://{DOMAIN_NAME}:9200/_flush/synced?pretty"

Sometimes different processes occurring at the same time can cause the sync-flush to fail the first time. It’s not uncommon to have to repeat the request.
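
If you want to script the retry, here’s a minimal sketch of a loop that repeats the synced flush a few times until it returns HTTP 200. As in the earlier examples, {DOMAIN_NAME} is a placeholder for your server’s address:

# retry the synced flush; concurrent indexing can make the first attempt fail
for attempt in 1 2 3; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -X POST "https://{DOMAIN_NAME}:9200/_flush/synced")
  if [ "$code" = "200" ]; then
    echo "synced flush succeeded"
    break
  fi
  echo "synced flush returned HTTP $code, retrying..."
  sleep 5
done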

A synced flush is performed using the POST request in cURL:

Screenshot in the terminal of a command to initiate a synced flush of an Elasticsearch cluster using a POST HTTP request with the "?pretty" option

Begin the Rolling Upgrades (per node)

  • Select one node from a cluster in Elasticsearch.

  • Locate the service management framework of the server. You’ll need this information to shut down the node with the command that will work. If you’re using Linux, it’s usually either “SysV init” or “systemd”.

Screenshot using the `ps -p 1` command in a terminal window to check the service management framework on macOS and Ubuntu

The best way to find out which one your server uses is to run ps -p 1 from the terminal command line, as shown below.
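
For example, the output on a systemd-based distribution looks something like this (the exact output varies by operating system):

# print information about process ID 1
ps -p 1
#   PID TTY          TIME CMD
#     1 ?        00:00:03 systemd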

How to shut a node down

  • It’s simple when the cluster is configured with just one node per machine or server. In that case, all you need to do is shut down the Elasticsearch instance through the server’s service management framework or by stopping the Java process directly.

  • Here’s how you do it with the systemd service management framework: on the server’s command line, use the stop command.

sudo systemctl stop elasticsearch.service
# ..or
sudo -i service elasticsearch stop
  • If Elasticsearch is running as a background daemon instead, use grep to find its PID first, then use kill -9.
# grep the list of services for elasticsearch
sudo service --status-all | grep 'elastic'

# ..or cat the PID file to print the PID:
cat /tmp/elasticsearch-pid && echo

# ..or grep the JVM Process Status for the Elastic PID
jps | grep Elasticsearch

# kill the service
# replace {PID} with the service's PID number
sudo kill -9 {PID} # e.g. 12345

The Elasticsearch node is ready. Upgrade it.

Download the latest installation version:

  • Download Elasticsearch – check that the archive is compatible with your other Elastic products. If you’ve already completed this step, great.

  • Linux is a little different. All you have to do to download the package is use the wget command followed by the URL of the archive; this downloads it to the current working directory (pwd). Example wget commands appear after the hostnamectl output below.

  • This is a minor version upgrade, which is exactly the kind that works properly with the rolling upgrade you’re making.

  • Use the hostnamectl command if you need to find out which Linux distribution your server runs. To do this, connect remotely to your server through a terminal window and run hostnamectl.

hostnamectl

The hostnamectl command lets you know which Linux distribution you have installed.

Screenshot using the 'hostnamectl' command to get the distro of Linux

Older versions are available if you require one other than the 7.x version made available in April 2019. Go to Elastic’s Past Releases page to download the version you need.
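
As mentioned above, you can pull the package down with wget once you know which distribution you’re on. A minimal sketch; the 7.0.1 version number is only an example, so substitute the release and package format your cluster needs:

# download the Debian package for an example 7.x release to the current directory
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb

# ..or grab the standalone Linux archive instead
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-linux-x86_64.tar.gz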

Time to upgrade Elasticsearch plugins for the node

Plugin upgrading adds the finishing touch. You’ll want everything to be compatible and leave no room for error. Take time to complete this step to increase the chances of a successful upgrade.

Note that Elasticsearch plugins must be upgraded when upgrading a node.

  • Plugin upgrading is easy to do if you use the elasticsearch-plugin script from a terminal in Linux to return a list of installed plugins.
sudo bin/elasticsearch-plugin list

In cURL, do a _nodes request to return a list of nodes and corresponding plugins from the cluster in Elasticsearch.

# replace {SERVER_DOMAIN} with the IP address or domain name
sudo curl -X GET "http://{SERVER_DOMAIN}:9200/_nodes/plugins?pretty"

Using cURL from remote access to a server in Linux. The GET request _nodes/plugins?pretty returns a nodes and plugins list.

Screenshot of a terminal with remote access to a Linux server getting a list of an Elasticsearch node's plugins

Delete or upgrade plugins in Elasticsearch

  • Remove any plugins you don’t want, and upgrade the plugins that remain. The elasticsearch-plugin script with the install command updates a plugin. Here’s the official Elastic documentation for updating and removing plugins.

The quick steps are listed in the example here.

# replace {PLUGIN_NAME} with the actual plugin name
sudo bin/elasticsearch-plugin install {PLUGIN_NAME}

# use the 'remove' command to uninstall the plugin
sudo bin/elasticsearch-plugin remove {PLUGIN_NAME}

The upgraded Elasticsearch node is ready. Restart it.

  • Great job! You’ve completed the installation and have updated the plugins. Restarting the upgraded node is the next step.

Restart Elasticsearch:

Use the service management system framework to restart Elasticsearch.

sudo systemctl start elasticsearch.service
# ..or
sudo -i service elasticsearch start

Verify that the node has recovered completely

  • Check the recovery of the node by making a GET request in cURL.
# replace {DOMAIN_NAME} with IP address or domain name
sudo curl -X GET "https://{DOMAIN_NAME}:9200/_cat/nodes"
  • If everything is working okay for the node, you’ll see the IP address of the server in the HTTP response. If something is wrong, you’ll see a “Failed to connect to {DOMAIN_NAME} port 9200:” response.
Failed to connect to {DOMAIN_NAME} port 9200:
Connection refused

Re-enable shard allocation on the cluster

Set the cluster.routing.allocation.enable setting back to null. This removes the "primaries" restriction you applied earlier and restores the default, so the cluster can allocate replica shards to the upgraded node again.

sudo curl -X PUT "https://{DOMAIN_NAME}:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

Wait three minutes, then check the node’s health to verify the status and find out whether the node has recovered:

# get the node's health
sudo curl -XGET 'https://{DOMAIN_NAME}:9200/_cat/health?pretty'

# get the node's recovery status
sudo curl -XGET 'https://{DOMAIN_NAME}:9200/_cat/recovery?pretty'

The cluster and node health are checked after starting Elasticsearch in a Linux server’s terminal.

Screenshot of a Linux terminal starting the Elasticsearch service and checking the node and cluster health

Be patient. The node might take a little longer to recover.

Sometimes, nodes take more than a few minutes to recover. If you get a “Failed to connect” response, check the node again every five minutes. The POST synced flush you completed earlier should make a difference right now; that is exactly what it was recommended for.
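
If you’d rather not check by hand, a small polling sketch like the one below asks for the cluster health every five minutes until the status comes back green. As before, {DOMAIN_NAME} is a placeholder:

# poll the cluster health every five minutes until the status is green
until sudo curl -s "https://{DOMAIN_NAME}:9200/_cat/health" | grep -q green; do
  echo "cluster is not green yet, waiting..."
  sleep 300
done
echo "cluster is green; safe to move on to the next node"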

WARNING: It’s important to wait until the node you are working on has recovered completely before you move on to the next node. The cluster must be stable between each node upgrade. Set aside enough time to complete each node, because after you start the process, you’ll want to finish all of a cluster’s nodes in succession.

Conclusion

In this tutorial, you learned how to complete an Elasticsearch cluster rolling upgrade, node by node. Rolling upgrades let you keep Elasticsearch running without disrupting the service, so the upgrade doesn’t affect productivity. This is different from a full cluster restart upgrade, which requires the service to be shut down entirely during the upgrade. Major version upgrades require a full cluster restart upgrade.

