How to Backup an Elasticsearch Cluster with Snapshot

Introduction

Snapshots are an ideal way to backup your Elasticsearch data before performing system upgrades or a potentially breaking change to the cluster. A snapshot is essentially a backup of an Elasticsearch cluster and all its indices, taken from a running Elasticsearch cluster. This function allows you to backup the entire cluster or just a selection of indices when taking a snapshot. Because snapshots are taken incrementally, Elasticsearch will avoid copying data that is already stored as part of an earlier snapshot of the same index.

Prerequisites

Make sure the Elasticsearch service is installed and its cluster running properly before you try to backup an Elasticsearch cluster with Snapshot.

If you are planning to use Elastic’s Curator Index Management Library, you may also need to install other dependencies on your server.

Some of the Curator’s dependencies include: Python Version 2 or 3 The voluptuous Python data validation library PyYAML The Elasticsearch Python module * The Elasticsearch-Curator Python module

There are also other dependencies, however, the basic requirements should be satisfied once you have the Curator module installed. You can refer to the Curator installation page for a complete list of dependencies. NOTE: Elasticsearch Version 6.x or newer requires a Curator library version of 5.0 or later.

How to avoid the 406 Content-Type HTTP error

  • Starting with Elastic Stack version 6.0, all cURL requests that pass something in the content body curly braces must have the content type explicitly declared the in the header.
  • Failing to declare the content type will result in a 406 Content-Type error. To avoid this, make sure you include -H 'Content-Type: application/json' in the request header.

An example of a 406 Content-Type header error after making a cURL request to Elasticsearch

A screenshot of a 406 Content Type Header Error after making a cURL request to an Elasticsearch cluster Refer to the curl --help command to obtain more information.

Using the _Snapshot API to Backup an Elasticsearch Cluster

First, you will need to create a repository for the backup and separate repository for the older indices that will remain in “cold” storage.

Creating the repository:

If you haven’t already done so, create a directory for the backup as follows:

sudo mkdir ~/path/to/snapshot/location/
# the backup file will go here

Execute a cURL request to create a _snapshot, making sure you put the correct path in the ""location"" field:

curl -X PUT ""localhost:9200/_snapshot/backup_of_cluster"" -H 'Content-Type: application/json' -d'
{
""type"": ""fs"",
""settings"": {
""location"": ""~/path/to/snapshot/location/"",
""compress"": true # this field is optional
}
}
'

NOTE: In this example, the ""fs"" type is an abbreviation for a shared ""file system"" repository type. In order to share the file system, the backup must be accessible and have the same location as all nodes. You can use the optional ""compress"" setting to save space when making the backup.

You can also use the header option wait_for_completion=true for executing cURL commands.

How to gaining access to the backup:

Just perform a new cURL request to get the backup:

curl -X GET ""localhost:9200/_snapshot/backup_of_cluster""

How to restore an Elasticsearch cluster from a backup

You can use the _restore API to restore a cluster, and its indices, from a snapshot backup with the follwing command:

curl -X POST ""localhost:9200/_snapshot/backup_of_cluster/snapshot_name/_restore""

Create an Elasticsearch Snapshot Using the Curator Library

  • If you are familiar with Python, and its low-level client library, you can use the Curator module to create a snapshot. This is a simple tool that not only helps you manage your Elasticsearch indexes, it allows you to manage you backups and snapshots.
  • It is highly recommended you use Python 3, if possible, as Python 2 is depreciating by 2020.

Installing Elastic’s Curator library for Python:

The simplest way to install a module is with Python’s built-in PIP package manager.

  • Use the following install commands to install packages for PIP:
# for Python 3
pip3 install elasticsearch-curator

# for Python 2
pip install elasticsearch-curator

NOTE: Make sure that the Python PIP library is installed and working. You can install it in an Ubuntu terminal with sudo apt install python3-pip or sudo apt install python-pip. To update PIP use sudo python3 -m pip install --upgrade pip.

If the PIP installation should fail, you can also install the library from source. Alternatively, try installing a different version, such as pip3 install -U elasticsearch-curator==5.5.4, or try using the --ignore-installed option at the end of the command.

Download the TAR archive for Curator:

Change your current directory to the location where you want to download the archive, e.g. cd ~/Downloads, and then use the wget command to download the repo by executing the curl command in a macOS terminal. You can also visit the github URL directly to download the archive: `bash

replace 5.x.x with your version of choice

wget https://github.com/elastic/curator/archive/v5.7.3.tar.gz -O elasticsearch-curator.tar.gz sudo tar zxf package.tar.gz cd package-#.#.# sudo python setup.py install * The newest version of Curator is v5.7, as of April 2019.
Use the following command in the YUM package manager to install Curator on a Red Hat Linux distro:
bash yum install elasticsearch-curator `

Verify that curator installed properly:

Using Python’s IDLE, or in a built-in Python interpreter in the terminal or command prompt, excute the following command to try to import the library and confirm the module installed correctly: `python import curator ` * Provided you didn’t receive an ImportError warning, or any other type of error message, you should be good to go!

The curator command should now be available in the terminal:

curator --help

Screenshot of calling curator --help command in a terminal

How to use Python and Curator to create an Elasticsearch snapshot of a cluster:

You can use the Curator module to backup an Elasticsearch cluster.

You first need to create the YAML file to facilitate the Curator actions:

While Curator is Python-based and Elasticsearch is Java-based, you cannot pass none, ""none"", or None as a field’s value if there is no option for it. Instead, just leave it blank as passing None as a field’s value will be interpreted by Curator as a string value of ""None"". However, Curator will create a generic name with a date/time stamp if you leave the name field blank.

Each number and colon, such as 2:, represents an action the Curator module will read and execute. For example:

actions:
1
:
action
: snapshot
description
: ""A simple action to backup an Elasticsearch cluster with Curator""
options
:
repository
: ""backup_of_cluster""
continue_if_exception
: False
wait_for_completion
: True
filters
:
- filtertype
: pattern
kind
: prefix
value:

Having Curator execute the actions:

Save the YAML file, using the .yml file extension. Then execute the actions by using the curator command followed by the filename, like this:

curator my_curator_snapshot.yml

Conclusion

This tutorial showed you how to backup an Elasticsearch cluster with Snapshot. This can be especially helpful before you perform any upgrades or make any changes to a cluster. Bear in mind that snapshots are taken incrementally to prevent copying data already stored from an earlier snapshot of the same index. This makes it highly efficient, and convenient, to take frequent snapshots of your cluster.

You can backup an Elasticsearch cluster with Snapshot using cURL commands or with Elastic’s Curator module. However, remember that the Curator has some dependencies, including requiring specific versions of Python and Elasticsearch and their respective modules.

Pilot the ObjectRocket platform free for 30 Days

It's easy to get started. Imagine the time you'll save by not worrying about database management. Let's do this!

PILOT FREE FOR 30 DAYS

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.