How to install and use Curator Python in Elasticsearch

Written by Data Pilot

April 15, 2019

Curator
Elasticsearch

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

If you have indices stored in Elasticsearch, Python’s Curator module is a tool you will love. Just as a museum curator manages the displays, Curator helps you manage all of your indices. Curator Python for Elasticsearch allows you to create your own scripts to perform various tasks. The Elasticsearch Curator Python API supports Python versions 2.7 and later and is currently compatible with the 5.x Elasticsearch versions.

Elastic’s official Github repository the Curator module

Prerequisites

While the Python code will function on Windows machines, this article will explain how to install and use the Curator module on UNIX-based machines, such as a macOS or Linux.
You are advised to use Python 3 as Python 2 is depreciating and will be obsolete by 2020. A listing of all Python 3 dependencies for the curator library are listed in the repository’s Debian page.
Be sure you have the most recent stable release of Python 3 on your machine.
The commands for python3 and the pip3 package manager each follow the following format (ending with a 3) :

1 2	python3 some_script.py pip3 install examplemodule

Python 3 and Pip3 Package Manager Versions

The Elastic Stack must be installed and running properly on your machine for you to be able to import data to your target cluster. You will need to have an index created on your cluster to place the Python data into.
Use a private key to establish SSH access to your server.

Install the Curator Module

Use the Python 3’s pip3 package manager to install Curator:

1	pip3 install elasticsearch-curator

Install Curator with pip3 If Curator is already installed on you machine, employ PIP’s freeze command to obtain the currently-installed version:

1	pip3 freeze \| grep curator

It should return something similar to this:

1	elasticsearch-curator==5.6.0

Once Curator is installed, use the --help option to view a list of all the commands:

1	curator --help

Configure Curator

The default file and directory for the configuration file is:

1	~/.curator/curator.yml

The . in front of the directory denotes that directory is hidden. If there is no hidden directory in your home (~) directory for Curator, (use ls -a to list hidden files and folders in a directory), and you will then be able to create the directory and the YAML configuration file yourself:

1
2
3

sudo mkdir ~/.curator/
cd .curator
sudo touch curator.yml

The most essential curator.yml fields to configure in the file are the "hosts:" and "port:" fields. Make sure Elasticsearch is running on the correct domain and port. You can use a text editor, like nano, to edit the file:

1	sudo nano curator.yml

The default configuration file is as follows

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

---
# Leave a key empty if there isn’t a value given. None will be a string,
# not a Python "NoneType"
client:
hosts:
- 127.0.0.1
port: 9200
url_prefix:
use_ssl: False
certificate:
client_cert:
client_key:
ssl_no_validate: False
http_auth:
timeout: 30
master_only: False

logging:
loglevel: INFO
logfile:
logformat: default
blacklist: ['elasticsearch', 'urllib3']

The blacklist: field should be one of the following three:
An empty Python array ([])
An array of log-handler strings
A black space

Refer to Elastic’s documentation for the Curator’s configuration file for more information.

Perform YAML File Actions

Generate a YAML file called test.yml and edit it with nano:

1 2	sudo touch test.yml sudo nano test.yml

Contents of test.yml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

---
actions:
1:
action: delete_indices
description: >-
Find indexes older than 15 days that match a filter pattern
options:
ignore_empty_list: True
timeout_override:
continue_if_exception: False
disable_action: False
filters:
- filtertype: pattern
kind: prefix
value: elasticsearch-
exclude:
- filtertype: age
source: name
direction: older
timestring: '%Y.%m.%d'
unit: days
unit_count: 15
exclude:

Save the file and execute the action with the fgollowing command:

1	curator --dry-run test.yml

Create a new Python script

Create a Python script in your terminal with the following command:

1	sudo touch elastic_curator.py

Edit the Python Script

Use a terminal editor, like nano, to edit the new script:

1	sudo nano elastic_curator.py

When finished, save the file and exit nano by pressing: CTRL + O CTRL + X
Import the Curator modules:

1 2	import elasticsearch import curator

Generate a new client instance of Elasticsearch:

1	elastic = elasticsearch.Elasticsearch()

Create an new instance of an index list using the IndexList class:

1	index_list = curator.IndexList(elastic)

Note that this class has different methods for filtering data for each type of available filter.

Filter Examples

The documentation provides complete list of all the filter methods.
Filter by age and regex keyword parameters:

1
2
3
4

index_list.filter_by_regex(kind='prefix', value='logstash-')
index_list.filter_by_age(source='name', direction='older', timestring='%Y.%m.%d', unit='days', unit_count=20)
delete_indices = curator.DeleteIndices(index_list)
delete_indices.do_action()

A unit_count is a value you can pass that will calculate the number of units relative to the number of seconds.

Filter by age

1
2
3

index_list.filter_by_age(source='name', direction='older', timestring='%Y.%m.%d',
unit='days', unit_count=5
)

Conclusion

In this tutorial you learned how to install, configure and use the Curator module on UNIX-based machines. As Curator is written in Python, it will work with almost all operating systems. Be aware, Curator only requires access to a client node in the Elasticsearch cluster to function, and does not have to be directly installed on the nodes in the cluster. Remember to use Python 3, as you may encounter compatibility issues with Python 2 and outdated libraries.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started