How to Define a Python Function that will make a String Conform to Elasticsearch Naming Conventions


Introduction

This tutorial explains the strict rules that Elasticsearch enforces on index names and shows how to build a Python function that modifies a string so that it conforms to those rules. One of the major rules is that an index name must not exceed 255 bytes, which corresponds to 255 characters for a plain ASCII name.

Naming Conventions for Elasticsearch Indices

The index name must conform to the following criteria:

  • The name must be a lowercase string
  • The index name must be 255 bytes or fewer (multi-byte characters count toward the limit faster)
  • The name cannot consist merely of periods, such as . or ..
  • Colons (:) were deprecated before Elasticsearch v7.0 and are not supported in v7.0 and later, so they should not be used in the name.
  • The name must not contain the symbols \, /, *, ?, ", <, >, |, #, commas or spaces.
  • The index name must not start with -, _ or +
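
The criteria above can also be checked programmatically before any request is sent to the cluster. The following is only a minimal sketch and is not part of the tutorial's script: the names is_valid_index_name, INVALID_CHARS and INVALID_PREFIXES are hypothetical, and the length check assumes the limit is counted in UTF-8 bytes.

```python
# Characters that the rules above disallow anywhere in an index name
# (backslash, slash, asterisk, question mark, double quote, angle brackets,
# pipe, hash, space, comma and colon)
INVALID_CHARS = set('\\/*?"<>|# ,:')

# Characters that must not appear at the start of the name
INVALID_PREFIXES = ("-", "_", "+")


def is_valid_index_name(name):
    """Return True if `name` satisfies the naming criteria listed above."""
    if not isinstance(name, str) or name in ("", ".", ".."):
        return False
    if name != name.lower():
        return False
    if name.startswith(INVALID_PREFIXES):
        return False
    if any(char in INVALID_CHARS for char in name):
        return False
    # assumption: the limit is counted in UTF-8 bytes
    return len(name.encode("utf-8")) <= 255


print(is_valid_index_name("my-index-2019"))
print(is_valid_index_name("invalid?index*name"))
print(is_valid_index_name("_hidden"))
```

A check like this only reports whether a name is acceptable; it does not repair the name, which is the job of the function built in the rest of this tutorial.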

The code used in this tutorial will use Python to make an index name conform to the above criteria by passing the name as a string to a function call, resulting in it returning a valid string.

The final index name result returned by the function will then be tested by making an API call to Elasticsearch using the Python client’s indices.create() method.

How to Create an Elasticsearch Index with an Invalid Name in Kibana

For demonstration purposes, the following is an example of an invalid HTTP request executed in Kibana to create an Elasticsearch index:

PUT invalid?index*name
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 1
  }
}

The results should resemble the following:

Screenshot of an invalid Elasticsearch index name in the Kibana Console UI

How to Import Elasticsearch Libraries into a Python Script

Create a new Python script with the .py file extension and add the following code to import the libraries needed to create Elasticsearch indices and to check for elasticsearch.exceptions error responses:

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# import the Elasticsearch exception raised for invalid requests
from elasticsearch.exceptions import RequestError

Python's randint() function is imported so that the function can generate an index name ending in random integers, e.g. index_46036999, when no string is passed to the function call.

How to create a client instance of the Elasticsearch low-level client

Instantiate a client object for the indices.create() API calls with the following script:

# domain name, or server's IP address, goes in the 'hosts' list
# (recent versions of the client require the scheme, e.g. http://)
client = Elasticsearch(hosts=["http://localhost:9200"])

How to Handle Request Errors Returned by Elasticsearch when Creating Indices in Python

If the index name is not valid, the Elasticsearch cluster will return an error response, which the Python client raises as a RequestError exception. Use a try-except block to catch the API exception:

try:
    # the leading space makes this index name invalid on purpose
    resp = client.indices.create(index=' some-new-index')

    # print the cluster response to the API call
    print("indices.create() response:", resp)
except RequestError as elastic_error:
    print("indices.create() RequestError:", elastic_error)

    # elasticsearch.exceptions.RequestError
    print(type(elastic_error))

The result of the error handling should print the following information about the RequestError exception:

indices.create() RequestError: RequestError(400, 'invalid_index_name_exception', 'Invalid index name [ some-new-index], must not contain the following characters [ , ", *, \, <, |, ,, >, /, ?]')
<class 'elasticsearch.exceptions.RequestError'>

How to Avoid Getting an ‘Invalid Escape Sequence’ HTTP Response from Elasticsearch

While it is technically possible to create Elasticsearch index names that contain special characters, such as "@" or "%", doing so creates the risk of receiving an "invalid escape sequence" error response.

The error occurs because % introduces a percent-encoded escape sequence in a URL and must be followed by two hexadecimal digits. To include a literal % in the HTTP request path, it has to be percent-encoded as %25. Because this can be tricky to execute, it is best to avoid using names with a lot of special characters.
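
In URL syntax, a literal % is normally written as the percent-encoded sequence %25. The following sketch shows how Python's standard urllib.parse.quote() function encodes such characters. The index name used here is a made-up example, and the sketch only illustrates the encoding itself; it does not make such names advisable.

```python
from urllib.parse import quote

# hypothetical index name containing characters that are special in URLs
raw_name = "sales_100%_report@2019"

# safe="" asks quote() to encode every reserved character
encoded_name = quote(raw_name, safe="")

print(encoded_name)  # sales_100%25_report%402019
```

The encoded form is harder to read and easy to get wrong by hand, which is another reason to strip these characters from the name instead.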

How to deal with the “invalid escape sequence” HTTP error response returned by Kibana

The following is an example of the "reason" value that Kibana returns for such a request, as shown in the subsequent screenshot:

"reason": "invalid escape sequence `%_^' at index 11 of: i~m_!not_$a%_^valid&_index(_)name_%@^$$_34_hjhj"

Screenshot of Kibana returning an invalid escape sequence HTTP error

How to Define the Function for the Elasticsearch Index Names

The following script will define a function that will make a Python string conform to the Elasticsearch index naming conventions defined at the beginning of this tutorial:

# define the function to fix the Elasticsearch index name
def fix_index_name(name=""):
    # convert any non-string value (such as an integer) to a string
    if not isinstance(name, str):
        name = str(name)

How to randomly generate an Elasticsearch index name if the length of the passed string is zero

Inside the function, add the following code to generate a name when the passed string is empty or consists only of periods:

    # randomly generate index name if needed
    if name == ".." or name == "." or len(name) == 0:

        # declare empty string for random integers
        ran_integers = ""
        for num in range(8):
            ran_integers += str(randint(0, 9))

        # concatenate the str integers
        name = "index_" + ran_integers
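
The same fallback can be written as a small standalone helper. This is only a sketch under the assumption that an eight-digit suffix is wanted, as in the example index_46036999; the helper name random_index_name and its parameters are hypothetical and do not appear in the tutorial's script.

```python
from random import randint


def random_index_name(prefix="index_", digits=8):
    """Build a fallback name such as index_46036999 (hypothetical helper)."""
    # join one random digit per position into a single string
    suffix = "".join(str(randint(0, 9)) for _ in range(digits))
    return prefix + suffix


example = random_index_name()
print(example)
```

Because the digits are random, two calls can in principle return the same name, so the result is a convenience default rather than a guaranteed unique identifier.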

Add the following code to make the name lowercase and to replace spaces and line breaks with underscores:

    # make lowercase
    name = name.lower()

    # replace spaces and linebreaks with underscores (_)
    name = name.replace(" ", "_")
    name = name.replace("\n", "_")
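
An alternative for this step is a regular expression that handles every kind of whitespace at once. The sketch below assumes it is acceptable to collapse each run of whitespace (spaces, tabs, line breaks) into a single underscore, which differs slightly from a one-for-one replacement; the function name normalize_whitespace is hypothetical.

```python
import re


def normalize_whitespace(name):
    """Lowercase `name` and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", name.lower())


print(normalize_whitespace("My New\nIndex  Name"))  # my_new_index_name
```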

How to declare a list of invalid Elasticsearch index name characters

Declare a Python list, inside square brackets [], that contains all of the invalid characters:

    # cannot be `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, spaces, `,`, `:` and `#`
    not_valid = ["\\", "/", "*", "?", '"', "<", ">"]
    not_valid += ["|", " ", "#", ",", ":"]

    # avoid "invalid escape sequence" errors caused by "%" and "@"
    not_valid += ["%", "@"]

How to ensure the index name begins correctly

Elasticsearch does not allow certain characters at the beginning of the index name. As shown in the following example, use [1:] to remove the first character if needed:

    # index name cannot start with these characters
    # (the `name and` check stops the loop if the string becomes empty)
    while name and name[0] in ("-", "_", "+"):
        # remove the first char if this is the case
        name = name[1:]
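
Python's built-in str.lstrip() method offers a compact alternative to removing the leading characters one at a time. The following sketch uses a hypothetical helper name, strip_invalid_prefix, and also copes with a name that consists only of these characters by returning an empty string.

```python
def strip_invalid_prefix(name):
    """Remove every leading "-", "_" or "+" from `name` (hypothetical helper)."""
    return name.lstrip("-_+")


print(strip_invalid_prefix("__-+my_index"))  # my_index
print(repr(strip_invalid_prefix("___")))     # ''
```

Only the start of the string is affected, so underscores and hyphens inside the name are preserved.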

How to iterate over the Elasticsearch index characters

Use Python’s enumerate() function to iterate over the index name’s characters and check to make sure that each character is valid. If valid, allow Python to append the character to the new string that will be returned:

    # remove all special characters
    new_name = ""
    for num, char in enumerate(name):

        # append the char to the new string only if it is valid
        # (double quotes are already part of the not_valid list)
        if char not in not_valid:
            new_name += char
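
The character-by-character filter can also be expressed with str.translate(), which removes all unwanted characters in a single pass. This is a sketch rather than the tutorial's method: the constant NOT_VALID is assumed to hold the characters named in the criteria at the beginning of the tutorial plus % and @, and the names NOT_VALID, REMOVE_TABLE and remove_invalid_chars are hypothetical.

```python
# assumption: the characters named in the naming criteria, plus "%" and "@"
NOT_VALID = '\\/*?"<>|# ,:%@'

# str.maketrans() with three arguments maps every character in the third
# argument to None, which makes translate() delete it
REMOVE_TABLE = str.maketrans("", "", NOT_VALID)


def remove_invalid_chars(name):
    """Return `name` without any character listed in NOT_VALID."""
    return name.translate(REMOVE_TABLE)


print(remove_invalid_chars('my*index?"name"#1'))  # myindexname1
```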

How to restrict the length of the Elasticsearch index name to 255 characters

Use the following code to ensure the index name is no longer than 255 characters and then return the final string:

    # cut off the length of the index name if needed
    if len(new_name) > 255:
        new_name = new_name[:255]

    # return the fixed string
    return new_name
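
Slicing by characters is sufficient for plain ASCII names. Because the Elasticsearch limit is expressed in bytes, a name with multi-byte characters may need to be shortened by its encoded length instead. The following is a minimal sketch under that assumption; the helper name truncate_to_bytes is hypothetical.

```python
def truncate_to_bytes(name, max_bytes=255):
    """Shorten `name` so that its UTF-8 encoding fits in `max_bytes` bytes."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # errors="ignore" drops a multi-byte character that was cut in half
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_name = truncate_to_bytes("very_long_name_" * 200)
accented_name = truncate_to_bytes("\u00e9" * 200)

print(len(ascii_name))                     # 255
print(len(accented_name.encode("utf-8")))  # 254
```

In the second case each character occupies two bytes, so the result stops at 254 bytes instead of splitting the last character.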

How to Pass a String to the fix_index_name() Function Call

The following is an example of an invalid Elasticsearch index name being passed to the function:

# invalid Elasticsearch index name
index_name = " _I'M !NOT# A ^VALID& *INDEX(NAME) %@^$*$ 34 HJHJ!"
print ("INVALID name:", index_name)

# call the fix_index_name() function
index_name = fix_index_name(index_name)

The string index name is too long in the next example. The function will reduce its length from 4400 characters to the 255 maximum allowable limit:

# index name that's too long
index_name = "VERY VERY LONG STRING "*200
print("index_name length:", len(index_name))

# call the fix_index_name() function
index_name = fix_index_name(index_name)
print("VALID name:", index_name, "\n")

# create_index() is defined in the complete script at the end of this tutorial
create_index(index_name)

How to Create an Elasticsearch Index using the Name Returned by the Python Function

Install the low-level Elasticsearch client for Python, if it is not already installed, using the PIP package manager with the following command:

pip3 install elasticsearch

How to run the Python script to create the Elasticsearch indices

Run the script by executing the python3 command, followed by the name of the Python script, in a terminal or command-prompt window. The Elasticsearch cluster should return the following dict response:

indices.create() response: {'acknowledged': True, 'shards_acknowledged': True, 'index': "i'm_!not_a_^valid&_index(name)_^$$_34_hjhj!"}

How to use the Kibana Console UI to verify the Python strings were accepted by Elasticsearch

Execute the following GET request in Kibana to verify the indices were created:

Screenshot of Kibana getting an Elasticsearch index created in a Python script

Screenshot of Kibana getting a long Elasticsearch index name created in a Python script

Conclusion

This tutorial explained how to build a Python function that modifies a string so that it conforms to the rules for a valid Elasticsearch index name. Specifically, the tutorial covered what happens when an index is created with an invalid name and how to correct the name beforehand. The ridiculous index-name examples used in this tutorial were chosen to demonstrate that the Python function works as intended. Even if such index names are accepted by Elasticsearch, they should not be used, as they create a very real possibility of errors. Instead, remember that Elasticsearch index names should be as short, simple and understandable as possible.

Just the Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# import the Elasticsearch exception raised for invalid requests
from elasticsearch.exceptions import RequestError

# domain name, or server's IP address, goes in the 'hosts' list
# (recent versions of the client require the scheme, e.g. http://)
client = Elasticsearch(hosts=["http://localhost:9200"])

# define a function for creating an Elasticsearch index in Python
def create_index(name):
    name = str(name)
    try:
        # return a response from the Elasticsearch cluster
        resp = client.indices.create(index=name)

        # print the cluster response to the API call
        print("indices.create() response:", resp)
        print("index created:", resp["acknowledged"], "\n")
    except RequestError as elastic_error:
        print("indices.create() RequestError:", elastic_error)

        # elasticsearch.exceptions.RequestError
        print(type(elastic_error))

# define the function to fix the Elasticsearch index name
def fix_index_name(name=""):
    # convert any non-string value (such as an integer) to a string
    if not isinstance(name, str):
        name = str(name)

    # randomly generate index name if needed
    if name == ".." or name == "." or len(name) == 0:

        # declare empty string for random integers
        ran_integers = ""
        for num in range(8):
            ran_integers += str(randint(0, 9))

        # concatenate the str integers
        name = "index_" + ran_integers

    # make lowercase
    name = name.lower()

    # replace spaces and linebreaks with underscores (_)
    name = name.replace(" ", "_")
    name = name.replace("\n", "_")

    # cannot be `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, spaces, `,`, `:` and `#`
    not_valid = ["\\", "/", "*", "?", '"', "<", ">"]
    not_valid += ["|", " ", "#", ",", ":"]

    # avoid "invalid escape sequence" errors caused by "%" and "@"
    not_valid += ["%", "@"]

    # index name cannot start with these characters
    # (the `name and` check stops the loop if the string becomes empty)
    while name and name[0] in ("-", "_", "+"):
        # remove the first char if this is the case
        name = name[1:]

    # remove all special characters
    new_name = ""
    for num, char in enumerate(name):

        # append the char to the new string only if it is valid
        # (double quotes are already part of the not_valid list)
        if char not in not_valid:
            new_name += char

    # cut off the length of the index name if needed
    if len(new_name) > 255:
        new_name = new_name[:255]

    # return the fixed string
    return new_name

# invalid Elasticsearch index name
index_name = " _I'M !NOT# A ^VALID& *INDEX(NAME) %@^$*$ 34 HJHJ!"
print("INVALID name:", index_name)

# call the fix_index_name() function
index_name = fix_index_name(index_name)

print("VALID name:", index_name, "\n")
create_index(index_name)

# index name that's too long
index_name = "VERY VERY LONG STRING "*200
print("index_name length:", len(index_name))

# call the fix_index_name() function
index_name = fix_index_name(index_name)
print("VALID name:", index_name, "\n")
create_index(index_name)
