How to Define a Python Function that will make a String Conform to Elasticsearch Naming Conventions
Introduction
This tutorial explains the rules Elasticsearch enforces on index names and shows how to build a Python function that modifies a string so that it conforms to them. One of the most important rules is that an index name cannot be longer than 255 bytes, which equals 255 characters only when every character is a single byte.
Naming Conventions for Elasticsearch Indices
The index name must conform to the following criteria:
- The name must be a lowercase string.
- The name cannot be longer than 255 bytes (multi-byte characters use up the limit faster).
- The name cannot be just `.` or `..`.
- Colons (`:`) were deprecated and are no longer accepted as of Elasticsearch v7.0, so they should not be used in the name.
- The name must not contain the symbols `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, `,`, `#` or spaces.
- The name must not start with `-`, `_` or `+`.
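The criteria above can be expressed as a small check. The following is a minimal sketch, not an official validator: the helper name `is_valid_index_name` and the constant `FORBIDDEN_CHARS` are hypothetical, and the sketch assumes the length limit is measured on the UTF-8 encoding of the name.

```python
# characters rejected anywhere in an index name
# (assumption: the colon is included because v7.0 and later reject it)
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')

def is_valid_index_name(name):
    """Return True if `name` satisfies the naming rules listed above."""
    # must be a non-empty string
    if not isinstance(name, str) or not name:
        return False
    # cannot be just periods
    if name in (".", ".."):
        return False
    # must be lowercase
    if name != name.lower():
        return False
    # cannot start with '-', '_' or '+'
    if name[0] in "-_+":
        return False
    # cannot contain any forbidden character
    if any(char in FORBIDDEN_CHARS for char in name):
        return False
    # cannot be longer than 255 bytes
    return len(name.encode("utf-8")) <= 255

print(is_valid_index_name("logs-2019.07"))
print(is_valid_index_name("_Bad Name?"))
```

A check like this only reports whether a name is acceptable; the rest of this tutorial builds a function that repairs the name instead.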
The code in this tutorial uses Python to make an index name conform to the above criteria: the name is passed as a string to a function, which returns a valid string.
The index name returned by the function is then tested by making an API call to Elasticsearch with the Python client’s indices.create() method.
How to Create an Elasticsearch Index with an Invalid Name in Kibana
For demonstration purposes, the following HTTP request, executed in the Kibana console, attempts to create an Elasticsearch index with an invalid name:
```
PUT invalid?index*name
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 1
  }
}
```
Elasticsearch rejects the request and returns an error response instead of creating the index.
How to Import Elasticsearch Libraries into a Python Script
Create a new Python script with the .py file extension and add the following code to import the libraries needed to create the Elasticsearch indices and to catch elasticsearch.exceptions errors:
```python
# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# import all of the Elasticsearch exceptions
from elasticsearch.exceptions import *
```
Python’s randint() function is imported so that the function can generate a random index name ending in random integers, e.g. index_46036999, when no string is passed to the function call.
How to create a client instance of the Elasticsearch low-level client
Instantiate a client object for the indices.create() API calls with the following script:
```python
# domain name, or server's IP address, goes in the 'hosts' list
client = Elasticsearch(hosts=["localhost:9200"])
```
How to Handle Request Errors Returned by Elasticsearch when Creating Indices in Python
The Elasticsearch cluster will respond with an error if the index name is not valid, and the Python client raises it as a RequestError exception. Use a try-except block like the following to catch the exception:
```python
try:
    # return a response from the Elasticsearch cluster
    resp = client.indices.create(index=' some-new-index')

    # print the cluster response to the API call
    print("indices.create() response:", resp)

except RequestError as elastic_error:
    print("indices.create() RequestError:", elastic_error)

    # elasticsearch.exceptions.RequestError
    print(type(elastic_error))
```
The error handling should print the following information about the RequestError exception:
```
indices.create() RequestError: RequestError(400, 'invalid_index_name_exception', 'Invalid index name [ some-new-index], must not contain the following characters [ , ", *, \, <, |, ,, >, /, ?]')
<class 'elasticsearch.exceptions.RequestError'>
```
How to Avoid Getting an ‘Invalid Escape Sequence’ HTTP Response from Elasticsearch
While it is technically possible to create Elasticsearch index names that contain special characters such as "@" or "%", doing so creates the risk of receiving an "invalid escape sequence" error response.
To escape the sequence, the invalid character in the HTTP request must be repeated, or entered twice, such as using %% instead of %. Because this can be tricky to get right, it is best to avoid names that contain many special characters.
How to deal with the “invalid escape sequence” HTTP error response returned by Kibana
Below is an example of the error reason returned for a name that contains such characters:
```
"reason": "invalid escape sequence `%_^' at index 11 of: i~m_!not_$a%_^valid&_index(_)name_%@^$$_34_hjhj"
```
How to Define the Function for the Elasticsearch Index Names
The following script defines a function that makes a Python string conform to the Elasticsearch index naming conventions listed at the beginning of this tutorial:
```python
# define the function to fix the Elasticsearch index name
def fix_index_name(name=""):

    # convert any non-string value (such as an integer) to a string
    if not isinstance(name, str):
        name = str(name)
```
How to randomly generate an Elasticsearch index name if the length of the passed string is zero
If the name is empty, or is only "." or "..", the function generates a random name instead:
```python
    # randomly generate index name if needed
    if name == ".." or name == "." or len(name) == 0:

        # declare empty string for random integers
        ran_integers = ""
        for num in range(8):
            ran_integers += str(randint(0, 9))

        # concatenate the str integers
        name = "index_" + ran_integers
```
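As an alternative sketch, the eight random digits can be produced in a single expression with the standard library's `random.choices()`. The helper name `random_index_name` and its `prefix` and `digits` parameters are hypothetical and are not part of the tutorial's script.

```python
import random
import string

def random_index_name(prefix="index_", digits=8):
    """Return a name such as index_46036999 with `digits` random digits."""
    # pick `digits` characters from "0123456789", with repetition allowed
    suffix = "".join(random.choices(string.digits, k=digits))
    return prefix + suffix

print(random_index_name())
```

Because the suffix is random, two calls can in principle return the same name, so the result is a convenient default and not a guaranteed unique identifier.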
Next, make the name lowercase and replace spaces and linebreaks with underscores:
```python
    # make lowercase
    name = name.lower()

    # replace spaces and linebreaks with underscores (_)
    name = name.replace(" ", "_")
    name = name.replace("\n", "_")
```
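Input strings may also contain tabs or carriage returns. One way to treat every whitespace character the same way, shown here as a sketch with the hypothetical helper name `normalize_whitespace`, is a regular expression that replaces each whitespace character with an underscore:

```python
import re

def normalize_whitespace(name):
    """Lowercase `name` and replace each whitespace character with '_'."""
    # \s matches spaces, tabs, linebreaks and other whitespace
    return re.sub(r"\s", "_", name.lower())

print(normalize_whitespace("My Index\tName\n"))
```

Each whitespace character becomes its own underscore, which mirrors the behavior of the two replace() calls in the tutorial's function.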
How to declare a list of invalid Elasticsearch index name characters
Declare a Python list, inside brackets [], that contains all of the invalid characters. The backslash has to be written as "\\" inside a Python string:
```python
    # cannot contain `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, spaces, and `#`
    not_valid = ["\\", "/", "*", "?", '"', "<", ">"]
    not_valid += ["|", " ", "#"]

    # commas are also rejected, as are colons in Elasticsearch v7.0+
    not_valid += [",", ":"]

    # avoid "invalid escape sequence" errors caused by "%" and "@"
    not_valid += ["%", "@"]
```
How to ensure the index name begins correctly
Elasticsearch does not allow certain characters at the beginning of the index name. As shown in the following example, use [1:] to remove the first character for as long as the name starts with one of them. Checking that the name is not empty first avoids an IndexError when every character has been removed:
```python
    # index name cannot start with these characters
    while name and name[0] in ("-", "_", "+"):
        # remove the first char if this is the case
        name = name[1:]
```
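The same result can be sketched with Python's built-in `str.lstrip()`, which removes every leading character that appears in its argument. The helper name `strip_invalid_prefix` is hypothetical:

```python
def strip_invalid_prefix(name):
    """Remove any run of leading '-', '_' or '+' characters."""
    return name.lstrip("-_+")

print(strip_invalid_prefix("_-+my-index"))
print(repr(strip_invalid_prefix("___")))
```

Note that a name made up entirely of these characters comes back as an empty string, so the caller still has to decide what to do in that case, for example by falling back to a generated name.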
How to iterate over the Elasticsearch index characters
Use Python’s enumerate() function to iterate over the index name’s characters and check that each character is valid. If it is, append the character to the new string that will be returned:
```python
    # remove all special characters
    new_name = ""
    for num, char in enumerate(name):

        # check if the character is in not_valid list
        # (the double quote is already part of that list)
        valid = char not in not_valid

        # append the char to string if valid
        if valid:
            new_name += char
```
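For comparison, the filtering step can be sketched more compactly with a set and `str.join()`. The names `NOT_VALID` and `remove_invalid_chars` are hypothetical, and the set below only holds the characters discussed in this tutorial:

```python
# characters to drop; a set gives constant-time membership checks
NOT_VALID = set('\\/*?"<>| #%@')

def remove_invalid_chars(name):
    """Return `name` without any character found in NOT_VALID."""
    return "".join(char for char in name if char not in NOT_VALID)

print(remove_invalid_chars("i'm_!not#_a_*valid%_index"))
```

Building the result with join() avoids repeated string concatenation, which matters little for a 255-character name but keeps the intent easy to read.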
How to restrict the length of the Elasticsearch index name to 255 characters
Finally, make sure the index name is no longer than 255 characters and return the final string:
```python
    # cut off the length of the index name if needed
    if len(new_name) > 255:
        new_name = new_name[:255]

    # return the fixed string
    return new_name
```
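Slicing by characters is enough for plain ASCII names. Elasticsearch measures the limit in bytes, however, so a name containing multi-byte characters can still be too long after slicing. The following is a minimal sketch of byte-based truncation; the helper name `truncate_to_bytes` and its `max_bytes` parameter are hypothetical.

```python
def truncate_to_bytes(name, max_bytes=255):
    """Trim `name` so its UTF-8 encoding is at most `max_bytes` long."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # errors="ignore" drops a multi-byte character cut in half at the end
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

short = truncate_to_bytes("é" * 200)
print(len(short), len(short.encode("utf-8")))
```

In this example each "é" takes two bytes, so the 400-byte input is cut down to 127 characters, or 254 bytes.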
How to Pass a String to the fix_index_name() Function Call
The following examples pass invalid Elasticsearch index names to the function:
```python
# invalid Elasticsearch index name
index_name = " _I'M !NOT# A ^VALID& *INDEX(NAME) %@^$*$ 34 HJHJ!"
print("INVALID name:", index_name)

# call the fix_index_name() function
index_name = fix_index_name(index_name)
```
The index name in the next example is too long. The function reduces its length from 4400 characters to the maximum of 255. The create_index() helper called here is defined in the complete script at the end of this tutorial:
```python
# index name that's too long
index_name = "VERY VERY LONG STRING "*200
print("index_name length:", len(index_name))

# call the fix_index_name() function
index_name = fix_index_name(index_name)
print("VALID name:", index_name, "\n")
create_index(index_name)
```
How to Create an Elasticsearch Index using the Name Returned by the Python Function
Confirm that the low-level Elasticsearch client for Python has been installed using the PIP package manager with the following command:
```
pip3 install elasticsearch
```
How to run the Python script to create the Elasticsearch indices
Run the Python script with the python3 command in a terminal or command-prompt window. The Elasticsearch cluster should return the following dict response:
```
indices.create() response: {'acknowledged': True, 'shards_acknowledged': True, 'index': "i'm_!not_a_^valid&_index(name)_^$$_34_hjhj!"}
```
How to use the Kibana Console UI to verify the Python strings were accepted by Elasticsearch
Execute a GET request in Kibana to verify that the indices were created.
Conclusion
This tutorial explained how to build a Python function that modifies a string so that it conforms to the rules for a valid Elasticsearch index name. Specifically, it covered what happens when an index is created with an invalid name and how to correct the name. The deliberately awkward index names used in this tutorial were chosen to demonstrate that the Python function works as intended. Even if Elasticsearch accepts such names, they should not be used, because they create a very real possibility of errors. Instead, keep Elasticsearch index names as short, simple and understandable as possible.
Just the Code
```python
#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# import all of the Elasticsearch exceptions
from elasticsearch.exceptions import *

# domain name, or server's IP address, goes in the 'hosts' list
client = Elasticsearch(hosts=["localhost:9200"])

# define a function for creating an Elasticsearch index in Python
def create_index(name):
    name = str(name)

    try:
        # return a response from the Elasticsearch cluster
        resp = client.indices.create(index=name)

        # print the cluster response to the API call
        print("indices.create() response:", resp)
        print("index created:", resp["acknowledged"], "\n")

    except RequestError as elastic_error:
        print("indices.create() RequestError:", elastic_error)

        # elasticsearch.exceptions.RequestError
        print(type(elastic_error))

# define the function to fix the Elasticsearch index name
def fix_index_name(name=""):

    # convert any non-string value (such as an integer) to a string
    if not isinstance(name, str):
        name = str(name)

    # randomly generate index name if needed
    if name == ".." or name == "." or len(name) == 0:

        # declare empty string for random integers
        ran_integers = ""
        for num in range(8):
            ran_integers += str(randint(0, 9))

        # concatenate the str integers
        name = "index_" + ran_integers

    # make lowercase
    name = name.lower()

    # replace spaces and linebreaks with underscores (_)
    name = name.replace(" ", "_")
    name = name.replace("\n", "_")

    # cannot contain `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, spaces, and `#`
    not_valid = ["\\", "/", "*", "?", '"', "<", ">"]
    not_valid += ["|", " ", "#"]

    # commas are also rejected, as are colons in Elasticsearch v7.0+
    not_valid += [",", ":"]

    # avoid "invalid escape sequence" errors caused by "%" and "@"
    not_valid += ["%", "@"]

    # index name cannot start with these characters
    while name and name[0] in ("-", "_", "+"):
        # remove the first char if this is the case
        name = name[1:]

    # remove all special characters
    new_name = ""
    for num, char in enumerate(name):

        # check if the character is in not_valid list
        valid = char not in not_valid

        # append the char to string if valid
        if valid:
            new_name += char

    # cut off the length of the index name if needed
    if len(new_name) > 255:
        new_name = new_name[:255]

    # return the fixed string
    return new_name

# invalid Elasticsearch index name
index_name = " _I'M !NOT# A ^VALID& *INDEX(NAME) %@^$*$ 34 HJHJ!"
print("INVALID name:", index_name)

# call the fix_index_name() function
index_name = fix_index_name(index_name)
print("VALID name:", index_name, "\n")
create_index(index_name)

# index name that's too long
index_name = "VERY VERY LONG STRING "*200
print("index_name length:", len(index_name))

# call the fix_index_name() function
index_name = fix_index_name(index_name)
print("VALID name:", index_name, "\n")
create_index(index_name)
```