Build an Elasticsearch Web Application in Python (Part 2)

Introduction to Elasticsearch web applications in Python

This article is part two of an article series that will show you how to build a Python web application designed to return Elasticsearch information. The first part of the series demonstrated how to create the web server using the Bottle framework. It also provided some example code that connected to Elasticsearch (using the Python low-level client) in order to return its cluster information.

This next part will show you how you can return all of the cluster’s indices (and their respective document data) to the front end of a website using the Bottle framework in Python.

Create an HTML file for the Elasticsearch document data

This article will use Python’s built-in codecs library to open the HTML file and retrieve its data for first part of the web page that will display the Elasticsearch document data.

Open a terminal window and use the code command for VB Code, subl for the Sublime editor, or a terminal-based text editor (like vim, nano, or gedit) to open or create an HTML file:

code index.html

Put the meta and header tags into the HTML file

Put the following code into the file to begin the HTML for the tables for the Elasticsearch document data:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta charset="UTF-8">
   
    <!-- bootstrap 4 href -->
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">

    <style>
        h3 {
            padding: 0.5rem;
        }
        .table-responsive  {
            font-family: 'Trebuchet MS', 'Lucida Sans Unicode', 'Lucida Grande', 'Lucida Sans', Arial, sans-serif, Helvetica, sans-serif;
            border-collapse: collapse;
            padding: 0.5rem;
        }

        .table-responsive td, .table-responsive th {
            border: 0.2rem solid #27282a;
            padding: 0.5rem;
        }

        .table-responsive tr:nth-child(even) {
            background-color: #8a8c84;
        }

        .table-responsive td:hover {
            background-color:#5b5c5e;
            color: #b9b9b9;
        }

        .table-responsive th {
            padding: 1rem;
            text-align: center;
            background-color: #71747d;
            color: #ffffff;
        }
    </style>
   
</head>

<body>
    <div style="margin-left:0rem" class="container">
        <h1>ObjectRocket Tutorials</h1>            
        <h2>Elasticsearch Indices</h2>
    </div>

NOTE: The Bootstrap stylesheet link and the CSS within the <style> tag are optional, and just help to “prettify” the table data. The <meta charset="UTF-8"> meta tag is necessary if you’d like to display Elasticsearch data that includes Unicode characters outside of the 128-character ASCII range.

Create a Python script for the Elasticsearch web application

Create a Python script (e.g. elastic_app.py) that will query Elasticsearch documents, and run the Bottle server as well.

Import the Elasticsearch and Bottle libraries for Python

Like in part one of this series, you’ll first have to import the necessary Python libraries for the Python web server to return Elasticsearch documents:

# import the run and get methods from the bottle library
from bottle import run, get

# import the elasticsearch client library
from elasticsearch import Elasticsearch

# import Python's json library to format JSON responses
import json

# use codecs to open the existing HTML file
import codecs

In order to avoid an ImportError, you’ll have to make sure that you’ve installed the elasticsearch and bottle libraries, using the PIP package manager for Python, by executing the pip install (for Python 2.7), or pip3 install (for Python 3) commands in a terminal or command prompt window to install them.

Version 0.13 (and greater) of the Bottle framework no longer supports Python versions that are 2.6 or older, and it should be noted that the code in this article has only been tested on Python 3. Access the Bottle and Elasticsearch libraries’ __version__ attribute if you’re unsure which versions are installed.

Screenshot of Python 3.7 IDLE returning the version numbers for Bottle and Elasticsearch

Create globals for the Elasticsearch and Bottle servers

The code in this article passes globals to the libraries’ method calls. The following code will set the Bottle server’s port to 1234, but you can set it to whatever you want:

# globals for the Elasticsearch domain
# ResourceWarning: unclosed <socket.socket> error if HTTP in domain
DOMAIN = "localhost"
ELASTIC_PORT = 9200
BOTTLE_PORT = 1234

NOTE: The default port for Elasticsearch is 9200, but this can be configured in your Elasticsearch installation’s elasticsearch.yml file, and it will take affect after you restart the service.

connect to Elasticsearch

Use the following code to attempt to connect to your server’s Elasticsearch cluster:

try:

    # concatenate a string for the Elasticsearch connection
    domain_str = DOMAIN + ":" + str(ELASTIC_PORT)

    # declare a client instance of the Python Elasticsearch library
    client = Elasticsearch( domain_str )

    # info() method raises error if domain or conn is invalid
    print (json.dumps( Elasticsearch.info(client), indent=4 ), "\n")

except Exception as err:
    print ("Elasticsearch() ERROR:", err, "\n")

    # client is set to none if connection is invalid
    client = None

It will set the client library instance to None in the case of an exception, or if the cluster is not running.

Return Elasticsearch documents from every index stored on the cluster

The following code defines a Python function that will get all of the cluster’s indices, and their respective _mapping fields and document data, and it will return the data in a nested Python dictionary object called all_docs:

def get_elasticsearch_data(client, query={}, page=0):
    # create a dict for the Elasticsearch docs
    all_docs = {}

Get all of the Elasticsearch cluster’s index names

This part of the function will check if the client object instance is valid, and, if that’s the case, it will pass the * wildcard to the indices.get_alias() method call to get all of the index names in a list:

    # make API call if client is not None
    if client != None:

        # returns a list of all the cluster's indices
        all_indices = client.indices.get_alias("*")

Iterate over the Elasticsearch index names

Now we can iterate over the index names using a for loop to get the _mapping schema fields and document data for all of the user-created indices that are not system indices (indices starting with a period, like .kibana):

        # iterate over the index names
        for ind in all_indices:

            # skip hidden indices with '.' in name
            if "." not in ind[:1]:

                # nest another dictionary for index inside
                all_docs[ ind ] = {}

                # print the index name
                print ( "\nindex:", ind )

Get the Elasticsearch documents for each index

Pass the index name object (ind) to the Elasticsearch client’s search() method to have it return some documents:

                # get 10 of the Elasticsearch documents from index
                docs = client.search(
                    from_ = page, # for pagination
                    index = ind,
                    body = {
                        'size' : 10,
                        'query': {
                            # pass query paramater
                            'match_all' : query
                        }
                })

NOTE: Since indices may have thousands of documents, the above code will return only 10 documents (using the 'size' parameter) so that the server, user, and web page won’t be overwhelmed with documents. You can modify the code in this article to support pagination by passing an integer value to the from_ parameter, otherwise it will always only return the index’s first 10 documents.

Put the Elasticsearch documents into the Python dictionary

Access the ["hits"]["hits"] nested value to get the document data. The following code will take this document data and put it into another dictionary (nested inside of all_docs) under the "docs" key:

                # get just the doc "hits"
                docs = docs["hits"]["hits"]

                # print the list of docs
                print ("index:", ind, "has", len(docs), "num of docs.")

                # put the list of docs into a dict key
                all_docs[ ind ]["docs"] = docs

Get the Elasticsearch _mapping schema for the indices

The last step is to pass the index name to the client’s indices.get_mapping() method to have it return the nested mapping schema. Use the following code to have the function put this data into the dictionary under the "fields" key:

                try:
                    # returns dict object of the index _mapping schema
                    raw_data = client.indices.get_mapping( ind )
                    print ("get_mapping() response type:", type(raw_data))

                    # returns dict_keys() obj in Python 3
                    mapping_keys = raw_data[ ind ]["mappings"].keys()
                    print ("\n_mapping keys():", mapping_keys)

                    # get the index's doc type
                    doc_type = list(mapping_keys)[0]
                    print ("doc_type:", doc_type)

                    # get the schema by accessing index's _doc type
                    schema = raw_data[ ind ]["mappings"][ doc_type ]["properties"]
                    print (json.dumps(schema, indent=4))
                    print ("\n# of fields in _mapping:", len(schema))
                    print ("all fields:", list(schema.keys()) )

                    all_docs[ ind ]["fields"] = schema
                    all_docs[ ind ]["doc_type"] = doc_type
                except Exception as err:
                    print ("client.indices error:", err)
                    all_docs[ ind ]["fields"] = {}
                    all_docs[ ind ]["doc_type"] = doc_type

NOTE: As you can see from the above code, the "fields" and "doc_type" keys will just have an empty dictionary assigned to them in the case of an exception.

Now all we have to do is return the all_docs nested dictionary object after the iteration is complete:

    # return all of the doc dict
    return all_docs

Convert Elasticsearch document data into an HTML string

This next function, called html_elasticsearch(), will parse all of the Elasticsearch data returned by the above get_elasticsearch_data() function and concatenate an HTML string in the form of a table.

Create a function that will create an HTML string from the Elasticsearch data

The first step is to declare the function, and to use the codec library’s open() method to access the HTML file that we created earlier:

# gef a function that will return HTML string for frontend
def html_elasticsearch():

    html_file = codecs.open("index.html", 'r')
    html = html_file.read()

Call the get_elasticsearch_data() function to get the Elasticsearch data

Now pass the client object instance to the get_elasticsearch_data() function that we created earlier to get the cluster’s data:

    # get all of the Elasticsearch indices, field names, & documents
    elastic_data = get_elasticsearch_data(client)

    # if there's no client then show on frontend
    if client != None:

        print ("html_elasticsearch() client:", client)

Append the Elasticsearch data to the HTML string

Iterate over the each index and its document data and append <th> (table header) and <tr> (table row) elements to the HTML string, with each iteration, for the index’s name and fields:

        # iterate over the index names
        for index, val in elastic_data.items():

            # create a new HTML table from the index name
            html += '<br><h3>Index name: ' + str(index) + '</h3>'
            html += '<table id="' + str(index) + '" class="table table-responsive">'

            # grab the "fields" list attribute created in get_elasticsearch_data()
            fields = source_data = elastic_data[index]["fields"]
            print ("\n\nfields:", fields)

            # new table row
            html += '\n<tr>'

            # enumerate() over the index fields
            for num, field in enumerate(fields):
                html += '<th>' + str(field) + '</th>'

            # close the table row for the Elasticsearch index fields
            html += '\n</tr>'

Create a table row (<tr>) for each Elasticsearch document as well. The following code will iterate over the index’s documents and put each them inside of <tb> tags:

            # get all of the docs in the Elasticsearch index
            all_docs = elastic_data[index]["docs"]
            print ("\nall_docs type:", type(all_docs))

            # enumerate() over the list of docs
            for num, doc in enumerate(all_docs):
                print ("\ndoc:", doc)

                # new row for each doc
                html += '<tr>\n'

                # iterate over the _source dict for the doc
                for f, val in doc["_source"].items():
                    html += '<td>' + str(val) + '</td>'
               
                html += '</tr>'

Return an HTML warning if the connection to Elasticsearch is invalid

The last step in this part of the process is to close the <table> tag, and append a warning message to the HTML string if no data was returned, or if the client object is invalid:

            # close the table tag for the Elasticsearch index
            html += '</table><br>'
    elif client == None:
        html += '<h3 style="color:red">Warning: Elasticsearch cluster is not running on'
        html += ' port: ' + str(ELASTIC_PORT) + '</h3>'
    elif elastic_data == {}:
        html += '<h3>Elasticsearch did not return index information</h3>'

This last part of the function will return the HTML string along with closing tags for the <body> and <html> elements:

    # return the HTML string
    return html + "</body></html>\n\n"

Run the Elasticsearch web application using Python’s Bottle Framework

We can call then call the html_elasticsearch() function inside of the Bottle framework’s GET HTTP protocol function, and also pass the global parameters to the Bottle library’s run() method:

@get('/')
def elastic_app():
    # call the func to return HTML to framework
    return html_elasticsearch()

# pass a port for the framework's server
run(
    host = DOMAIN,
    port = BOTTLE_PORT,
    debug = True
)

Conclusion to creating an Elasticsearch web application in Python

Run the web application using the python3 command followed by the script name:

python3 elastic_app.py

You should see Python return something like the following in your terminal window:

Listening on http://localhost:1234/
Hit Ctrl-C to quit.

Screenshot of a web page displaying the Elasticsearch document results using the Python Bottle framework

Restarting the Elasticsearch web application

If Python returns an error saying: OSError: [Errno 48] Address already in use, then type lsof -ti tcp:1234 to search for the process using the open port. Once you get the PID (Process ID) for the process just use the kill -9 command (followed by the PID) to kill it before attempting to run the script again.

Screenshot of a UNIX terminal killing processes using the open port for the Elasticsearch web application

This concludes the two-part series explaining how you can build a simple Elasticsearch web application in Python using the Bottle framework. Below you can find the complete Python code used in this article.

Just the Code

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the run and get methods from the bottle library
from bottle import run, get

# import the elasticsearch client library
from elasticsearch import Elasticsearch

# import Python's json library to format JSON responses
import json

# use codecs to open the existing HTML file
import codecs

# globals for the Elasticsearch domain
# ResourceWarning: unclosed <socket.socket> error if HTTP in domain
DOMAIN = "localhost"
ELASTIC_PORT = 9200
BOTTLE_PORT = 1234

try:

    # concatenate a string for the Elasticsearch connection
    domain_str = DOMAIN + ":" + str(ELASTIC_PORT)

    # declare a client instance of the Python Elasticsearch library
    client = Elasticsearch( domain_str )

    # info() method raises error if domain or conn is invalid
    print (json.dumps( Elasticsearch.info(client), indent=4 ), "\n")

except Exception as err:
    print ("Elasticsearch() ERROR:", err, "\n")

    # client is set to none if connection is invalid
    client = None

def get_elasticsearch_data(client, query={}, page=0):
    # create a dict for the Elasticsearch docs
    all_docs = {}

    # make API call if client is not None
    if client != None:

        # returns a list of all the cluster's indices
        all_indices = client.indices.get_alias("*")

        # iterate over the index names
        for ind in all_indices:

            # skip hidden indices with '.' in name
            if "." not in ind[:1]:

                # nest another dictionary for index inside
                all_docs[ ind ] = {}

                # print the index name
                print ( "\nindex:", ind )

                # get 20 of the Elasticsearch documents from index
                docs = client.search(
                    from_ = page, # for pagination
                    index = ind,
                    body = {
                        'size' : 10,
                        'query': {
                            # pass query paramater
                            'match_all' : query
                        }
                })

                # get just the doc "hits"
                docs = docs["hits"]["hits"]

                # print the list of docs
                print ("index:", ind, "has", len(docs), "num of docs.")

                # put the list of docs into a dict key
                all_docs[ ind ]["docs"] = docs

                try:
                    # returns dict object of the index _mapping schema
                    raw_data = client.indices.get_mapping( ind )
                    print ("get_mapping() response type:", type(raw_data))

                    # returns dict_keys() obj in Python 3
                    mapping_keys = raw_data[ ind ]["mappings"].keys()
                    print ("\n_mapping keys():", mapping_keys)

                    # get the index's doc type
                    doc_type = list(mapping_keys)[0]
                    print ("doc_type:", doc_type)

                    # get the schema by accessing index's _doc type
                    schema = raw_data[ ind ]["mappings"][ doc_type ]["properties"]
                    print ("all fields:", list(schema.keys()) )

                    all_docs[ ind ]["fields"] = schema
                    all_docs[ ind ]["doc_type"] = doc_type
                except Exception as err:
                    print ("client.indices error:", err)
                    all_docs[ ind ]["fields"] = {}
                    all_docs[ ind ]["doc_type"] = doc_type

    # return all of the doc dict
    return all_docs

# gef a function that will return HTML string for frontend
def html_elasticsearch():

    html_file = codecs.open("index.html", 'r')
    html = html_file.read()

    # get all of the Elasticsearch indices, field names, & documents
    elastic_data = get_elasticsearch_data(client)

    # if there's no client then show on frontend
    if client != None:

        print ("html_elasticsearch() client:", client)

        # iterate over the index names
        for index, val in elastic_data.items():

            # create a new HTML table from the index name
            html += '<br><h3>Index name: ' + str(index) + '</h3>'
            html += '<table id="' + str(index) + '" class="table table-responsive">'

            # grab the "fields" list attribute created in get_elasticsearch_data()
            fields = source_data = elastic_data[index]["fields"]
            print ("\n\nfields:", fields)

            # new table row
            html += '\n<tr>'

            # enumerate() over the index fields
            for num, field in enumerate(fields):
                html += '<th>' + str(field) + '</th>'

            # close the table row for the Elasticsearch index fields
            html += '\n</tr>'

            # get all of the docs in the Elasticsearch index
            all_docs = elastic_data[index]["docs"]
            print ("\nall_docs type:", type(all_docs))

            # enumerate() over the list of docs
            for num, doc in enumerate(all_docs):
                print ("\ndoc:", doc)

                # new row for each doc
                html += '<tr>\n'

                # iterate over the _source dict for the doc
                for f, val in doc["_source"].items():
                    html += '<td>' + str(val) + '</td>'
               
                html += '</tr>'

            # close the table tag for the Elasticsearch index
            html += '</table><br>'
    elif client == None:
        html += '<h3 style="color:red">Warning: Elasticsearch cluster is not running on'
        html += ' port: ' + str(ELASTIC_PORT) + '</h3>'
    elif elastic_data == {}:
        html += '<h3>Elasticsearch did not return index information</h3>'

    # return the HTML string
    return html + "</body></html>\n\n"

@get('/')
def elastic_app():
    # call the func to return HTML to framework
    return html_elasticsearch()

# pass a port for the framework's server
run(
    host = DOMAIN,
    port = BOTTLE_PORT,
    debug = True
)

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.