How to Upload Images to an Elasticsearch Index

Introduction

Although Elasticsearch is known for its fast and powerful text search, it’s possible to index more than just text. For example, it’s easy to upload images to Elasticsearch. You can use Python’s PIL library to extract an image’s EXIF data, and then index the image’s data to an Elasticsearch index. In this article, we’ll provide step-by-step instructions for storing images with Python in Elasticsearch.

Prerequisites for indexing photos to Elasticsearch

Before we look at how to upload photos to Elasticsearch, let’s go over the key system requirements for the task:

  • Elasticsearch must be installed and running, along with Java and its JVM. You can check if Elasticsearch is running by making a GET request to the server on the default port of 9200. It should return a JSON response with cluster information:
curl -XGET "localhost:9200"
  • When you index photos with Python in Elasticsearch, you may want to use the Kibana Console UI to verify that the photos or images uploaded correctly. If so, you need to make sure that the Kibana service is running as well. Simply load the interface in a browser by navigating to:
https://{YOUR_DOMAIN}:5601
  • You’ll need to create an Elasticsearch index for the image data. If you haven’t created one yet, we’ll go over how to do it in this article. We’ll be indexing EXIF data in Elasticsearch for this tutorial, so it’s best not to have an index with a rigid _mapping schema– not all images have the same EXIF data fields, and some images don’t have any at all.
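The cluster check described above can also be made from Python. The following is a minimal sketch that uses only the standard library; describe_cluster() and cluster_info() are hypothetical helper names, and the default URL assumes an unsecured local cluster on port 9200.

```python
import json
import urllib.request


def describe_cluster(payload):
    """Summarize the JSON body returned by the cluster's root endpoint."""
    info = json.loads(payload)
    # "cluster_name" and "version.number" are fields of the root response
    name = info.get("cluster_name", "unknown")
    version = info.get("version", {}).get("number", "unknown")
    return f"{name} (Elasticsearch {version})"


def cluster_info(url="http://localhost:9200", timeout=5):
    """Return a summary string, or None if the cluster can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_cluster(resp.read())
    except (OSError, ValueError):
        # connection refused, timeout, HTTP error, or a non-JSON response
        return None
```

Calling print(cluster_info() or "Elasticsearch is not reachable") at the top of a script gives a quick sanity check before any indexing is attempted.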

Install all the Python libraries and dependencies using pip3

The Python code used in this article was written for and tested with Python 3, since Python 2.7 has reached its end of life. Therefore, we’ll be using the pip3 command (the PIP package manager for Python 3) to install the necessary libraries and modules.

Install Pillow (PIL) for Python 3

Throughout this tutorial, we’ll be using Pillow, the actively maintained fork of the Python Imaging Library (PIL). We’ll be using Pillow with Python and Elasticsearch to index images. Let’s begin by installing Pillow using the following command:

pip3 install Pillow

If you’d like to upgrade Pillow to the latest version, use this command:

pip3 install -U Pillow

>NOTE: The original PIL allowed its modules to be imported at the top level (e.g. import Image); with Pillow, modules must be imported from the PIL package instead (e.g. from PIL import Image)

Install pybase64 for the Base64 Python encoding library

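Next, we’ll install and upgrade pybase64, a Python wrapper around the libbase64 C library for fast Base64 encoding and decoding:

pip3 install -U pybase64

This library is optional. Python’s built-in base64 module, which the script in this article imports, can also convert data to Base64 before it’s indexed to Elasticsearch, and it requires no extra installation.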

Select a personal photo or public domain image to index to Elasticsearch

The image we’ll be using in this article is cute-kittens-in-basket.jpg from publicdomainpictures.net.

The actual image we’re using is a scaled down (1000×669) version of the original “cute-kittens” image:

Screenshot of GIMP scaling down the size of the original kittens image

If you’d like to follow along with the instructions in this tutorial, use a public domain image like this one, or take a photo with your phone and put it into the same directory as your Python script.

Create a new project directory and Python script

Once you’ve confirmed all the system requirements and selected an image, you can get started on the code. The first step will be to create a directory and a Python script for the code. Then, move the image you’d like to index into the project directory. You can create a directory using the mkdir command in a terminal window:

mkdir index-image-project

After you create your directory, use the touch command to create a new Python script, or use nano to edit a new file and save it as a Python script:

touch index_image.py

>NOTE: Keep in mind that the names of Python scripts use underscores (_) instead of hyphens (-) (e.g. my_script.py).

Open a new Finder window (or whichever GUI-based file manager is available on the OS where the Elasticsearch cluster is running), and move the image to the new project folder.

If you’re in a terminal window, this can be done using the mv command by specifying the target directory in the second parameter:

mv my-image.jpg /Users/username/Desktop/index-image-project/my-image.jpg

Import the necessary Python libraries and modules to index an image to Elasticsearch

Next, we’ll need to import libraries and modules such as Elasticsearch and PIL into Python. Let’s edit the project’s Python script, making sure to import all of the following modules, classes, and libraries at the top:

# import the Elasticsearch low-level client
from elasticsearch import Elasticsearch

# import the Image module and the TAGS dict from Pillow (PIL)
from PIL import Image
from PIL.ExifTags import TAGS

import numpy as np # convert image pixels to a list
import uuid # for image meta data ID
import base64 # convert image to b64 for indexing
import datetime # for image EXIF data timestamp

After making these edits, save your script and run it in a terminal or command prompt window (using the python3 command) to make sure that all modules were properly installed using pip3.

If you get an ImportError, then return to the Prerequisites section of this article and make sure all of the libraries are installed and updated. Pillow (imported as PIL), elasticsearch, NumPy (installed with pip3 install numpy), and the optional pybase64 are the only libraries we’ll be using in this article that don’t come packaged with a Python installation by default. The uuid, base64, and datetime modules are part of Python’s standard library.
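Rather than waiting for an ImportError, the script can report which modules are missing before it does any work. This is only a sketch: missing_modules() is a hypothetical helper, and the list of names assumes the import names used in this tutorial (Pillow is imported as PIL).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# import names used by this tutorial's script (Pillow is imported as "PIL")
required = ["elasticsearch", "PIL", "numpy"]

for name in missing_modules(required):
    print("Missing module:", name)
```

If nothing is printed, all of the third-party modules were found.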

Create a new client instance of the Elasticsearch low-level client

Before we proceed, let’s make sure that the Elasticsearch cluster is up and running. Once you’ve confirmed this, we’ll add the following code to create a new client instance of Elasticsearch in the Python script:

# create a client instance of Elasticsearch
elastic_client = Elasticsearch([{'host': 'localhost', 'port': 9200}])

Start the Kibana service

You may want to use the Kibana Console UI to make HTTP requests and verify that an image has been indexed. If so, then you’ll need to make sure that Kibana is running.

Use Python’s PIL library to grab the image’s EXIF meta data

The PIL library (Python Imaging Library) is instrumental in making it possible to index photos in Elasticsearch with Python. Its PIL.ExifTags module provides TAGS, a dictionary that maps numeric EXIF tag IDs to readable names, which makes it possible to read an image’s EXIF meta data. You can use this data to create custom fields for the Elasticsearch document’s _source data. This allows the document to be indexed, organized and searched using the image’s EXIF tags.
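To illustrate the idea of mapping tag IDs to names, here is a small sketch that runs without Pillow. SAMPLE_TAGS is a hand-written subset of standard EXIF tag IDs standing in for the much larger PIL.ExifTags.TAGS dictionary, and name_exif_tags() and the raw values are hypothetical.

```python
# a small hand-written subset of EXIF tag IDs, standing in for the
# much larger PIL.ExifTags.TAGS dict so the sketch runs without Pillow
SAMPLE_TAGS = {
    271: "Make",
    272: "Model",
    305: "Software",
    306: "DateTime",
    36867: "DateTimeOriginal",
}


def name_exif_tags(raw_exif, tags=SAMPLE_TAGS):
    """Replace numeric EXIF tag IDs with readable names where known."""
    return {tags.get(tag_id, tag_id): value for tag_id, value in raw_exif.items()}


# hypothetical raw EXIF data keyed by numeric tag ID
raw = {271: "Canon", 306: "2019:06:14 21:18:04", 99999: "n/a"}
print(name_exif_tags(raw))
```

Unknown tag IDs are kept under their numeric key here, so no value is silently dropped.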

Set up the image document’s index data for Elasticsearch

If you have a specific _id or _index in mind for your selected image document, it’s best to declare that information before doing anything else. Let’s also declare a string containing the file name (and its path if the image is not in the same directory as the script):

# declare the image's file name, index name and document ID
_file = "cute-kittens-in-basket.jpg"
_index = "images"
_id = 1

Create a PIL Image instance of the image in the directory

Next, we’ll open the file named by the _file string in binary mode ('rb') and pass the resulting file object to PIL’s Image.open() function. It returns a PIL Image object of the target image:

img = Image.open(open(_file, 'rb'))

Use PIL’s PIL.ExifTags module to read an image’s EXIF meta data

Make sure to import the TAGS dictionary at the beginning of your script. This makes it possible to look up the name of each EXIF tag before indexing an image’s EXIF data:

from PIL.ExifTags import TAGS

Create a Python function to get the PIL Image’s EXIF data

We’ll need to create a function that first verifies the PIL Image object and then uses the _getexif() method to return all of the image’s EXIF data. The method returns None when an image has no EXIF data, in which case the function returns an empty dict:

def get_image_exif(img):
    # use PIL to verify image is not corrupted
    img.verify()

    exif_data = {}
    try:
        # call the img's _getexif() method to get the raw EXIF data
        exif = img._getexif()
    except AttributeError:
        # this image format doesn't have a _getexif() method
        exif = None

    # _getexif() returns None if the image has no EXIF data
    if exif:
        # iterate over the exif items
        for (meta, value) in exif.items():
            # map the numeric tag ID to its name (fall back to the ID)
            exif_data[TAGS.get(meta, meta)] = value
    return exif_data

Once the create_exif_data() function from the next section has been defined, we can use it to build the Elasticsearch document’s _source data. Here’s the code we need to do it:

# get the _source dict for Elasticsearch doc
_source = create_exif_data(img)

# store the file name in the Elasticsearch index
_source['name'] = _file

The function will place the image’s meta data into a Python dict object and return that object to be indexed later on. For our sample image, the object would include key-value pairs like the following:

_source: {
    'image_mime': 'image/jpeg',
    'name': 'cute-kittens-in-basket.jpg',
    'datetime': '2019:06:14 21:18:04',
    'make': 'Camera Unknown',
    'model': 'Camera Unknown',
    'uuid': '1a994cbe-8d03-4c07-9f26-6d13930dcbcd'
}
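The datetime value in the sample above uses EXIF’s colon-separated date format, which Elasticsearch’s dynamic mapping will most likely treat as plain text rather than as a date. If you want a date field, one option is to normalize the value to ISO 8601 before indexing. This is a sketch only; normalize_exif_datetime() is a hypothetical helper that is not part of the original script.

```python
from datetime import datetime


def normalize_exif_datetime(value):
    """Convert 'YYYY:MM:DD HH:MM:SS' to ISO 8601, or return None if it doesn't parse."""
    try:
        parsed = datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S")
    except (AttributeError, TypeError, ValueError):
        # not a string, or not in the expected EXIF format
        return None
    return parsed.isoformat()


print(normalize_exif_datetime("2019:06:14 21:18:04"))
```

The fallback timestamp could be generated with datetime.now().isoformat() so that both branches produce the same format.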

Create another function to generate missing EXIF data if needed

At this point, we’ll need to declare another function that will parse the EXIF dict data returned by the get_image_exif() function. This function will ensure that all of the documents in our Elasticsearch index have the same fields by generating EXIF data for them, even if some images don’t have any.

This function will parse through the EXIF keys using if and elif conditional statements to check if the EXIF data is present:

def create_exif_data(img):

    # create a new dict obj for the Elasticsearch doc
    es_doc = {}
    es_doc["size"] = img.size

    # put PIL Image conversion in a try-except indent block
    try:
        # create PIL Image from path and file name
        img = Image.open(_file)
    except Exception as error:
        print ('create_exif_data PIL ERROR:', error, '-- for file:', _file)

    # call the method to have PIL return exif data
    exif_data = get_image_exif(img)

    # get the PIL img's format and MIME
    es_doc["image_format"] = img.format
    es_doc["image_mime"] = Image.MIME[img.format]

    # get datetime meta data from one of these keys if possible
    if 'DateTimeOriginal' in exif_data:
        es_doc['datetime'] = exif_data['DateTimeOriginal']

    elif 'DateTime' in exif_data:
        es_doc['datetime'] = exif_data['DateTime']

    elif 'DateTimeDigitized' in exif_data:
        es_doc['datetime'] = exif_data['DateTimeDigitized']

    # if none of these exist, then use current timestamp
    else:
        es_doc['datetime'] = str( datetime.datetime.now() )

    # create a UUID for the image if none exists
    if 'ImageUniqueID' in exif_data:
        es_doc['uuid'] = exif_data['ImageUniqueID']
    else:
        # create a UUID converted to string
        es_doc['uuid'] = str( uuid.uuid4() )

    # make and model of the camera that took the image
    if 'Make' in exif_data:
        es_doc['make'] = exif_data['Make']
    else:
        es_doc['make'] = "Camera Unknown"

    # camera unknown if none exists
    if 'Model' in exif_data:
        es_doc['model'] = exif_data['Model']
    else:
        es_doc['model'] = "Camera Unknown"

    if 'Software' in exif_data:
        es_doc['software'] = exif_data['Software']
    else:
        es_doc['software'] = 'Unknown Software'

    # get the X and Y res of image
    if 'XResolution' in exif_data:
        es_doc['x_res'] = exif_data['XResolution']
    else:
        es_doc['x_res'] = None

    if 'YResolution' in exif_data:
        es_doc['y_res'] = exif_data['YResolution']
    else:
        es_doc['y_res'] = None
    # return the dict
    return es_doc
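The if/else blocks in the function above all follow the same pattern: use the EXIF value if it exists, otherwise fall back to a default. As a more compact sketch of the same idea, dict.get() covers the single-key cases and a small helper covers the chain of datetime keys. The first_present() helper and the sample exif_data values are hypothetical.

```python
def first_present(data, keys, default=None):
    """Return the value of the first key in `keys` found in `data`."""
    for key in keys:
        if key in data:
            return data[key]
    return default


# hypothetical EXIF dict with only some of the fields present
exif_data = {"DateTime": "2019:06:14 21:18:04", "Make": "Canon"}

es_doc = {
    "datetime": first_present(
        exif_data, ["DateTimeOriginal", "DateTime", "DateTimeDigitized"]
    ),
    "make": exif_data.get("Make", "Camera Unknown"),
    "model": exif_data.get("Model", "Camera Unknown"),
    "software": exif_data.get("Software", "Unknown Software"),
}
print(es_doc)
```

Either style produces documents with the same set of fields; the longer version in the tutorial is easier to step through, while this one is easier to extend with new fields.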

Create a NumPy array

In our next step, we’ll be using the file string (_file) declared earlier. We’ll open the image using PIL’s Image.open() function, and pass that Image object instance to NumPy’s asarray() function. This step requires NumPy, so install it with pip3 install numpy and add import numpy as np to the imports at the top of the script. Be sure to convert the NumPy array to a normal Python list object using the tolist() method; this ensures that the image’s pixel data can be put in lists and then stored in Elasticsearch:

# convert NumPy array of PIL image to simple Python list obj
img_array = np.asarray( Image.open( _file ) ).tolist()

Convert the Python list object to a str

Keep in mind that Elasticsearch flattens nested arrays when it indexes them, so the row-and-column structure of the pixel list would be lost if the list were indexed as-is. To keep the data intact, we’ll convert the list to a string using the str() function:

# convert the nested Python array to a str
img_str = str( img_array )
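If you plan to read the pixel data back out of Elasticsearch later, serializing with the standard json module is an alternative to str() that makes the round trip explicit. The sketch below uses a hypothetical 2x2 RGB image in place of the real pixel list.

```python
import json

# hypothetical 2x2 RGB image as nested lists (rows -> pixels -> channels)
pixels = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

# serialize to a string, as an alternative to str()
pixels_str = json.dumps(pixels)

# parse the string back into nested lists
restored = json.loads(pixels_str)
print(restored == pixels)
```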

Use Python’s Base64 library to encode images

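You can also encode the image string using Python’s base64 module. The b64encode() function returns a bytes object, so decode the result to a str so that the Elasticsearch client can serialize it as JSON:

img_base64 = base64.b64encode( bytes(img_str, "utf-8") ).decode("ascii")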
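Base64 can also be applied directly to a file’s raw bytes instead of to the stringified pixel list, which keeps the original file recoverable. The following sketch shows the encode and decode round trip; encode_bytes() and decode_string() are hypothetical helpers, and the sample bytes stand in for real file contents.

```python
import base64


def encode_bytes(data):
    """Base64-encode raw bytes and return a JSON-safe str."""
    return base64.b64encode(data).decode("ascii")


def decode_string(encoded):
    """Turn a Base64 str back into the original bytes."""
    return base64.b64decode(encoded)


# stand-in for file contents, e.g. open(_file, 'rb').read()
data = b"\xff\xd8\xff\xe0 not a real JPEG"
encoded = encode_bytes(data)
print(type(encoded).__name__, decode_string(encoded) == data)
```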

Put the raw image data into the _source object

At this point, we’re ready to put the raw image data into a field key of the _source object:

# put the encoded string into the _source dict
_source["raw_data"] = img_base64

Create an Elasticsearch index and put the image data into it

The final step is to create an index in Elasticsearch if you have not done so already. Then, we can index the _source dictionary object with all of the EXIF and raw image data.

Create an Elasticsearch index and ignore any 400 HTTP errors

Pass the ignore=400 option when creating the index. If the _index specified earlier already exists, Elasticsearch responds with an HTTP 400 error; this option instructs the client to ignore that error code so that the script will continue on without interruption:

# create the "images" index for Elasticsearch if necessary
resp = elastic_client.indices.create(
    index = _index,
    body = "{}",
    ignore = 400 # ignore 400 already exists code
)

print ("\nElasticsearch create() index response:", resp)

Index the image dict object to the Elasticsearch index

In this step, we pass all the document data declared earlier to the Elasticsearch client’s index() method. We have it return a response to confirm that the API call was successful and that there were no errors:

# call the Elasticsearch client's index() method
resp = elastic_client.index(
    index = _index,
    doc_type = '_doc',
    id = _id,
    body = _source
)

print ("\nElasticsearch index() response:", resp)

Handling the ConnectionTimeout error while calling the Elasticsearch index() method

If you encounter a ConnectionTimeout error while indexing the image to Elasticsearch, simply use the request_timeout option while calling the index method:

# call the Elasticsearch client's index() method
resp = elastic_client.index(
    index = _index,
    doc_type = '_doc',
    id = _id,
    body = _source,
    request_timeout=60
)

The example shown above attempts to index the image data and will wait up to 60 seconds for a response before it raises a timeout error. This can be useful for larger images that may take longer than expected to index.

Conclusion

While Elasticsearch is known for its powerful text search capabilities, it’s also possible to upload photos to Elasticsearch with Python. Although the example we reviewed in this tutorial shows the indexing of a single photo, you can use the same basic process to bulk index in Elasticsearch in Python. If you needed to bulk index documents in Elasticsearch, you would still read and generate EXIF data for each image in Python and then index that data to an Elasticsearch index. Using the step-by-step instructions provided in this tutorial, you’ll have no trouble uploading photos to Elasticsearch.
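As a rough sketch of the bulk approach mentioned above, the documents for several images can be turned into action dictionaries and sent in one request with the client library’s helpers.bulk() function. The build_bulk_actions() helper and the sample documents are hypothetical; the call to helpers.bulk() is left as a comment because it needs a running cluster.

```python
def build_bulk_actions(index_name, docs):
    """Yield one bulk action dict per (doc_id, source) pair."""
    for doc_id, source in docs:
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": source,
        }


# hypothetical documents, e.g. built by create_exif_data() for each image
docs = [
    (1, {"name": "first.jpg", "make": "Camera Unknown"}),
    (2, {"name": "second.jpg", "make": "Camera Unknown"}),
]

actions = list(build_bulk_actions("images", docs))
print(len(actions), actions[0]["_id"])

# with a running cluster, the actions could then be sent in one request:
# from elasticsearch import helpers
# helpers.bulk(elastic_client, actions)
```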

Open up the Kibana Console UI to verify that the image was indexed

If you’d like to verify that your image was indexed properly, you can use the Kibana Console UI. Simply navigate to port 5601 on your server’s domain (or to localhost:5601) in a web browser, and click on Dev Tools (represented by a small “wrench” icon in Kibana v7.x). Then, make the following GET request in the left console pane to verify that the image indexed properly:

GET images/_search

You can also pass the document’s _id to an HTTP request directly using _doc as the document type (as specified in the index() method):

GET images/_doc/1

Once you click on the green arrow icon, the right pane of the Kibana Console will display information about the indexed image:

Screenshot of a GET request in Kibana to _search for the indexed image as a list object

Just the Code

We’ve looked at our example code one section at a time throughout this tutorial. The following code represents the complete script needed to index images in Elasticsearch:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the Elasticsearch low-level client
from elasticsearch import Elasticsearch

# import the Image module and the TAGS dict from Pillow (PIL)
from PIL import Image
from PIL.ExifTags import TAGS

import numpy as np # convert image pixels to a list
import uuid # for image meta data ID
import base64 # convert image to b64 for indexing
import datetime # for image meta data timestamp

# create a client instance of Elasticsearch
elastic_client = Elasticsearch([{'host': 'localhost', 'port': 9200}])

"""
Function that uses PIL's TAGS class to get an image's EXIF
meta data and returns it all in a dict
"""

def get_image_exif(img):
    # use PIL to verify image is not corrupted
    img.verify()

    exif_data = {}
    try:
        # call the img's _getexif() method to get the raw EXIF data
        exif = img._getexif()
    except AttributeError:
        # this image format doesn't have a _getexif() method
        exif = None

    # _getexif() returns None if the image has no EXIF data
    if exif:
        # iterate over the exif items
        for (meta, value) in exif.items():
            # map the numeric tag ID to its name (fall back to the ID)
            exif_data[TAGS.get(meta, meta)] = value
    return exif_data

"""
Function to create new meta data for the Elasticsearch
document. If certain meta data is missing from the original,
then this script will "fill in the gaps" for the new documents
to be indexed.
"""

def create_exif_data(img):

    # create a new dict obj for the Elasticsearch doc
    es_doc = {}
    es_doc["size"] = img.size

    # put PIL Image conversion in a try-except indent block
    try:
        # create PIL Image from path and file name
        img = Image.open(_file)
    except Exception as error:
        print ('create_exif_data PIL ERROR:', error, '-- for file:', _file)

    # call the method to have PIL return exif data
    exif_data = get_image_exif(img)

    # get the PIL img's format and MIME
    es_doc["image_format"] = img.format
    es_doc["image_mime"] = Image.MIME[img.format]

    # get datetime meta data from one of these keys if possible
    if 'DateTimeOriginal' in exif_data:
        es_doc['datetime'] = exif_data['DateTimeOriginal']

    elif 'DateTime' in exif_data:
        es_doc['datetime'] = exif_data['DateTime']

    elif 'DateTimeDigitized' in exif_data:
        es_doc['datetime'] = exif_data['DateTimeDigitized']

    # if none of these exist, then use current timestamp
    else:
        es_doc['datetime'] = str( datetime.datetime.now() )

    # create a UUID for the image if none exists
    if 'ImageUniqueID' in exif_data:
        es_doc['uuid'] = exif_data['ImageUniqueID']
    else:
        # create a UUID converted to string
        es_doc['uuid'] = str( uuid.uuid4() )

    # make and model of the camera that took the image
    if 'Make' in exif_data:
        es_doc['make'] = exif_data['Make']
    else:
        es_doc['make'] = "Camera Unknown"

    # camera unknown if none exists
    if 'Model' in exif_data:
        es_doc['model'] = exif_data['Model']
    else:
        es_doc['model'] = "Camera Unknown"

    if 'Software' in exif_data:
        es_doc['software'] = exif_data['Software']
    else:
        es_doc['software'] = 'Unknown Software'

    # get the X and Y res of image
    if 'XResolution' in exif_data:
        es_doc['x_res'] = exif_data['XResolution']
    else:
        es_doc['x_res'] = None

    if 'YResolution' in exif_data:
        es_doc['y_res'] = exif_data['YResolution']
    else:
        es_doc['y_res'] = None
    # return the dict
    return es_doc


# declare the file name, index name and document ID, then create an Image instance
_file = "cute-kittens-in-basket.jpg"
_index = "images"
_id = 1
img = Image.open(open(_file, 'rb'))

# get the _source dict for Elasticsearch doc
_source = create_exif_data(img)

# store the file name in the Elasticsearch index
_source['name'] = _file

# convert NumPy array of PIL image to simple Python list obj
img_array = np.asarray( Image.open( _file ) ).tolist()

# convert the nested Python array to a str
img_str = str( img_array )

# put the image string into the _source dict
_source["raw_data"] = img_str

# create the "images" index for Elasticsearch if necessary
resp = elastic_client.indices.create(
    index = _index,
    body = "{}",
    ignore = 400 # ignore 400 already exists code
)

print ("\nElasticsearch create() index response -->", resp)

# call the Elasticsearch client's index() method
resp = elastic_client.index(
    index = _index,
    doc_type = '_doc',
    id = _id,
    body = _source,
    request_timeout=60
)
print ("\nElasticsearch index() response -->", resp)
