How To Use Python To Download Photos From An Elasticsearch Index

Introduction

This tutorial demonstrates how to retrieve an image stored on an Elasticsearch index and rebuild it as an image file, embedding both its raw pixel data and its EXIF metadata, using Python's PIL (Pillow) and Piexif libraries. Examples include exporting an Elasticsearch document and its image with Python, decoding Base64-encoded image data, creating a PIL image from an Elasticsearch document, and writing EXIF data with PIL and Piexif.

The example document used in this tutorial stores specific EXIF fields alongside the image data, as shown here:

Screenshot of Kibana getting an image document and its EXIF data stored on an Elasticsearch index

The results of using the code provided here may vary depending on how the image was originally indexed and what EXIF data was included at the time of indexing.

Prerequisites for Indexing Photos to Elasticsearch

  • The Elasticsearch cluster must be properly installed and running. With the server running on the default 9200 port, execute the following HTTP cURL request to the server to return cluster information:
curl -XGET "localhost:9200"
  • The Kibana service must be running in order to use the Kibana Console UI to make HTTP requests to verify the images uploaded correctly, as follows:
https://{YOUR_DOMAIN}:5601
  • An Elasticsearch index containing the image documents must already exist. The examples in this tutorial read the document with an id of 1 from an index named images; creating the index and its mapping is not covered here.

How to install all the Python libraries and dependencies using pip3

Python 2.7 reached its end of life on January 1, 2020, and the Python code in this tutorial was written and tested with Python 3. Use the pip3 command, the PIP package manager for Python 3, to install the necessary libraries and modules. The Elasticsearch client and NumPy, which are also used in this tutorial, can be installed with pip3 install elasticsearch numpy.

How to install Pillow (PIL) for Python 3

Install the Python Imaging Library (PIL) with the following command:

pip3 install Pillow

The following command will upgrade Pillow to the latest version:

pip3 install -U Pillow

>NOTE: The original PIL allowed its modules to be imported directly (e.g. import Image). Pillow, the maintained fork of PIL, requires importing them from the PIL package instead (e.g. from PIL import Image).

How to install pybase64 for the Base64 Python encoding library

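Python's standard library already includes the base64 module used in this tutorial to decode Base64 data, so nothing has to be installed for it. Optionally, pybase64, a faster alternative that wraps the libbase64 C library, can be installed and upgraded with the following command: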

pip3 install -U pybase64

How to install the piexif Python library for the Elasticsearch images

Execute the following pip3 command to install the Piexif library. This will allow the image’s EXIF data to be edited and saved after it is exported from an Elasticsearch index:

pip3 install piexif

Screenshot of Python IDLE importing the base64, PIL, and piexif image libraries

How to import the Elasticsearch client library

Import the Elasticsearch class, which is used to create an instance of the low-level client:

# import the Elasticsearch low-level client
from elasticsearch import Elasticsearch

How to Import Other Python Libraries to Create a PIL Image from Elasticsearch Data

Import Python’s time library to create timestamp strings as follows:

import time

How to import the Base64 library for Python when the data is encoded

Import base64 to decode Base64 data on the Elasticsearch document as follows:

# import Base64 to decode Elasticsearch _source image
import base64

How to import the PIL Image library for Python

Import Pillow's Image class, which will be used to create an image from the Elasticsearch document's pixel data:

# use Python's PIL (Pillow) library to create an Image instance
from PIL import Image

How to import the PIEXIF library for Python

Import piexif, which will be used to build the EXIF data for the new image:

# import piexif to write EXIF data to new image
import piexif

How to import the AST library for its literal_eval() function to create a list of image pixels from a string

The ast library's literal_eval() function is a safer way to evaluate a string than Python's eval(), because it only accepts Python literal structures (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None) and never executes code. Import it with the following statement:

# use to convert string to nested list
import ast
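As a quick illustration (the pixel string below is made-up sample data, not taken from a real document), literal_eval() turns the string form of a nested list back into a list, and raises ValueError instead of running anything that is not a plain literal:

```python
import ast

# made-up sample: a 2x2 RGB image stored as the string form of a nested list
pixel_string = "[[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 255]]]"

pixels = ast.literal_eval(pixel_string)
print(len(pixels), len(pixels[0]), pixels[0][0])

# expressions such as function calls are rejected rather than executed
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print("rejected:", rejected)
```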

How to import NumPy to create a NumPy array from the Python list() object of pixel data

Import NumPy using the conventional np alias. Pillow's Image.fromarray() function creates a PIL Image instance from a NumPy ndarray object, so the list of pixels will be converted to an array first:

# import the NumPy library using the 'np' alias
import numpy as np

How to Declare an Elasticsearch Client Instance and Get the Document Image

Declare an instance of the Elasticsearch low-level client library as follows:

# create a client instance of Elasticsearch (a full URL works with 7.x and 8.x clients)
elastic_client = Elasticsearch("http://localhost:9200")

Call the client instance’s get() method to obtain the image’s EXIF and raw pixel data from the Elasticsearch cluster:

# call Elasticsearch's get() method to retrieve a document's data
# (doc_type is deprecated in 7.x clients and removed in 8.x, so it is omitted)
_doc = elastic_client.get(
    index = 'images',
    id = 1
)

How to Get the Image Data from the Elasticsearch Document Object

Use try/except blocks to obtain the image's metadata, EXIF, and raw pixel data. Be aware that the field names (dictionary keys) of the document on your Elasticsearch cluster may differ from the examples in this tutorial.

The following code catches a Python KeyError exception whenever a field is missing and falls back to a default where possible:

try:
    # get the document's _source data
    _source = _doc["_source"]
except KeyError:
    print ("The Elasticsearch document is not valid")
    quit()

try:
    raw_data = _source["raw_data"]
except KeyError:
    print ("The Elasticsearch document is not a valid image")
    quit()

# try to get image size from Elasticsearch
try:
    img_size = _source["size"]
except KeyError:
    img_size = (0, 0)

# get the image's filename
try:
    filename = _source["name"]
except KeyError:
    # generate filename using timestamp
    filename = "unknown_" + str( int(time.time()) )

# split off the extension, e.g. "photo.png" -> "photo" and "png"
if "." in filename:
    filename, extension = filename.rsplit(".", 1)
else:
    extension = ""

# get the image's file format
try:
    image_format = _source["image_format"]
except KeyError:
    # fall back to the name's extension, or to JPEG if there is none
    image_format = extension or "jpeg"
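The same lookups can also be written as a small function that uses dict.get() defaults instead of try/except blocks. This is only a sketch: the function name parse_image_fields() and the sample dictionaries are hypothetical, and the field names (raw_data, name, image_format) are the ones assumed throughout this tutorial.

```python
import time


def parse_image_fields(source):
    """Hypothetical helper: return (raw_data, base_name, image_format) from a _source dict."""
    # raw pixel data is required; None signals that the document is not an image
    raw_data = source.get("raw_data")

    # fall back to a timestamped name when the document has no "name" field
    name = source.get("name") or "unknown_" + str(int(time.time()))

    # "beach.png" -> "beach" and "png"; a name without a dot has no extension
    if "." in name:
        base_name, extension = name.rsplit(".", 1)
    else:
        base_name, extension = name, ""

    # prefer the stored format, then the name's extension, then JPEG
    image_format = source.get("image_format") or extension or "jpeg"
    return raw_data, base_name, image_format


# made-up sample _source dictionaries
print(parse_image_fields({"raw_data": "[]", "name": "beach.png"}))
print(parse_image_fields({"raw_data": "[]", "name": "beach", "image_format": "gif"}))
```

Returning None for a missing raw_data field leaves the decision to stop (as the tutorial does with quit()) to the caller.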

How to use Python’s AST Library to Convert an Image’s String Data to a List Object

Pass the image's raw string data to the ast library's literal_eval() function, which evaluates the string representation of a Python list and returns a list of the image's pixel values:

# use literal_eval() to eval a Python list of pixel data
img_list = ast.literal_eval( raw_data )

>NOTE: If the raw_data field holds Base64-encoded data, first decode it with base64.b64decode() and convert the resulting bytes to a string before passing it to literal_eval().
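A minimal sketch of that extra decoding step, assuming the raw_data field holds the Base64-encoded text of the pixel list. The sample value is made up and encoded on the spot so the example is self-contained:

```python
import ast
import base64

# made-up sample: encode a pixel-list string the way it might be stored in a document
encoded = base64.b64encode(b"[[[10, 20, 30], [40, 50, 60]]]").decode("ascii")

# b64decode() returns bytes, so decode them to str before calling literal_eval()
decoded_text = base64.b64decode(encoded).decode("utf-8")
img_list = ast.literal_eval(decoded_text)
print(img_list)
```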

How to Pass the Data as a NumPy Array to PIL's Image Class to Create a New Image Object

Convert the list of pixels into a NumPy array of 8-bit values and pass it to Image.fromarray() to create the image:

# create a NumPy array from the eval string array
img_nparray = np.array( img_list )

# create a PIL image from the NumPy array
try:
    img = Image.fromarray( img_nparray.astype('uint8'), 'RGBA')
except Exception as err:
    print ("Image.fromarray() ERROR:", err)
    img = Image.fromarray( img_nparray.astype('uint8'), 'RGB')
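To try this step without a cluster, the sketch below feeds a made-up 2x2 pixel string through the same literal_eval(), NumPy, and Image.fromarray() calls. It passes no mode argument, which lets Pillow infer the mode from the array's shape; this is an alternative to the try/except approach above, not the tutorial's own code:

```python
import ast

import numpy as np
from PIL import Image

# made-up 2x2 RGBA pixel data in the string form used in this tutorial
raw_data = "[[[255, 0, 0, 255], [0, 255, 0, 255]], [[0, 0, 255, 255], [0, 0, 0, 0]]]"

img_nparray = np.array(ast.literal_eval(raw_data)).astype("uint8")
print(img_nparray.shape)  # (height, width, channels)

# without a mode argument, Pillow picks RGBA for 4 channels and RGB for 3
img = Image.fromarray(img_nparray)
print(img.mode, img.size)

# getpixel() takes (x, y), so (1, 0) is the second pixel of the first row
print(img.getpixel((1, 0)))
```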

How to Create EXIF Data for the Image Object with Elasticsearch Data

The code in the previous step first attempts to create an image that includes the RGBA alpha channel. If Image.fromarray() raises an exception, for example because the pixel data has only three channels, it creates an RGB image instead.

If EXIF data is stored in the Elasticsearch document along with the pixel data, this example parses it out of the _source dictionary returned by Elasticsearch so it can be embedded in the exported image.

Piexif groups EXIF tags into Image File Directories (IFDs). This example uses four of them, 0th, Exif, GPS, and 1st, and each tag must be placed in the dictionary for the IFD it belongs to:

# create new dict objects for the EXIF meta data
zero_data = {}
first_data = {}
exif_data = {}
gps_data = {}

# iterate the Elasticsearch document's _source data
for key, val in _source.items():

    # parse out the Exif meta data for the image
    if val is not None:

        # get datetime and unique ID data
        if "datetime" in key.lower():
            exif_data[ piexif.ExifIFD.DateTimeOriginal ] = val
            zero_data[ piexif.ImageIFD.DateTime ] = val
        elif "uuid" in key.lower():
            exif_data[ piexif.ExifIFD.ImageUniqueID ] = val
            first_data[ piexif.ImageIFD.RawDataUniqueID ] = val

        # get XResolution and YResolution
        elif "x_res" in key.lower() or "xres" in key.lower():
            zero_data[ piexif.ImageIFD.XResolution ] = val
        elif "y_res" in key.lower() or "yres" in key.lower():
            zero_data[ piexif.ImageIFD.YResolution ] = val

        # get the image's name, make, and model
        elif key == "name":
            first_data[ piexif.ImageIFD.OriginalRawFileName ] = val

        elif key == "make":
            zero_data[ piexif.ImageIFD.Make ] = val
            zero_data[ piexif.ImageIFD.DocumentName ] = val

        elif key == "model":
            zero_data[ piexif.ImageIFD.Model ] = val

        elif key == "software":
            zero_data[ piexif.ImageIFD.Software ] = val

# print the image's parsed meta data
print ("\n0th:", zero_data)
print ("\nExif:", exif_data)
print ("\nGPS:", gps_data)
print ("\n1st:", first_data)

>NOTE: This example does not include all of the available EXIF data fields.
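EXIF tags also have fixed value types. XResolution and YResolution, for example, are RATIONAL values that piexif expects as a (numerator, denominator) tuple, so a plain number such as 72 stored in Elasticsearch may need converting before it is added to zero_data. A minimal sketch, using a hypothetical to_rational() helper built on Python's fractions module:

```python
from fractions import Fraction


def to_rational(value, max_denominator=10000):
    """Hypothetical helper: convert a number or [num, den] pair to an EXIF RATIONAL tuple."""
    # already a pair, e.g. [300, 1] as it might come back from JSON
    if isinstance(value, (list, tuple)):
        numerator, denominator = value
        return (int(numerator), int(denominator))

    # approximate floats with a bounded denominator, e.g. 72.5 -> (145, 2)
    fraction = Fraction(value).limit_denominator(max_denominator)
    return (fraction.numerator, fraction.denominator)


print(to_rational(72))        # (72, 1)
print(to_rational(72.5))      # (145, 2)
print(to_rational([300, 1]))  # (300, 1)
```

In the parsing loop, zero_data[ piexif.ImageIFD.XResolution ] = to_rational(val) would then store a value of the expected type.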

How to Put the EXIF Data Together in One Dictionary and Pass it to Piexif's dump() Method

Nest the four IFD dictionaries inside another Python dictionary under the "0th", "Exif", "GPS", and "1st" keys, as shown here:

# put the image's meta data into a nested dictionary
exif_dict = {
    "0th": zero_data,
    "Exif": exif_data,
    "GPS":gps_data,
    "1st": first_data
}

Now create a bytes object of the EXIF data using the Piexif library’s dump() method with the following command:

# create a piexif bytes object from nested Exif dict
exif_bytes = piexif.dump(exif_dict)

How to Save the PIL Image as an Image File

The final step is to export the PIL Image to a file with its save() method. Set the exif argument to the exif_bytes object created earlier with Piexif:

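# JPEG cannot store an alpha channel, so drop it before saving an RGBA image as JPEG
if image_format.lower() in ("jpg", "jpeg") and img.mode == "RGBA":
    img = img.convert("RGB")

# export the image using the PIL instance's save() method
img.save(filename + "." + image_format, exif=exif_bytes)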

Open the folder in Finder or Windows File Explorer to confirm that the image was created.

How to view the EXIF data of the exported Elasticsearch image

On macOS, the image’s EXIF data can be viewed by opening the file in the “Preview” application. In Preview, click “Tools” in the menu bar and then “Show Inspector.” The inspector has an (i) (information) tab with an Exif section, as seen here:

Screenshot of a photo exported from an Elasticsearch index using Python

Conclusion

This tutorial demonstrated how to retrieve an image stored on an Elasticsearch index and export it as an image file, embedding its raw pixel data and its EXIF metadata with Python's PIL and Piexif libraries. The examples covered exporting an Elasticsearch document and its image with Python, decoding Base64 image data, creating a PIL image from an Elasticsearch document, and writing EXIF data with PIL and Piexif. Bear in mind that Piexif groups EXIF tags into IFD dictionaries (this example uses 0th, Exif, GPS, and 1st), and each tag must be placed in the dictionary for the IFD it belongs to.

Just the Code

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the Elasticsearch low-level client
from elasticsearch import Elasticsearch

import time

# use Python's PIL (Pillow) library to create an Image instance
from PIL import Image

# import piexif to write EXIF data to new image
import piexif

# use to convert string to nested list
import ast

# import the NumPy library using the 'np' alias
import numpy as np

# create a client instance of Elasticsearch
elastic_client = Elasticsearch("http://localhost:9200")

# call Elasticsearch's get() method to retrieve a document's data
# (doc_type is deprecated in 7.x clients and removed in 8.x, so it is omitted)
_doc = elastic_client.get(
    index = 'images',
    id = 1
)

try:
    # get the document's _source data
    _source = _doc["_source"]
except KeyError:
    print ("The Elasticsearch document is not valid")
    quit()

try:
    raw_data = _source["raw_data"]
except KeyError:
    print ("The Elasticsearch document is not a valid image")
    quit()

# try to get image size from Elasticsearch
try:
    img_size = _source["size"]
except KeyError:
    img_size = (0, 0)

# get the image's filename
try:
    filename = _source["name"]
except KeyError:
    # generate filename using timestamp
    filename = "unknown_" + str( int(time.time()) )

# split off the extension, e.g. "photo.png" -> "photo" and "png"
if "." in filename:
    filename, extension = filename.rsplit(".", 1)
else:
    extension = ""

# get the image's file format
try:
    image_format = _source["image_format"]
except KeyError:
    # fall back to the name's extension, or to JPEG if there is none
    image_format = extension or "jpeg"

# use literal_eval() to eval a Python list of pixel data
img_list = ast.literal_eval( raw_data )

# create a NumPy array from the eval string array
img_nparray = np.array( img_list )

# create a PIL image from the NumPy array
try:
    img = Image.fromarray( img_nparray.astype('uint8'), 'RGBA')
except Exception as err:
    print ("Image.fromarray() ERROR:", err)
    img = Image.fromarray( img_nparray.astype('uint8'), 'RGB')

# create new dict objects for the EXIF meta data
zero_data = {}
first_data = {}
exif_data = {}
gps_data = {}

# iterate the Elasticsearch document's _source data
for key, val in _source.items():

    # parse out the Exif meta data for the image
    if val is not None:

        # get datetime and unique ID data
        if "datetime" in key.lower():
            exif_data[ piexif.ExifIFD.DateTimeOriginal ] = val
            zero_data[ piexif.ImageIFD.DateTime ] = val
        elif "uuid" in key.lower():
            exif_data[ piexif.ExifIFD.ImageUniqueID ] = val
            first_data[ piexif.ImageIFD.RawDataUniqueID ] = val

        # get XResolution and YResolution
        elif "x_res" in key.lower() or "xres" in key.lower():
            zero_data[ piexif.ImageIFD.XResolution ] = val
        elif "y_res" in key.lower() or "yres" in key.lower():
            zero_data[ piexif.ImageIFD.YResolution ] = val

        # get the image's name, make, and model
        elif key == "name":
            first_data[ piexif.ImageIFD.OriginalRawFileName ] = val

        elif key == "make":
            zero_data[ piexif.ImageIFD.Make ] = val
            zero_data[ piexif.ImageIFD.DocumentName ] = val

        elif key == "model":
            zero_data[ piexif.ImageIFD.Model ] = val

        elif key == "software":
            zero_data[ piexif.ImageIFD.Software ] = val

# print the image's parsed meta data
print ("\n0th:", zero_data)
print ("\nExif:", exif_data)
print ("\nGPS:", gps_data)
print ("\n1st:", first_data)

# put the image's meta data into a nested dictionary
exif_dict = {
    "0th": zero_data,
    "Exif": exif_data,
    "GPS":gps_data,
    "1st": first_data
    }

# create a piexif bytes object from nested Exif dict
exif_bytes = piexif.dump(exif_dict)

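# JPEG cannot store an alpha channel, so drop it before saving an RGBA image as JPEG
if image_format.lower() in ("jpg", "jpeg") and img.mode == "RGBA":
    img = img.convert("RGB")

# export the image using the PIL instance's save() method
img.save(filename + "." + image_format, exif=exif_bytes)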
