How To Use the Ingest Attachment Plugin To Index Files In Elasticsearch

Introduction

If you’re storing full files in Elasticsearch, you might have come across the Ingest Attachment plugin, which allows you to index files. In this tutorial we’ll show you how to install the plugin and use it to index Base64-encoded files. Please make sure you have met the prerequisites before you take the steps laid out in this tutorial.

Prerequisites

  • Elasticsearch should be installed and running, and the low-level Python client for Elasticsearch needs to be installed as well. Use PIP to install the library for Python 3: pip3 install elasticsearch.
  • The field that holds the file you want to extract text from should contain Base64-encoded binary data.
  • It’s recommended that the ingest-attachment plugin already be installed, but we’ll walk through that process if you don’t have it installed yet.
  • You should have Kibana running on your server if you plan on making HTTP requests to the Elasticsearch cluster from the Kibana Console.

To verify that Elasticsearch is running, make a cURL request to the port Elasticsearch is listening on (9200 by default) to have it return some cluster information:

curl -XGET "localhost:9200"

Screenshot of a GET request using cURL to have Elasticsearch return cluster information
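
The same check can be scripted. The following is a minimal Python 3 sketch of the cURL request above that uses only the standard library, so it does not depend on a particular version of the Elasticsearch client. The host URL and the helper names build_info_request and get_cluster_info are assumptions made for illustration.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: default local host and port


def build_info_request(host=ES_HOST):
    """Build the same GET request that the cURL command sends to the cluster root."""
    return urllib.request.Request(host + "/", method="GET")


def get_cluster_info(host=ES_HOST, timeout=5):
    """Send the request and return the parsed JSON cluster information."""
    with urllib.request.urlopen(build_info_request(host), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        info = get_cluster_info()
        print("Elasticsearch version:", info["version"]["number"])
    except OSError as err:  # urllib's URLError is a subclass of OSError
        print("Elasticsearch is not reachable:", err)
```

If the cluster has security enabled, the request would also need credentials, which this sketch leaves out.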

Download and install the ‘ingest-attachment’ plugin for Elasticsearch

To install a plugin for Elasticsearch use the bin/elasticsearch-plugin command followed by the plugin name. Here’s the command to install the ingest-attachment plugin:

sudo bin/elasticsearch-plugin install ingest-attachment

This downloads the plugin from Elastic’s website. Once the download is complete, you’ll be prompted to confirm before the installation continues. Restart Elasticsearch afterwards so that the node loads the newly installed plugin.

Open a terminal window and execute the bin/elasticsearch-plugin install command with sudo privileges:

Screenshot of a command in terminal to install the "ingest-attachment" plugin for Elasticsearch
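
After restarting the node, you may want to confirm that the plugin is actually loaded. The sketch below queries the _cat/plugins API in JSON format and looks for ingest-attachment in the listing. It assumes a local, unsecured cluster on port 9200 and an Elasticsearch version where ingest-attachment is installed as a plugin; the helper names has_plugin and fetch_plugin_rows are hypothetical.

```python
import json
import urllib.request


def has_plugin(rows, plugin_name="ingest-attachment"):
    """Return True if any row of a _cat/plugins JSON listing names the plugin."""
    return any(row.get("component") == plugin_name for row in rows)


def fetch_plugin_rows(host="http://localhost:9200", timeout=5):
    """Fetch the installed-plugin listing as a list of dicts (one per node and plugin)."""
    url = host + "/_cat/plugins?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print("ingest-attachment installed:", has_plugin(fetch_plugin_rows()))
    except OSError as err:  # connection refused, timeout, HTTP error, ...
        print("Could not query the cluster:", err)
```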

Use the Ingest API to set up a pipeline for the Attachment Processor

The next step is to send a PUT request, either with cURL in a terminal or from the Kibana Console, that tells Elasticsearch to create a pipeline for the Attachment Processor. Let’s take a look at the cURL command:

curl -XPUT "localhost:9200/_ingest/pipeline/attachment" -H 'Content-Type: application/json' -d'
{
  "description" : "Field for processing file attachments",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
'

Elasticsearch will return a JSON response of "acknowledged": true if the ingest-attachment plugin was installed correctly:

Screenshot of a PUT request to Elasticsearch to create a pipeline for the Attachment Processor
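
For readers who prefer to create the pipeline from a script, here is a minimal Python sketch of the same PUT request, again using only the standard library. The pipeline body matches the cURL example; the function name build_pipeline_request and the default host are assumptions for illustration.

```python
import json
import urllib.request


def build_pipeline_request(host="http://localhost:9200",
                           pipeline_id="attachment", field="data"):
    """Build the PUT request that creates the attachment pipeline."""
    body = {
        "description": "Field for processing file attachments",
        "processors": [{"attachment": {"field": field}}],
    }
    return urllib.request.Request(
        f"{host}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(build_pipeline_request(), timeout=5) as resp:
            print(json.load(resp))  # the parsed JSON response from Elasticsearch
    except OSError as err:
        print("Pipeline request failed:", err)
```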


Convert data to a Base64 encoding scheme before indexing it in Elasticsearch with the Attachment pipeline

The attachment plugin requires that any data being indexed with the attachment pipeline be Base64 encoded beforehand. Sending plain text instead, as in the following PUT request, returns an HTTP 500 error response:

# This will return an HTTP 500 error response:
curl -X PUT "localhost:9200/some_index/_doc/42?pipeline=attachment" -H 'Content-Type: application/json' -d'
{
  "data": "Heres some plain text."
}
'

Screenshot of the Kibana Console returning a 500 HTTP error code because the data indexed using the "attachment" pipeline was not base64 encoded data
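
One way to avoid this error is to validate the value before sending it. The sketch below uses Python's base64 module in strict mode to check whether a string is valid Base64; the helper name is_base64 is hypothetical.

```python
import base64
import binascii


def is_base64(value):
    """Return True if value decodes cleanly as strict Base64."""
    try:
        # validate=True rejects any character outside the Base64 alphabet
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


if __name__ == "__main__":
    print(is_base64("Heres some plain text."))            # plain text with spaces
    print(is_base64("SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4="))  # encoded text
```

Note that this is only a format check: a short plain string that happens to use only Base64 characters and has a valid length would still pass.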

Use an online Base64 encoder, JavaScript, or your server’s backend scripting language to encode the data to Base64. Then try the HTTP request again, this time with Base64-encoded data:

# PUT request to index base64 encoded data:
curl -X PUT "localhost:9200/some_index/_doc/42?pipeline=attachment" -H 'Content-Type: application/json' -d'
{
  "data": "SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4="
}
'

Screenshot of a successful PUT request to index data using the "attachment" pipeline
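
Once the document is indexed, the text extracted by the processor can be read back from it. The sketch below assumes the pipeline created earlier, which does not set a target field, so the processor writes its output under the default attachment field. The helper names and the sample response are hypothetical and hand-written for illustration; they are not captured output.

```python
import json
import urllib.request


def extracted_text(get_response):
    """Return the text stored by the attachment processor, or None if absent."""
    source = get_response.get("_source", {})
    return source.get("attachment", {}).get("content")


def fetch_document(index="some_index", doc_id="42",
                   host="http://localhost:9200", timeout=5):
    """GET the indexed document and return the parsed JSON response."""
    url = f"{host}/{index}/_doc/{doc_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hand-written sample shaped like a GET _doc response (illustration only)
    sample = {
        "_source": {
            "data": "SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4=",
            "attachment": {"content": "Here's some plain text."},
        }
    }
    print(extracted_text(sample))
```

Against a running cluster, extracted_text(fetch_document()) would apply the same lookup to the real document.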


Using scripts to encode strings to the Base64 encoding scheme

There are many different ways to encode strings to Base64, and there are even websites that let you paste text into a field, or upload a file, and do it for you.

Encoding a string to Base64 using JavaScript

In JavaScript, this can be done with the built-in btoa() function:

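function encodeStrBase64(stringData) {
    // btoa() only accepts Latin-1 characters, so convert the string to
    // UTF-8 bytes first; unescape() is deprecated but still widely supported
    var b64 = btoa(unescape(encodeURIComponent(stringData)));
    console.log('encodeStrBase64():', b64);
    return b64;
}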

Encoding a string to Base64 using Python

In Python you can use the base64 module from the standard library:

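import base64

plain_string = "Here's some plain text."
# b64encode() works on bytes, so encode the string as UTF-8 first
bytes_string = plain_string.encode('utf-8')
# Decode the result to str so it can be placed in a JSON request body
encoded = base64.b64encode(bytes_string).decode('ascii')
print(encoded)  # SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4=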

Encoding a string to base64 in Python's IDLE environment
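
Since the goal is usually to index whole files rather than short strings, the same approach can be applied to a file opened in binary mode. The sketch below is a minimal example under that assumption; the function names encode_stream and build_document, and the use of an in-memory file for the demonstration, are illustrative choices.

```python
import base64
import io
import json


def encode_stream(binary_file):
    """Read all bytes from a binary file object and return them as Base64 text."""
    return base64.b64encode(binary_file.read()).decode("ascii")


def build_document(binary_file, field="data"):
    """Build the JSON body expected by the attachment pipeline."""
    return json.dumps({field: encode_stream(binary_file)})


if __name__ == "__main__":
    # An in-memory file stands in for a real one, such as open(path, "rb")
    fake_file = io.BytesIO(b"Here's some plain text.")
    print(build_document(fake_file))
```

The resulting JSON string can be sent as the body of the PUT request shown earlier. Reading the whole file into memory keeps the sketch short, but very large files may call for a different approach.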

Encoding a string to Base64 in PHP

Just pass a string into PHP's base64_encode() function:

$strg = "Here's some plain text.";
$data = base64_encode($strg);
echo '<p>'. $data. '</p>';

Conclusion

In this article we detailed how to index files in Elasticsearch using the Ingest Attachment plugin. We showed how to encode data to Base64 in PHP, Python, and JavaScript, because a field must be in this format before the plugin can process it. Being able to index files in Elasticsearch is of tremendous value, and we hope this tutorial helps you use this functionality in your own application. If you have any questions, don’t hesitate to reach out to us.
