How To Use the Ingest Attachment Plugin To Index Files In Elasticsearch
Introduction
If you’re storing full files in Elasticsearch, you might have come across the Ingest Attachment plugin. It extracts text and metadata from files (using Apache Tika) so that their contents can be indexed. In this tutorial we’ll show you how to install the plugin and use it to index base64-encoded files. Please make sure you have met the prerequisites below before you take the steps laid out in this tutorial.
Prerequisites
- Elasticsearch should be installed and running, and the low-level Python client for Elasticsearch needs to be installed as well. Use pip to install the library for Python 3: `pip3 install elasticsearch`.
- The field you want the plugin to extract must contain base64-encoded data.
- The `ingest-attachment` plugin should be installed, but we’ll walk through that process if you don’t have it installed already.
- You should have Kibana running on your server if you plan on making HTTP requests to the Elasticsearch cluster from Kibana’s Console rather than with cURL.
To verify that Elasticsearch is running, make a cURL request to the port Elasticsearch is listening on (9200 by default). It should return some basic cluster information:
```bash
curl -XGET "localhost:9200"
```
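If you would rather run this check from Python, the sketch below sends the same GET request. It uses only the standard library's `urllib` to stay self-contained. The `elasticsearch` client from the prerequisites can do the same thing with its `info()` method. The helper name `check_elasticsearch` is hypothetical. Like the cURL examples in this tutorial, it assumes an unsecured cluster at `http://localhost:9200`.

```python
import json
import urllib.request


def check_elasticsearch(url="http://localhost:9200", timeout=2.0):
    """Return the cluster info dict from GET <url>, or None if the request fails.

    Hypothetical helper; assumes no authentication or TLS is required.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None


if __name__ == "__main__":
    info = check_elasticsearch()
    if info is None:
        print("Elasticsearch is not reachable on localhost:9200")
    else:
        print("Connected to cluster:", info.get("cluster_name"),
              "version:", info.get("version", {}).get("number"))
```

Returning `None` instead of raising keeps the check easy to use in setup scripts. Inspect the exception instead if you need to tell "not running" apart from "authentication required".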
Download and install the ‘ingest-attachment’ plugin for Elasticsearch
To install a plugin for Elasticsearch, run the `bin/elasticsearch-plugin install` command from your Elasticsearch home directory, followed by the plugin name. Here’s the command to install the `ingest-attachment` plugin:
```bash
sudo bin/elasticsearch-plugin install ingest-attachment
```
This downloads the plugin from Elastic’s website. Once the download is complete, you’ll receive a prompt asking you to confirm the installation. After installing, restart Elasticsearch so the node loads the plugin. In a multi-node cluster, the plugin must be installed on every node, and each node must be restarted.
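To confirm that the plugin is loaded after restarting Elasticsearch, you can query the `_cat/plugins` API, for example with `curl "localhost:9200/_cat/plugins?format=json"`. The API returns one row per node and plugin. The sketch below is built around a hypothetical helper, `nodes_with_plugin`, which checks those rows for the plugin name. It assumes each JSON row has a `name` key (the node name) and a `component` key (the plugin name), which is how `_cat/plugins` formats its JSON output.

```python
import json
import urllib.request


def fetch_cat_plugins(host="http://localhost:9200"):
    """GET <host>/_cat/plugins?format=json and return the parsed list of rows."""
    with urllib.request.urlopen(f"{host}/_cat/plugins?format=json", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def nodes_with_plugin(rows, plugin="ingest-attachment"):
    """Return the sorted, de-duplicated node names whose rows list `plugin`.

    Assumes each row is a dict with "name" (node) and "component" (plugin) keys.
    """
    return sorted({row["name"] for row in rows if row.get("component") == plugin})


if __name__ == "__main__":
    try:
        nodes = nodes_with_plugin(fetch_cat_plugins())
        print("ingest-attachment found on:", nodes or "no nodes")
    except (OSError, ValueError) as err:  # unreachable server, HTTP error, or non-JSON reply
        print("Could not query _cat/plugins:", err)
```

An empty list means no node has loaded the plugin yet. Compare the result against your node list to spot a node that was missed.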
Use the Ingest API to set up a pipeline for the Attachment Processor
The next step is to send a PUT request, either with cURL in a terminal or from Kibana’s Console, that tells Elasticsearch to create a pipeline using the Attachment Processor. Let’s take a look at the cURL command:
```bash
curl -XPUT "localhost:9200/_ingest/pipeline/attachment" -H 'Content-Type: application/json' -d'
{
  "description" : "Field for processing file attachments",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}'
```
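If the pipeline is created, Elasticsearch returns a JSON response of `"acknowledged": true`. If the `ingest-attachment` plugin isn’t installed, the request fails instead, because Elasticsearch doesn’t recognize the `attachment` processor type.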
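The same pipeline can be created from Python. In the sketch below, the hypothetical helper `attachment_pipeline_body` builds the same definition as the cURL example above. A second hypothetical helper, `put_pipeline`, sends it with the standard library’s `urllib`. The host and pipeline name default to the values used in this tutorial, and, like the cURL examples, the sketch assumes an unsecured cluster.

```python
import json
import urllib.request


def attachment_pipeline_body(field="data", description="Field for processing file attachments"):
    """Build a pipeline definition with a single attachment processor on `field`."""
    return {
        "description": description,
        "processors": [{"attachment": {"field": field}}],
    }


def put_pipeline(name="attachment", body=None, host="http://localhost:9200"):
    """PUT the pipeline to <host>/_ingest/pipeline/<name> and return the parsed JSON reply."""
    payload = json.dumps(body or attachment_pipeline_body()).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/_ingest/pipeline/{name}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(put_pipeline())  # Elasticsearch replies with "acknowledged": true on success
    except (OSError, ValueError) as err:  # unreachable server, HTTP error, or non-JSON reply
        print("Pipeline request failed:", err)
```

Keeping the body builder separate from the request makes it easy to point the processor at a different source field, for example `attachment_pipeline_body(field="file")`.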
Convert data to a Base64 encoding scheme before indexing it in Elasticsearch with the Attachment pipeline
The attachment plugin requires that any data indexed through the `attachment` pipeline be base64 encoded beforehand. This `PUT` request sends plain text, so Elasticsearch rejects it with an HTTP error response instead of indexing the document:
```bash
# This request fails because "data" is not base64 encoded:
curl -X PUT "localhost:9200/some_index/_doc/42?pipeline=attachment" -H 'Content-Type: application/json' -d'
{
  "data": "Heres some plain text."
}
'
```
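To catch this mistake on the client before a request is sent, you can run a quick local check. The sketch below is a hypothetical pre-check, not the attachment processor’s own validation logic. It uses Python’s `base64.b64decode` with `validate=True`, which raises `binascii.Error` for characters outside the base64 alphabet or for bad padding.

```python
import base64


def looks_like_base64(value: str) -> bool:
    """Return True if `value` decodes as strict, standard-alphabet base64."""
    try:
        base64.b64decode(value, validate=True)
        return True
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


print(looks_like_base64("Heres some plain text."))            # False
print(looks_like_base64("SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4="))  # True
```

This is only a heuristic. A plain word made up entirely of base64 characters, such as `test`, also passes, so encoding the data yourself is still the reliable approach.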
Use an online Base64 encoder, or use JavaScript or your server’s backend scripting language, to encode the data to Base64. Then try the HTTP request again, this time with base64-encoded data:
```bash
# PUT request to index base64 encoded data:
curl -X PUT "localhost:9200/some_index/_doc/42?pipeline=attachment" -H 'Content-Type: application/json' -d'
{
  "data": "SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4="
}
'
```
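The same indexing request can be built from Python. The sketch below uses only the standard library and is built around `build_index_request`, a hypothetical helper. The index name, document ID, pipeline name, and host default to the values in the cURL example above. Because the helper does the base64 encoding itself, you pass it the original bytes rather than a pre-encoded string.

```python
import base64
import json
import urllib.request


def build_index_request(raw: bytes, index="some_index", doc_id="42",
                        pipeline="attachment", host="http://localhost:9200"):
    """Build a PUT request that indexes `raw` (base64 encoded) through `pipeline`."""
    document = {"data": base64.b64encode(raw).decode("ascii")}
    return urllib.request.Request(
        f"{host}/{index}/_doc/{doc_id}?pipeline={pipeline}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_index_request("Here's some plain text.".encode("utf-8"))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(json.loads(response.read().decode("utf-8")))
    except (OSError, ValueError) as err:  # unreachable server, HTTP error, or non-JSON reply
        print("Indexing request failed:", err)
```

After indexing, a GET request to `localhost:9200/some_index/_doc/42` should show the extracted text under the document’s `attachment` field (`attachment.content`), alongside metadata such as `attachment.content_type`. `attachment` is the processor’s default target field.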
Using scripts to encode strings to the Base64 encoding scheme
There are many different ways to encode strings to Base64, and some websites will even let you paste text into a field, or upload a file, and do it for you.
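Encoding a string to Base64 using JavaScript
In JavaScript you can use the built-in `btoa()` function. Because `btoa()` only accepts characters in the Latin-1 range, the function below first converts the string to UTF-8 with `encodeURIComponent()` and `unescape()`: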
```javascript
function encodeStrBase64(stringData) {
  var b64 = btoa(unescape(encodeURIComponent(stringData)));
  console.log('encodeStrBase64():', b64);
  return b64;
}
```
Encoding a string to Base64 using Python:
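In Python you can use the `base64` module from the standard library. `b64encode()` returns bytes, so decode the result if you need a string for a JSON document: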
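```python
import base64

plain_string = "Here's some plain text."
bytes_string = bytes(plain_string, 'utf-8')                  # encode the text as UTF-8 bytes
b64_string = base64.b64encode(bytes_string).decode('ascii')  # base64 bytes -> str for JSON
print(b64_string)  # SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4=
```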
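Files work the same way as strings: read the raw bytes and base64 encode them. The sketch below defines a hypothetical helper, `encode_file_base64`, that returns a string you can place in the `data` field of the indexing request. It reads the whole file into memory, which is fine for a sketch but worth keeping in mind for large files.

```python
import base64
from pathlib import Path


def encode_file_base64(path) -> str:
    """Read a file as raw bytes and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

For example, `{"data": encode_file_base64("report.pdf")}` would be the request body for a hypothetical file named `report.pdf`.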
Encoding a string to Base64 in PHP:
Just pass a string into PHP’s `base64_encode()` function:
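```php
<?php
$plain_string = "Here's some plain text.";
echo base64_encode($plain_string); // SGVyZSdzIHNvbWUgcGxhaW4gdGV4dC4=
```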
Conclusion
In this article we showed how to install the Ingest Attachment plugin and use it to index files in Elasticsearch. We also showed how to encode data into base64 in PHP, Python, and JavaScript, because the field must be in this format for the plugin to index it. Being able to index the contents of files in Elasticsearch is valuable, and we hope this tutorial helps you use this functionality in your own application. If you have any questions, don’t hesitate to reach out to us.