How to Bulk Import into Elasticsearch using Curl
Introduction
If you have a large dataset that you want to import into Elasticsearch, an easy way to accomplish this is with a specific curl command. This method of bulk indexing makes use of Elasticsearch’s Bulk API, which allows users to index or delete many documents in a single API call. With this functionality, bulk indexing becomes a fast and simple task. In this tutorial we’ll use a sample dataset to demonstrate how to do a bulk import in Elasticsearch with curl.
Prerequisites
Before we take a look at the bulk import process, it’s important to mention a few prerequisites that need to be in place. For this task the system requirements are minimal: Elasticsearch needs to be installed and running. Although it’s not required, it can be beneficial to have Kibana installed as well. In addition to these system requirements, it’s also helpful to have some basic familiarity with the curl command.
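Before moving on, you can confirm that your Elasticsearch instance is actually up by sending it a simple request. This is a quick sketch assuming a local instance listening on the default port of 9200:

```shell
# A running local Elasticsearch instance should respond with a short JSON
# document containing its cluster name and version details.
curl -XGET 'localhost:9200/?pretty'
```

If this command returns a connection error instead of JSON, resolve that before attempting the bulk import.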
Understanding the Data
In this bulk import tutorial we’ll be importing a sample dataset called accounts.json, which can be downloaded directly from here. The short snippet of the data shown below can help you see the basic structure of the data in the file. You can use the sample data file as is or modify this data to fit your needs.
File Snippet: accounts.json
...
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...
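Note that the Bulk API does not accept a plain array of documents: the file must be newline-delimited JSON in which every document is preceded by an action line telling Elasticsearch what to do with it. The downloadable accounts.json already follows this layout. As a minimal sketch, a hypothetical two-record file in the same shape could be built like this:

```shell
# Build a tiny bulk file in the action-line/document NDJSON layout.
# The file name and the second record's values are illustrative only.
cat > mini_accounts.json <<'EOF'
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke"}
{"index":{"_id":"2"}}
{"account_number":2,"balance":28838,"firstname":"Roberta","lastname":"Bender"}
EOF
wc -l < mini_accounts.json
```

Each `{"index":{...}}` line is an action line, and the line immediately after it is the document to index.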
Use Curl to Communicate with the Elasticsearch Bulk API
We’ll be using the curl command to import data into Elasticsearch. If you haven’t had much experience with curl, the underlying concept is simple: curl allows you to use HTTP requests to talk to a server. Here, we’ll use it to communicate with Elasticsearch.
We’re running Elasticsearch locally with the default port of 9200 and our command to bulk import into Elasticsearch is shown below. You may need to modify it depending on the server location where Elasticsearch is hosted:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/accounts/_bulk?pretty' --data-binary @accounts.json
By interacting with Elasticsearch’s Bulk API endpoint at localhost:9200/financial/accounts/_bulk?pretty, this command will create a financial index and an accounts type, and it will insert each of these records within that type. From a traditional database perspective, it might be simpler to think of financial as the database and accounts as the table. With that in mind, it’s easy to understand how all of these records will be imported into the accounts type (table) within the financial index (database).
Taking a closer look at the curl command, you’ll see we included the -H option. This flag allows you to specify the content type, which in this case is newline-delimited JSON: application/x-ndjson. We also made use of the --data-binary @filename option, which posts the file exactly as it is on disk, with no extra processing.
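That lack of extra processing matters for NDJSON: curl’s plain -d/--data flag strips newlines when reading from a file, which would merge the action lines and document lines into one unparseable blob, while --data-binary preserves them. A small local illustration of the layout that must survive (the file name here is hypothetical):

```shell
# Write a two-line NDJSON fragment: an action line plus its document.
printf '%s\n' '{"index":{"_id":"1"}}' '{"account_number":1}' > fragment.json
# --data-binary would post this file byte-for-byte, newlines included.
# Plain -d/--data would strip the newlines, collapsing both lines into one
# and making the payload invalid for the Bulk API.
cat fragment.json
```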
Once you’ve executed the curl command, the console will output a long list of data similar to what is shown below, confirming the successful import:
...
    },
    {
      "index" : {
        "_index" : "financial",
        "_type" : "accounts",
        "_id" : "995",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 196,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
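As an extra sanity check, you can ask Elasticsearch how many documents ended up in the new index. This is a sketch assuming the same local instance; the unmodified sample file contains 1,000 account records, so that is the count you would expect to see:

```shell
# Report the number of documents in the newly created financial index.
curl -XGET 'localhost:9200/financial/_count?pretty'
```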
Conclusion
If you’re working with Elasticsearch, you’ll probably need to import a large dataset at some point. Fortunately, this is an easy task to accomplish with the help of the curl command and the Elasticsearch Bulk API. With these tools at your disposal, it’s simple and painless to transfer a data file into Elasticsearch with curl and have it properly indexed.
Learn More
While it’s easy to perform a command-line bulk import using curl, it’s also possible to do the same import using Kibana if you’re more comfortable with that interface. For more information on how to accomplish this, please see the Elasticsearch documentation or talk to an expert at ObjectRocket.