How to Bulk Import into Elasticsearch using Curl


Introduction

If you have a large dataset that you want to import into Elasticsearch, an easy way to accomplish this is to use a specific curl command. This method of bulk indexing data makes use of Elasticsearch’s Bulk API, which allows users to index or delete many documents in a single API call. With this functionality, bulk indexing becomes a fast and simple task. In this tutorial we’ll use a sample dataset to demonstrate how to do a bulk import in Elasticsearch with curl.

Prerequisites

Before we take a look at the bulk import process, it’s important to mention a few prerequisites that need to be in place. For this task the system requirements are minimal: Elasticsearch needs to be installed and running. Although it’s not required, it can be beneficial to have Kibana installed as well. In addition to these system requirements, it’s also helpful to have some basic familiarity with the curl command.

Understanding the Data

In this bulk import tutorial we’ll be importing a sample dataset called accounts.json, which can be downloaded directly from here. The short snippet shown below illustrates the basic structure of the records in the file. You can use the sample data file as is or modify it to fit your needs.

File Snippet: accounts.json

...
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...
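
One thing the snippet above doesn’t show is that the Bulk API expects newline-delimited JSON in which each document line is preceded by an action line telling Elasticsearch how to handle it; the sample accounts.json file pairs each record with an index action in exactly this way. A minimal sketch of the format, with made-up values for the second record, looks like this (note that the file must end with a trailing newline):

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke"}
{"index":{"_id":"2"}}
{"account_number":2,"balance":12500,"firstname":"Jane","lastname":"Doe"}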

Use Curl to Communicate with the Elasticsearch Bulk API

We’ll be using the curl command to import data into Elasticsearch. If you haven’t had much experience with curl functionality the underlying concept is simple: curl allows you to use HTTP requests to talk to a server. Here, we’ll use it to communicate with Elasticsearch.
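
For example, if Elasticsearch is running locally on the default port, a bare GET request returns basic cluster information, which is a handy way to confirm the server is reachable before starting an import:

curl -XGET 'localhost:9200/?pretty'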

We’re running Elasticsearch locally on the default port of 9200, and our command to bulk import into Elasticsearch is shown below. You may need to modify it depending on where your Elasticsearch server is hosted:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/accounts/_bulk?pretty' --data-binary @accounts.json

By interacting with Elasticsearch’s Bulk API endpoint at localhost:9200/financial/accounts/_bulk?pretty, this command creates a financial index and an accounts type, and it inserts each of these records within that type. From a traditional database perspective, it may be simpler to think of financial as the database and accounts as the table. With that in mind, it’s easy to see how all of these records will be imported into the accounts type (table) within the financial index (database).
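
One caveat: mapping types such as accounts were deprecated in Elasticsearch 7.x and removed in 8.x. If you’re running a newer cluster, the same import would target the index directly rather than a type; a sketch of the equivalent command:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/_bulk?pretty' --data-binary @accounts.json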

Taking a closer look at the curl command, you’ll see we included the -H option. This flag allows you to specify the content type, which in this case is newline-delimited JSON: application/x-ndjson. We also made use of the --data-binary @filename option, which sends the file exactly as-is; unlike the plain -d flag, it preserves the newlines that the Bulk API relies on to separate action and document lines.
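
Keep in mind that Elasticsearch recommends keeping individual bulk requests to a moderate size, so a very large file is often best sent in pieces. A rough sketch using the standard split utility (the chunk size and file name prefix here are arbitrary):

# Split into chunks of 10,000 lines; an even line count keeps each
# action line paired with its document line.
split -l 10000 accounts.json chunk_

# Post each chunk to the Bulk API in turn.
for f in chunk_*; do
  curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/accounts/_bulk' --data-binary "@$f"
done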

Once you’ve executed the curl command, the console will output a long list of results similar to what is shown below, confirming the successful import:

...
    },
    {
      "index" : {
        "_index" : "financial",
        "_type" : "accounts",
        "_id" : "995",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 196,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
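
To spot-check the imported data, you can run a quick search against the new index; for example, looking up the account from the file snippet shown earlier:

curl -XGET 'localhost:9200/financial/accounts/_search?q=firstname:Amber&pretty'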

Conclusion

If you’re working with Elasticsearch, you’ll probably need to import a large dataset at some point. Fortunately, this is an easy task to accomplish with the help of the curl command and the Elasticsearch Bulk API. With these tools at your disposal, it’s simple and painless to transfer a data file into Elasticsearch and have it properly indexed.

Learn More

While it’s easy to perform a command-line bulk import using curl, it’s also possible to do the same import using Kibana if you’re more comfortable with that interface. For more information on how to accomplish this, please see the Elasticsearch documentation or talk to an expert at ObjectRocket.
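
As a rough sketch, the same bulk request can be issued from the Console in Kibana’s Dev Tools by pasting the file contents below the request line, although for a file of this size the curl approach is more practical (the body below is truncated):

POST /financial/accounts/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...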
