How to Bulk Import into Elasticsearch using Curl

Introduction

If you have a large dataset that you want to import into Elasticsearch, an easy way to accomplish the task is to use the curl command. This method makes use of Elasticsearch’s Bulk API, which lets you index, create, update, or delete many documents in a single API call instead of sending one request per document. In this tutorial, we’ll use a sample dataset to demonstrate how to do a bulk import in Elasticsearch with curl.

Prerequisites

Before we take a look at the bulk import process, it’s important to mention a few prerequisites that need to be in place. For this task, the system requirements are minimal: Elasticsearch needs to be installed and running. Although it’s not required, it can be beneficial to have Kibana installed as well. In addition to these system requirements, it’s also helpful to have some basic familiarity with the curl command.
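One quick way to check the first prerequisite is to request the root endpoint of the node, which returns basic cluster information when Elasticsearch is up. The sketch below assumes the default local address; the helper name es_is_up is made up for illustration.

```shell
# es_is_up: hypothetical helper that succeeds if an Elasticsearch node
# answers at the given address (defaults to localhost:9200).
es_is_up() {
  # -s silences progress output; -f makes curl fail on HTTP error statuses.
  curl -s -f "${1:-localhost:9200}" > /dev/null
}

# Example usage (needs a running node, so it is left commented out):
# es_is_up localhost:9200 && echo "Elasticsearch is reachable"
```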

Understanding the Data

In our bulk import tutorial, we’ll be importing a sample dataset called accounts.json, which can be downloaded directly from Elastic’s website. The file is newline-delimited JSON in the format the Bulk API expects: each document sits on its own line and is preceded by an action line such as {"index":{"_id":"1"}}. The short snippet below shows one document line so you can see the basic structure of the records. You can use the sample data file as is or modify it to fit your specific application.

File Snippet: accounts.json

...
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...
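If your own data is a plain file with one JSON document per line and no action lines, it needs to be converted before the Bulk API will accept it. The following is a minimal sketch of that conversion, assuming every document should be indexed with an auto-generated ID; the function name to_bulk and the file names in the usage comment are hypothetical.

```shell
# to_bulk: hypothetical helper that reads one JSON document per line on stdin
# and writes Bulk API input: an "index" action line followed by the document.
to_bulk() {
  while IFS= read -r doc || [ -n "$doc" ]; do
    # Skip blank lines so they do not turn into empty documents.
    [ -z "$doc" ] && continue
    printf '%s\n' '{"index":{}}'
    printf '%s\n' "$doc"
  done
}

# Example usage (file names are placeholders):
# to_bulk < my_documents.json > my_bulk_file.json
```

An empty action body like {"index":{}} leaves the index name to the request URL and lets Elasticsearch generate the document ID.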

Use Curl to Communicate with the Elasticsearch Bulk API

We’ll be using the curl command to send data to Elasticsearch in this tutorial. If you haven’t had much experience with curl functionality, the underlying concept is simple: curl allows you to use HTTP requests to talk to a server. Here, we’ll use it to talk to Elasticsearch.

Since we’re running Elasticsearch locally with the default port of 9200 in our example, our command looks like the one shown below. You may need to modify it depending on the server location where Elasticsearch is hosted:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/accounts/_bulk?pretty' --data-binary @accounts.json

By interacting with Elasticsearch’s Bulk API endpoint at localhost:9200/financial/accounts/_bulk?pretty, this command creates the financial index and the accounts type if they don’t already exist, then indexes every document in the file into them. From a traditional database perspective, it can help to think of financial as the database and accounts as the table, although the analogy is a loose one. Note that mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0; on those versions, leave the type out and post to localhost:9200/financial/_bulk instead.
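Because the endpoint is just host, target, and _bulk joined together, the command above is easy to wrap in a small function if you import files often. This is a sketch only; the name bulk_import and its argument order are assumptions, not part of curl or Elasticsearch.

```shell
# bulk_import: hypothetical wrapper around the curl command from this tutorial.
# Arguments:
#   $1  host and port, e.g. localhost:9200
#   $2  target path: index/type (as above) or just the index on newer versions
#   $3  path to the newline-delimited bulk file
bulk_import() {
  host=$1
  target=$2
  file=$3
  curl -s -H 'Content-Type: application/x-ndjson' \
       -XPOST "${host}/${target}/_bulk?pretty" \
       --data-binary "@${file}"
}

# Example usage, equivalent to the command in this tutorial:
# bulk_import localhost:9200 financial/accounts accounts.json
```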

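Taking a closer look at our curl command, you’ll see we included the -H option. This adds an HTTP header to the request, which we use to set the content type to newline-delimited JSON: application/x-ndjson. We also used the --data-binary @filename option, which sends the file exactly as it is. That matters here because the plain -d option strips newlines when it reads from a file, and the Bulk API relies on newlines to separate entries. The file must also end with a newline character.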
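For very large files, a single request may exceed the size Elasticsearch accepts over HTTP (the http.max_content_length setting, 100mb by default). One possible workaround is to split the file into smaller pieces and send each one with the same curl command. The sketch below assumes every entry is a two-line action/document pair, as in accounts.json, so it would not be safe for files containing one-line delete actions; the function name split_bulk is hypothetical.

```shell
# split_bulk: hypothetical helper that splits a bulk file into chunks of
# N action/document pairs, so that no pair is cut in half.
# Arguments: $1 bulk file, $2 pairs per chunk, $3 output file prefix
split_bulk() {
  file=$1
  pairs=$2
  prefix=$3
  # Each pair occupies two lines, so split on an even line count.
  split -l "$((pairs * 2))" "$file" "$prefix"
}

# Example usage: 500 documents per chunk, written as chunk_aa, chunk_ab, ...
# split_bulk accounts.json 500 chunk_
```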

Once you’ve executed your curl command, the console will output a long list of results, one per document, similar to what is shown below. A "result" of "created" and a "status" of 201 confirm that a document was indexed:

...
    },
    {
      "index" : {
        "_index" : "financial",
        "_type" : "accounts",
        "_id" : "995",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 196,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
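Scrolling through hundreds of result entries is not a reliable way to spot a failure, and the Bulk API can return a successful HTTP status even when individual documents were rejected. The response begins with a top-level "errors" flag that summarizes the outcome, so a rough check is to search for it. The helper below is a sketch that uses a plain text match; the name bulk_succeeded and the file name in the usage comment are assumptions, and a real JSON parser would be more robust.

```shell
# bulk_succeeded: hypothetical helper that reads a Bulk API response on stdin
# and succeeds only if it contains "errors": false.
# The pattern tolerates the extra spaces added by ?pretty.
bulk_succeeded() {
  grep -Eq '"errors"[[:space:]]*:[[:space:]]*false'
}

# Example usage (response.json is a placeholder for saved curl output):
# bulk_succeeded < response.json && echo "all documents indexed"
```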

Conclusion

If you’re working with Elasticsearch, you’ll probably need to import a large dataset at some point. Fortunately, this is an easy task to accomplish with the help of the curl command and the Elasticsearch Bulk API. With these tools at your disposal, a single command is enough to send a data file to Elasticsearch and have it properly indexed.

Learn More

While it’s easy to perform a command-line bulk import using curl, it’s also possible to do the same import using Kibana if you’re more comfortable with that interface. For more information on how to accomplish this, please see the Kibana documentation. If you’d like to learn more about curl and its options, the manual page is available at: https://curl.haxx.se/docs/manpage.html
