How to Bulk Import into Elasticsearch using Curl
Introduction
If you have a large dataset that you want to import into Elasticsearch, an easy way to accomplish the task is to use the curl command. This method of bulk indexing data makes use of Elasticsearch's Bulk API, which allows users to index or delete many documents in a single API call. With this functionality, bulk indexing becomes a fast and simple task. In this tutorial, we'll use a sample dataset to demonstrate how to do a bulk import in Elasticsearch with curl.
Prerequisites
Before we take a look at the bulk import process, it's important to mention a few prerequisites that need to be in place. For this task, the system requirements are minimal: Elasticsearch needs to be installed and running. Although it's not required, it can be beneficial to have Kibana installed as well. In addition to these system requirements, it's also helpful to have some basic familiarity with the curl command.
Understanding the Data
In our bulk import tutorial, we'll be importing a sample dataset called accounts.json, which can be downloaded directly from Elastic's website. The short snippet of the data shown below can help you understand the basic structure of the file. You can use the sample data file as is or modify it to fit your specific application.
File Snippet: accounts.json
...
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...
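One detail the snippet doesn't show: a bulk file is newline-delimited JSON in which each document line is preceded by an action line (for example, {"index":{"_id":"1"}}) telling the Bulk API what to do with that document. The sketch below builds a miniature stand-in bulk file to illustrate the pairing (bulk_sample.ndjson is a hypothetical filename, and the second record's values are invented):

```shell
# Build a miniature bulk file: each document takes two lines --
# an action line ({"index":...}) followed by the document source.
cat > bulk_sample.ndjson <<'EOF'
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke"}
{"index":{"_id":"2"}}
{"account_number":2,"balance":28838,"firstname":"Hattie","lastname":"Bond"}
EOF
# Two documents, so the file contains four lines.
wc -l < bulk_sample.ndjson
```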
Use Curl to Communicate with the Elasticsearch Bulk API
We'll be using the curl command to send data to Elasticsearch in this tutorial. If you haven't had much experience with curl, the underlying concept is simple: curl allows you to use HTTP requests to talk to a server. Here, we'll use it to talk to Elasticsearch.
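Before running the import, it's worth confirming that curl is actually available on your system; a quick version check is enough:

```shell
# Print the installed curl version (first line only) to confirm
# the tool is available before attempting the import.
curl --version | head -n 1
```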
Since we’re running Elasticsearch locally with the default port of 9200 in our example, our command looks like the one shown below. You may need to modify it depending on the server location where Elasticsearch is hosted:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/financial/accounts/_bulk?pretty' --data-binary @accounts.json
By interacting with Elasticsearch's Bulk API endpoint at localhost:9200/financial/accounts/_bulk?pretty, this command creates the financial index and the accounts type, and inserts each record from the file into them. From a traditional database perspective, it might be simpler to think of financial as the database and accounts as the table. With that in mind, it's easy to understand how all of these records will be imported into the accounts type (table) within the financial index (database). Note that mapping types were deprecated in Elasticsearch 7.0; on newer versions, drop the type and post to localhost:9200/financial/_bulk instead.
Taking a closer look at our curl command, you'll see we included the -H option. This allows you to specify the content type, which in this case is newline-delimited JSON: application/x-ndjson. We also made use of the --data-binary @filename option, which sends the file's contents with no extra processing, preserving the newlines that the Bulk API depends on.
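One mistake even --data-binary won't save you from: the Bulk API requires the request body to end with a newline character, or Elasticsearch will reject it. A quick way to verify your file before sending it (shown here against a stand-in file, since the check is identical for the real accounts.json):

```shell
# Create a stand-in bulk file; substitute your real accounts.json.
printf '{"index":{}}\n{"balance":1}\n' > sample.ndjson
# Prints 1 if the last byte of the file is a newline, 0 otherwise.
tail -c 1 sample.ndjson | wc -l
```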
Once you've executed your curl command, the console will output a long list of data similar to what is shown below, confirming the successful import:
...
},
{
  "index" : {
    "_index" : "financial",
    "_type" : "accounts",
    "_id" : "995",
    "_version" : 1,
    "result" : "created",
    "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
    },
    "_seq_no" : 196,
    "_primary_term" : 1,
    "status" : 201
  }
}
]
}
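Rather than scanning that long output by hand, you can save the response to a file and check its top-level "errors" flag, which is false only when every item succeeded. The sketch below uses a hypothetical saved response (response.json, with invented contents standing in for the output of curl ... > response.json):

```shell
# Stand-in for a saved bulk response (in practice: curl ... > response.json).
cat > response.json <<'EOF'
{"took":120,"errors":false,"items":[{"index":{"_index":"financial","status":201}}]}
EOF
# Extract the top-level errors flag; "errors":false means all items succeeded.
grep -o '"errors":[a-z]*' response.json
```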
Conclusion
If you're working with Elasticsearch, you'll probably need to import a large dataset at some point. Fortunately, this is an easy task to accomplish with the help of the curl command and the Elasticsearch Bulk API. With these tools at your disposal, it's simple and painless to transfer a data file to Elasticsearch using curl and have it properly indexed.
Learn More
While it's easy to perform a command-line bulk import using curl, it's also possible to do the same import from Kibana if you're more comfortable with that interface. For more information on how to accomplish this, see Elastic's Kibana documentation. If you'd like to learn more about curl and how it can work for you, you can get more familiar with the tool at: https://curl.haxx.se/docs/manpage.html