How to use the Reindex API to copy one document to a different index

Introduction

If you manage data in Elasticsearch, there will be times when you need to copy a document to a different index. A document may have been written to the wrong index, or your overall index structure may have changed. Whatever the circumstances, the Reindex API makes it easy to copy documents from one index to another. In this tutorial, we’ll explain how to make copies of documents in another index with just a few simple steps.

Content-Type Header Disclaimer

NOTE: Since version 6.0, Elasticsearch enforces strict content-type checking on REST requests. This means that cURL requests must include -H 'Content-Type: application/json' as a header option whenever the request has a JSON object in its body. The header explicitly declares that the body is in JSON format. If this header option is omitted, Elasticsearch rejects the request with a 406 error stating that the Content-Type header is not supported.

You can use the command curl --help for more information about the various options.
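
The same header rule applies if you script these requests instead of typing them. Here is a minimal Python sketch, using only the standard library, of a JSON request that carries the header. The host localhost:9200 matches the examples in this tutorial; the helper name json_request is our own and is not part of any Elasticsearch client.

```python
import json
import urllib.request


def json_request(url, body, method="POST"):
    """Build (but do not send) an HTTP request with a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        # Equivalent to curl's -H 'Content-Type: application/json'
        headers={"Content-Type": "application/json"},
    )


req = json_request(
    "http://localhost:9200/_reindex?pretty",
    {"source": {"index": "people1"}, "dest": {"index": "people4"}},
)
print(req.get_method(), req.full_url)
print(req.header_items())
```

Passing req to urllib.request.urlopen would send it; that call is left out so the sketch runs without a cluster.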

Understanding the Reindex API for Elasticsearch

The Reindex API makes it easy to copy a document from one index and place the duplicate in another, pre-existing index. The API relies on the _source data stored with each Elasticsearch document, so _source must be enabled in the source index.

Let’s assume that we have two indices in an Elasticsearch cluster, each holding 1,000 documents and sharing the same mapping layout. We want to copy some of the documents from one index to the other.

In the steps that follow, you’ll see exactly how this is done.

Get a _mapping of the Elasticsearch indexes

If you want to reindex an Elasticsearch document, the first step is to make a cURL GET request for the mapping of each index involved, both the one where the document currently resides and the one you want to copy it to:

curl -XGET "localhost:9200/people1/_mapping/peeps?pretty"
curl -XGET "localhost:9200/people2/_mapping/peeps?pretty"
curl -XGET "localhost:9200/animals/_mapping/pets?pretty"

The JSON response you receive will look something like this:

{
  "people1" : {
    "mappings" : {
      "peeps" : {
        "properties" : {
          "accounts" : {
            "type" : "text"
          },
          "age" : {
            "type" : "integer"
          },
          "join_date" : {
            "type" : "date"
          },
          "name" : {
            "type" : "text"
          },
          "sex" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

This step matters because the two indices must have compatible mappings. An index created in Elasticsearch 6.x can hold only one mapping type, so you can’t re-index a document into a destination index whose mapping type differs from the document’s.
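
The comparison can also be done programmatically. The following Python sketch assumes the 6.x-style _mapping response shown above (index name, then "mappings", then the type name). The helper names are hypothetical and the sample responses are trimmed, hand-written stand-ins for real output.

```python
def mapping_types(mapping_response):
    """Return {index_name: set of mapping type names} from a _mapping response."""
    return {
        index: set(body.get("mappings", {}))
        for index, body in mapping_response.items()
    }


def same_mapping_type(source_response, dest_response):
    """True when the destination is empty of types or already has the source's types."""
    source_types = set().union(*mapping_types(source_response).values())
    dest_types = set().union(*mapping_types(dest_response).values())
    return not dest_types or source_types <= dest_types


# Trimmed sample responses (assumed shapes, not real cluster output)
people1 = {"people1": {"mappings": {"peeps": {"properties": {"name": {"type": "text"}}}}}}
people2 = {"people2": {"mappings": {"peeps": {"properties": {"name": {"type": "text"}}}}}}
animals = {"animals": {"mappings": {"pets": {"properties": {"name": {"type": "text"}}}}}}

print(same_mapping_type(people1, people2))  # True: both use "peeps"
print(same_mapping_type(animals, people1))  # False: "pets" vs "peeps"
```

This checks only the type names. It does not compare individual field definitions.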

Mismatched _mapping type

Attempting to re-index a document into an index that has a different mapping type will result in an illegal_argument_exception. In this example, the "animals" index has the type "pets" in its mapping, while the "people1" index has "peeps":

curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "animals",
    "type": "pets",
    "query": {
      "term": {
        "_id": "D8lr02kBXluIHJG2BGRZ"
      }
    }
  },
  "dest": {
    "index": "people1"
  }
}
'

The response contains an illegal_argument_exception: the mapping update is rejected because the destination index would end up with more than 1 type.

The mapping itself, meaning its fields and their layout, doesn’t have to be an exact match for a re-indexing operation to work. Only the mapping "type" name has to match for the two indices to be compatible.

Re-index an entire index:

Let’s look at another example. This time, we want to re-index an entire index. We’ll be using a POST request to update the "dest" index to match all of the documents of the "source" index:

curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "people1"
  },
  "dest": {
    "index": "people4"
  }
}
'

If the destination index has no documents, they will be created to match the source documents; however, if there are documents in the destination index that match the id of the source documents, then the destination documents will be updated accordingly.
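
The create-or-update behavior described above can be modeled in a few lines. This is only an illustrative Python sketch, not Elasticsearch code. The helper predict_reindex_counts is hypothetical: it compares sets of document ids and assumes the default reindex settings, under which a source document overwrites a destination document that has the same _id.

```python
import json


def reindex_body(source_index, dest_index):
    """Request body for copying every document from one index to another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def predict_reindex_counts(source_ids, dest_ids):
    """Model which source documents would be created vs. updated."""
    source, dest = set(source_ids), set(dest_ids)
    return {
        "created": len(source - dest),  # ids missing from the destination
        "updated": len(source & dest),  # ids already present get overwritten
    }


print(json.dumps(reindex_body("people1", "people4")))
print(predict_reindex_counts(["a", "b", "c"], ["b"]))
```

The two counts correspond to the "created" and "updated" fields that the _reindex response reports.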

Re-index with a query to create duplicate documents:

Provided the mapping "type" of the source index matches the "type" of the destination index and no version conflicts occur, the re-indexing process will duplicate the document.

Make an exact duplicate of an Elasticsearch document in another index:

Let’s look at another example where we copy a document from one index to another:

curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "people1",
    "type": "peeps",
    "query": {
      "term": {
        "_id": "VclS02kBXluIHJG2Dlhd"
      }
    }
  },
  "dest": {
    "index": "people4"
  }
}'

The JSON response from the _reindex HTTP request will look like the following:

{
  "took" : 7,
  "timed_out" : false,
  "total" : 1,
  "updated" : 0,
  "created" : 1,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

At this point, there should be an exact replica of the original document in the people4 index that was copied from people1. This means that everything about the document will be copied, including the document’s _id. Now, both indexes contain a document with an _id of VclS02kBXluIHJG2Dlhd.
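
If you run this from a script, it helps to confirm from the response that the copy actually happened. The Python sketch below builds the same single-document request body and then checks a response of the shape shown above. Both helper names are hypothetical, and the success criteria (one document written, no failures, no version conflicts) are our own assumption about what counts as a clean copy.

```python
def single_doc_reindex_body(source_index, doc_type, doc_id, dest_index):
    """Request body that copies one document, selected by _id, to dest_index."""
    return {
        "source": {
            "index": source_index,
            "type": doc_type,
            "query": {"term": {"_id": doc_id}},
        },
        "dest": {"index": dest_index},
    }


def copy_succeeded(response, expected=1):
    """True when the expected number of documents were written cleanly."""
    written = response.get("created", 0) + response.get("updated", 0)
    return (
        not response.get("timed_out", False)
        and not response.get("failures")
        and response.get("version_conflicts", 0) == 0
        and written == expected
    )


body = single_doc_reindex_body("people1", "peeps", "VclS02kBXluIHJG2Dlhd", "people4")

# Subset of the sample response shown in this tutorial
response = {"timed_out": False, "total": 1, "updated": 0, "created": 1,
            "version_conflicts": 0, "failures": []}
print(copy_succeeded(response))
```

A response with "total" : 0 would fail this check, which usually means the term query did not match any _id in the source index.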

Duplicate a document in another index using a different _id:

While we just saw an example where an exact replica of a document was created in a different index, there may be situations when you don’t want a duplicate document that shares the same "_id". In this case, it’s possible to create a new document with a unique _id that still has all the same fields and values as the original.

First, we’ll delete the exact duplicate that was just re-indexed into people4:

curl -XPOST "localhost:9200/people4/peeps/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_id": "VclS02kBXluIHJG2Dlhd"
    }
  }
}'

Then, we’ll use a GET request to get the _source data of the document in people1 that we want to copy:

curl -XGET "localhost:9200/people1/peeps/VclS02kBXluIHJG2Dlhd?pretty"

Copy the contents of the _source field from the JSON response.

Finally, we’ll PUT that data into another index using a new _id:

curl -XPUT "localhost:9200/people4/peeps/SOME_NEW_ID?pretty" -H 'Content-Type: application/json' -d'
{
  "name" : "Oct Locke",
  "age" : "36",
  "sex" : "female",
  "accounts" : "oct_locke",
  "join_date" : "2012-05-20"
}
'

There is now a copy of document VclS02kBXluIHJG2Dlhd in the people4 index under the new _id of SOME_NEW_ID.
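
The manual copy-and-paste step can be scripted as well. The Python sketch below takes a GET response, pulls out its _source, and prepares the URL and body for the PUT request. The helper name put_request_for_copy is hypothetical, and the sample GET response is hand-written from the field values used in this tutorial, assuming the 6.x response shape with "_type" and "_source" keys.

```python
import json


def put_request_for_copy(get_response, dest_index, new_id, host="localhost:9200"):
    """Return (url, json_body) for re-creating a document under a new _id."""
    doc_type = get_response["_type"]
    url = "http://{}/{}/{}/{}".format(host, dest_index, doc_type, new_id)
    # Only _source is sent; metadata such as the old _id is left behind
    return url, json.dumps(get_response["_source"])


# Hand-written sample of a GET response (assumed shape)
get_response = {
    "_index": "people1",
    "_type": "peeps",
    "_id": "VclS02kBXluIHJG2Dlhd",
    "found": True,
    "_source": {
        "name": "Oct Locke",
        "age": "36",
        "sex": "female",
        "accounts": "oct_locke",
        "join_date": "2012-05-20",
    },
}

url, body = put_request_for_copy(get_response, "people4", "SOME_NEW_ID")
print(url)
print(body)
```

The returned URL and body correspond to the cURL PUT command above and still need the Content-Type: application/json header when sent.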

Conclusion

Copying a document to a different index is a common task in Elasticsearch, so it’s important to know how to do it correctly. In this tutorial, we’ve discussed two ways to copy an Elasticsearch document: one creates an exact replica that preserves the source document’s _id, while the other creates a copy of the document with a unique _id. With the step-by-step instructions provided above, you’ll be able to copy documents to different indices with just a few simple commands.
