How to use the Reindex API to copy one document to a different index
Introduction
If you’re managing data in Elasticsearch, there are times when you’ll want to copy a document to a different index. A document may have been written to the wrong index, or your overall index structure may have changed. Whatever the circumstances, the Reindex API makes it easy to copy documents from one Elasticsearch index to another. In this tutorial, we’ll explain how to copy documents into another index in a few simple steps.
Content-Type Header Disclaimer
NOTE: Since version 6.0, Elasticsearch enforces strict content-type checking on REST requests that include a body. This means that cURL requests must include -H 'Content-Type: application/json' as a header option whenever the request has a JSON object in its body. The header explicitly declares that the body is in JSON format. If you omit it, cURL sends a form-encoded content type by default when you use -d, and Elasticsearch responds with a 406 Content-Type header error.
You can use the command curl --help for more information about the various options.
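To avoid forgetting the header, the cURL call can be wrapped in a small shell function. This is a minimal sketch: the function name es_post_json and the ES_HOST variable are hypothetical names chosen for illustration, and the default host is the localhost:9200 address used throughout this tutorial.

```shell
# Hypothetical helper: POST a JSON body to Elasticsearch with the required header.
# ES_HOST is an assumed variable name; it defaults to the host used in this tutorial.
ES_HOST="${ES_HOST:-localhost:9200}"

es_post_json() {
  # $1 = request path, for example "_reindex?pretty"
  # $2 = JSON request body
  curl -s -XPOST "${ES_HOST}/$1" \
    -H 'Content-Type: application/json' \
    -d "$2"
}
```

With a cluster running, a call such as es_post_json "_reindex?pretty" "$body" would send the body in the variable body with the JSON content type set.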
Understanding the Reindex API for Elasticsearch
The Reindex API makes it easy to copy a document from one index and place the duplicate in another pre-existing index. The API reads each document’s _source data to accomplish this task, so _source must be enabled in the source index.
Let’s assume that we have two indices in an Elasticsearch cluster that have the same mapping layout, and we want to copy some of the documents from one index to the other.
In the steps that follow, you’ll see exactly how this is done.
Get a _mapping of the Elasticsearch indexes
If you want to reindex an Elasticsearch document, the first step is to make a cURL GET request for the mapping of each index involved, both the one where the document currently resides and the one you want to copy it to:
    curl -XGET "localhost:9200/people1/_mapping/peeps?pretty"
    curl -XGET "localhost:9200/people2/_mapping/peeps?pretty"
    curl -XGET "localhost:9200/animals/_mapping/pets?pretty"
The JSON response you receive will look something like this:
    {
      "people1" : {
        "mappings" : {
          "peeps" : {
            "properties" : {
              "accounts" : {
                "type" : "text"
              },
              "age" : {
                "type" : "integer"
              },
              "join_date" : {
                "type" : "date"
              },
              "name" : {
                "type" : "text"
              },
              "sex" : {
                "type" : "text"
              }
            }
          }
        }
      }
    }
This step is an important part of the process because you need to ensure that the two indices in question have compatible mappings. You can’t re-index a document into a destination index that has a different "_mapping" type.
Mismatched _mapping type
Attempting to re-index a document into an index that has a different mapping type will result in an illegal_argument_exception. In this example, there’s a discrepancy: the "animals" index has the type "pets" in its mapping, while the "people1" index has "peeps", so the following request fails:
    curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "animals",
        "type": "pets",
        "query": {
          "term": {
            "_id": "D8lr02kBXluIHJG2BGRZ"
          }
        }
      },
      "dest": {
        "index": "people1"
      }
    }
    '
The mapping itself, which includes all the fields and the layout, doesn’t have to be an exact match for a re-indexing operation to work. Only the "_mapping" field’s "type" has to match for the two indices to be compatible.
Re-index an entire index:
Let’s look at another example. This time, we want to re-index an entire index. We’ll be using a POST request to update the "dest" index to match all of the documents of the "source" index:
    curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "people1"
      },
      "dest": {
        "index": "people4"
      }
    }
    '
If the destination index has no documents, they will be created to match the source documents; however, if the destination index already contains documents with the same _id as source documents, those destination documents will be overwritten with the source versions.
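If overwriting existing destination documents is not what you want, the Reindex API accepts "op_type": "create" in the "dest" section, which only creates documents that are missing, and "conflicts": "proceed", which keeps going when a document already exists. The sketch below builds such a request body. The function names are hypothetical, and the code assumes index names contain no characters that need JSON escaping.

```shell
# Build a _reindex body that only creates documents missing from the destination.
# $1 = source index, $2 = destination index.
# Assumption: index names need no JSON escaping.
reindex_create_only_body() {
  printf '{"conflicts":"proceed","source":{"index":"%s"},"dest":{"index":"%s","op_type":"create"}}' "$1" "$2"
}

# Hypothetical wrapper that sends the body; defined here but not called.
reindex_create_only() {
  curl -s -XPOST "localhost:9200/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_create_only_body "$1" "$2")"
}
```

Calling reindex_create_only people1 people4 against a running cluster would copy only the documents that people4 does not already contain; documents that already exist are counted under "version_conflicts" in the response instead of being overwritten.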
Re-index with a query to create duplicate document(s):
As long as the mapping "type" of the source index matches the "type" of the destination index and no version conflicts exist, the re-indexing process will duplicate the document.
Make an exact duplicate of an Elasticsearch document in another index:
Let’s look at another example where we copy a document from one index to another:
    curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "people1",
        "type": "peeps",
        "query": {
          "term": {
            "_id": "VclS02kBXluIHJG2Dlhd"
          }
        }
      },
      "dest": {
        "index": "people4"
      }
    }'
The JSON response from the _reindex HTTP request will look like the following:
    {
      "took" : 7,
      "timed_out" : false,
      "total" : 1,
      "updated" : 0,
      "created" : 1,
      "deleted" : 0,
      "batches" : 1,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0,
      "failures" : [ ]
    }
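In a script, the fields of this response can be checked before moving on. The following is a rough sketch that reads the response from standard input and uses plain text matching on the "timed_out", "version_conflicts", and "failures" fields. The function name reindex_ok is hypothetical, and text matching is a simplification; a real JSON parser would be more robust.

```shell
# Succeeds only if the _reindex response on stdin reports no timeout,
# zero version conflicts, and an empty failures array.
# Assumption: the response is formatted like the ?pretty output shown above.
reindex_ok() {
  response="$(cat)"
  printf '%s\n' "$response" | grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false' &&
  printf '%s\n' "$response" | grep -Eq '"version_conflicts"[[:space:]]*:[[:space:]]*0[^0-9]' &&
  printf '%s\n' "$response" | grep -Eq '"failures"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}
```

The output of the cURL command can be piped straight into the function, for example by appending | reindex_ok to the _reindex request and testing the exit status.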
At this point, the people4 index should contain an exact replica of the original document from people1. Everything about the document is copied, including its _id. Both indexes now contain a document with an _id of VclS02kBXluIHJG2Dlhd.
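One way to confirm that the document is present in both indexes is to send a HEAD request for it and look at the HTTP status code: 200 means the document exists, 404 means it does not. The sketch below assumes the same local host and the index/type/_id URL layout used elsewhere in this tutorial; the function name doc_exists is hypothetical.

```shell
# Succeeds if Elasticsearch answers 200 to a HEAD request for the document.
# $1 = index, $2 = type, $3 = document _id.
doc_exists() {
  # -I sends a HEAD request, -o /dev/null discards the response headers,
  # and -w prints only the HTTP status code.
  status="$(curl -s -o /dev/null -I -w '%{http_code}' "localhost:9200/$1/$2/$3")"
  [ "$status" = "200" ]
}
```

For example, doc_exists people1 peeps VclS02kBXluIHJG2Dlhd && doc_exists people4 peeps VclS02kBXluIHJG2Dlhd should succeed once the copy has been made.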
Duplicate a document in another index using a different _id:
While we just saw an example where an exact replica of a document was created in a different index, there may be situations when you don’t want a duplicate document that shares the same _id. In this case, it’s possible to create a new document with a unique _id that still has all the same fields and values as the original.
First, we’ll delete the exact duplicate that was just re-indexed into people4:
    curl -XPOST "localhost:9200/people4/peeps/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {
          "_id": "VclS02kBXluIHJG2Dlhd"
        }
      }
    }'
Then, we’ll use a GET request to get the _source data of the document in people1 that we want to copy:
    curl -XGET "localhost:9200/people1/peeps/VclS02kBXluIHJG2Dlhd?pretty"
You can highlight the _source data with your mouse and copy it.
Finally, we’ll PUT that data into another index using a new _id:
    curl -XPUT "localhost:9200/people4/peeps/SOME_NEW_ID?pretty" -H 'Content-Type: application/json' -d'
    {
      "name" : "Oct Locke",
      "age" : "36",
      "sex" : "female",
      "accounts" : "oct_locke",
      "join_date" : "2012-05-20"
    }
    '
There is now a copy of document VclS02kBXluIHJG2Dlhd in another index under the new _id of SOME_NEW_ID.
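The manual GET and PUT steps can also be collapsed into a single _reindex request. The Reindex API accepts a "script" section, and the Elasticsearch documentation describes ctx._id as one of the values a script may change. The sketch below builds such a body on that assumption; it is not the method used above, the function names are hypothetical, and the arguments are assumed to need no JSON escaping. Test it against your Elasticsearch version before relying on it.

```shell
# Build a _reindex body that copies one document and gives the copy a new _id.
# $1 = source index, $2 = destination index,
# $3 = _id of the document to copy, $4 = _id for the copy.
# Assumption: ctx._id can be reassigned from a Painless script in _reindex.
reindex_new_id_body() {
  printf '{"source":{"index":"%s","query":{"term":{"_id":"%s"}}},' "$1" "$3"
  printf '"dest":{"index":"%s"},' "$2"
  printf '"script":{"lang":"painless","source":"ctx._id = params.new_id","params":{"new_id":"%s"}}}' "$4"
}

# Hypothetical wrapper that sends the body; defined here but not called.
reindex_new_id() {
  curl -s -XPOST "localhost:9200/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_new_id_body "$1" "$2" "$3" "$4")"
}
```

With a cluster running, reindex_new_id people1 people4 VclS02kBXluIHJG2Dlhd SOME_NEW_ID would request the same copy as the GET and PUT steps, without copying the _source data by hand.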
Conclusion
Copying a document to a different index is a common task in database management, so it’s important to know how to do it correctly. In this tutorial, we’ve discussed two ways to copy an Elasticsearch document: one creates an exact replica that preserves the source document’s _id, while the other creates a copy of the document with a unique _id. With the step-by-step instructions provided above, you’ll be able to copy documents to different indices with just a few simple commands.