Elasticsearch Version History - What it does and doesn't do

Introduction

When you store a document in Elasticsearch, the response includes a _version field like the one below:

{
   "ok": true,
   "_index": "products",
   "_type": "dairy",
   "_id": "1",
   "_version": 1
}

This is Elasticsearch’s built-in version tracking system. But what exactly does version tracking mean? What functionality does it provide? Does it let you compare documents? In this article we’ll answer these questions and more, so please read on.

Elasticsearch Version Tracking

There are some common misconceptions about Elasticsearch version tracking, so let’s talk explicitly about what the version tracking system does and doesn’t do.

What it Does

When you create a new document, Elasticsearch assigns it _version: 1. Every subsequent write to that document, whether an index, update, or delete, increments the _version by 1. If Elasticsearch reports success for your update, it guarantees that the _version number was incremented by 1.
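The counting behavior described above can be pictured with a small in-memory sketch. The TinyStore class and its index method are hypothetical stand-ins written for this illustration, not part of any Elasticsearch client; the sketch only assumes that every successful write bumps the version by one.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for an index (not the Elasticsearch API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        # A new document starts at version 1; every later write adds exactly 1.
        previous_version = self._docs.get(doc_id, (0, None))[0]
        new_version = previous_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = TinyStore()
first_version = store.index("1", {"name": "milk"})
second_version = store.index("1", {"name": "whole milk"})
print(first_version, second_version)  # 1 2
```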

What it Does Not Do

Even though Elasticsearch uses the _version to keep track of whether the document has changed, it does not keep a history of those changes. For example, if a document had been created and then updated 49 times, it would be on _version: 50, but Elasticsearch does not provide a way to go back and check what _version: 25 looked like. You can only get the current version of the document. Keeping a history of the exact changes, or retrieving the document as it was at each version, is something you’d have to implement yourself.
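If you do need history, one possible approach is to save a snapshot of the document alongside every write. The sketch below is an in-memory illustration only: HistoryKeepingStore and its methods are hypothetical names, and a real application would have to decide where those snapshots live (a separate index, for example).

```python
class HistoryKeepingStore:
    """Hypothetical store that keeps every version itself (not an Elasticsearch feature)."""

    def __init__(self):
        self._current = {}  # doc_id -> (version, source)
        self._history = {}  # doc_id -> {version: source}

    def index(self, doc_id, source):
        version = self._current.get(doc_id, (0, None))[0] + 1
        self._current[doc_id] = (version, dict(source))
        # The application, not the store's version counter, preserves old copies.
        self._history.setdefault(doc_id, {})[version] = dict(source)
        return version

    def get(self, doc_id):
        version, source = self._current[doc_id]
        return version, dict(source)

    def get_version(self, doc_id, version):
        return dict(self._history[doc_id][version])


store = HistoryKeepingStore()
for count in range(1, 51):  # one create followed by 49 updates
    store.index("1", {"voteCount": count})

current_version, current_source = store.get("1")
old_source = store.get_version("1", 25)
print(current_version, old_source)  # 50 {'voteCount': 25}
```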

What Good Is Tracking the _version if You Can’t View the History of the Document?

Elasticsearch keeps track of the _version in order to handle concurrency problems.

What is Concurrency?

Concurrency control is the management of simultaneous access to a database. It prevents two users from editing the same record at the same time, which often results in incorrect data.

How does _version solve Concurrency?

To explain how _version solves concurrency, we’ll look at a simple example. Imagine you own a website where users can rate restaurants. The website lists all the restaurants and lets users vote a restaurant up or down, which increments or decrements its voteCount field.

Now let’s say a restaurant called Downtown Cafe already has a total of 99 votes and a user clicks the vote up icon. A request to post the updated number of votes might look like this:

curl -XPOST 'http://localhost:9200/ratings/restaurant/123' -d'
{
    "name": "Downtown Cafe",
    "voteCount": 100
}'

This might seem alright at first glance, but this implementation has a serious flaw. What happens if two users are viewing the site while the restaurant has 99 votes and they both vote it up? The two requests will look exactly the same, and both will explicitly set the voteCount to 100 when it really should end up being 101.
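The lost update is easy to reproduce without a server. In this sketch a plain dictionary stands in for the stored document, and the two variable names for the users are made up for the illustration.

```python
# A plain dict stands in for the stored document (an assumption for illustration).
document = {"name": "Downtown Cafe", "voteCount": 99}

# Both users load the page while the count is 99.
seen_by_first_user = document["voteCount"]
seen_by_second_user = document["voteCount"]

# Each client computes the new total itself and overwrites the stored value.
document["voteCount"] = seen_by_first_user + 1
document["voteCount"] = seen_by_second_user + 1

print(document["voteCount"])  # 100, even though two votes were cast
```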

One Way to Fix This

One way to fix this is the update API, which lets you tell Elasticsearch to increment voteCount by one from whatever it is at the moment.

Using this approach, a better implementation would be this:

curl -XPOST 'http://localhost:9200/ratings/restaurant/123/_update' -d'
{
   "script" : "ctx._source.voteCount += 1"
}'

This approach is much better, but it narrows the window for error rather than closing it. There is still time between when Elasticsearch retrieves the document and when it indexes the updated document, and a conflicting change can still occur in that gap.

This is where the Elasticsearch versioning system comes into play.

Elasticsearch Optimistic Locking

Elasticsearch versioning makes it easy to use a pattern called optimistic locking. You tell Elasticsearch which version you expect to be updating, and if no newer version has been written, it proceeds with the change. If the document does have a newer version than the one you specified, Elasticsearch sends a response letting you know, so your application can handle it as you see fit.

So let’s say we made a request to Elasticsearch for our same restaurant and got the following response:

curl -XGET 'http://localhost:9200/ratings/restaurant/123'
{
   "_index": "ratings",
   "_type": "restaurant",
   "_id": "123",
   "_version": 3,
   "exists": true,
   "_source": {
      "name": "Downtown Cafe",
      "voteCount": 99
   }
}

With this information we can display the number of current votes for the restaurant on the site. Now if a user votes up the restaurant we make this request:

curl -XPOST 'http://localhost:9200/ratings/restaurant/123?version=3' -d'
{
    "name": "Downtown Cafe",
    "voteCount": 100
}'

Note the version=3 query parameter. It tells Elasticsearch to update voteCount to 100 only if no changes have been made (i.e. the document is still on version 3).

With this architecture, Elasticsearch only has to compare version numbers to determine whether the update is allowed, which tends to be faster than a locking mechanism. If the document has changed and is no longer on _version: 3, the operation returns a 409 Conflict error. If there have been no changes and the document is still on _version: 3, Elasticsearch makes the update and returns a 200 OK. Either way, our website can take the appropriate action: if the request fails, it simply requests the latest version of the document and retries the update against that version.
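The read, check, and retry cycle can be sketched in memory as follows. VersionedStore, VersionConflict, and vote_up are hypothetical names invented for this illustration; the exception stands in for the 409 Conflict response, and the expected_version argument plays the role of the version query parameter.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


class VersionedStore:
    """Hypothetical in-memory store with a version check on write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Only a comparison of version numbers is needed; no lock is held.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def vote_up(store, doc_id, max_attempts=5):
    """Re-read the latest document and retry whenever the write conflicts."""
    for _ in range(max_attempts):
        version, source = store.get(doc_id)
        source["voteCount"] += 1
        try:
            return store.index(doc_id, source, expected_version=version)
        except VersionConflict:
            continue
    raise VersionConflict("gave up after repeated conflicts")


store = VersionedStore()
for count in (97, 98, 99):  # three writes leave the document on version 3
    store.index("123", {"name": "Downtown Cafe", "voteCount": count})

# Two users both read version 3 with 99 votes.
version_a, doc_a = store.get("123")
version_b, doc_b = store.get("123")

doc_a["voteCount"] += 1
store.index("123", doc_a, expected_version=version_a)  # accepted, now version 4

doc_b["voteCount"] += 1
try:
    store.index("123", doc_b, expected_version=version_b)
    saw_conflict = False
except VersionConflict:
    saw_conflict = True
    vote_up(store, "123")  # fetch the latest version and try again

final_version, final_doc = store.get("123")
print(saw_conflict, final_version, final_doc["voteCount"])  # True 5 101
```

Because the second write is rejected instead of silently overwriting the first, both votes are counted once the retry succeeds.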

Elasticsearch makes this process even easier with the retry_on_conflict parameter, which is typically used with scripted updates like so:

curl -XPOST 'http://localhost:9200/ratings/restaurant/123/_update?retry_on_conflict=3' -d'
{
   "script" : "ctx._source.voteCount += 1"
}'
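To make the retry behavior concrete, here is a rough in-memory model of a scripted update that retries on conflict. It is a sketch under stated assumptions, not the server's actual implementation: SimulatedDocument, its update method, and the before_write hook (used only to inject a competing writer) are all hypothetical.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


class SimulatedDocument:
    """Hypothetical single-document store used to model retry-on-conflict."""

    def __init__(self, source):
        self.version = 1
        self.source = dict(source)
        self.before_write = None  # optional hook that simulates a competing writer

    def update(self, script, retry_on_conflict=0):
        # One initial attempt plus up to retry_on_conflict retries.
        for attempt in range(retry_on_conflict + 1):
            read_version, doc = self.version, dict(self.source)
            script(doc)
            if self.before_write is not None:
                self.before_write(self, attempt)
            if read_version == self.version:  # nobody wrote in between
                self.version += 1
                self.source = doc
                return self.version
        raise VersionConflict("retries exhausted")


def add_vote(doc):
    doc["voteCount"] += 1


def competing_writer(document, attempt):
    # Another client sneaks in a vote during the first two attempts only.
    if attempt < 2:
        document.source["voteCount"] += 1
        document.version += 1


restaurant = SimulatedDocument({"name": "Downtown Cafe", "voteCount": 99})
restaurant.before_write = competing_writer
new_version = restaurant.update(add_vote, retry_on_conflict=3)
print(new_version, restaurant.source["voteCount"])  # 4 102
```

The first two attempts conflict and are retried, and the third succeeds, so the two competing votes and our own vote are all counted.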

Note: Some fields don’t need version checking. It is completely optional, so you should give careful thought to which fields need it and which don’t.

Conclusion

In this short article we talked about versioning in Elasticsearch. We discussed what Elasticsearch doesn’t do, which is keep a history of every version of a document. We also talked about the versioning system that it does offer and how to use it.

If you need help with Elasticsearch in production or have any database needs at all, please don’t hesitate to reach out to us at ObjectRocket, where we make sure your data is safe and sound.
