How to Delete a Document in Elasticsearch Using the API

Introduction

One feature that distinguishes Elasticsearch from many other databases is that all tasks are executed through a REST API. Commands and queries sent to a node take the form of HTTP requests, which is how users connect to and manipulate Elasticsearch. The delete API removes a document from an index based on that document’s ID. The request can be written in console form (for example, in the Kibana console), or it can be sent from the command line as a simple cURL request; both forms call the same delete API.

Deleting a JSON document using the Delete API

  • The delete API lets you delete any typed JSON document from an Elasticsearch index. You need to know the document’s ID in order to use this method.
  • The following command provides an example of the delete API in use. In this case, the JSON document is being deleted from an index named "cars" under a type called "volvo", and the ID of the document is "4":
DELETE /cars/volvo/4
  • If the deletion was successful, a JSON object will be returned. One of the keys of this object will be "result", and it will have a value of "deleted".
  • If the deletion fails because the index could not be found, the delete API will still return a JSON object. This object will have a "reason" key with an associated value of "no such index".

Using the Delete API with cURL

  • Another way to delete a document from an index is by using Elastic’s REST API and cURL. An example of such a cURL request is as follows:
curl -X DELETE "localhost:9200/twitter/tweet/1"
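
The same request can also be built from a script. The following is a minimal sketch using only Python's standard library; the helper name build_delete_request and the default base URL are assumptions made for illustration, and the request is only constructed here, not sent:

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(index, doc_type, doc_id, base_url="http://localhost:9200"):
    """Return an unsent DELETE request for /<index>/<type>/<id>."""
    # Percent-encode each path segment so IDs containing "/" or spaces stay intact.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return urllib.request.Request(f"{base_url}/{path}", method="DELETE")


request = build_delete_request("cars", "volvo", 4)
print(request.get_method(), request.full_url)

# To send it against a running node (assumed to listen on localhost:9200):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Note that urllib.request.urlopen raises an HTTPError for non-2xx responses, so a failed delete would need to be caught rather than read from the return value.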

JSON Object Result

  • Both methods of using an API to delete in Elasticsearch will return a JSON object that should look similar to this:
{
  "_shards" : {
    "total" : 2,
    "failed" : 0,
    "successful" : 2
  },
  "found" : true,
  "_index" : "cars",
  "_type" : "volvo",
  "_id" : "4",
  "_version" : 2,
  "result" : "deleted"
}
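
A script that issues deletes will usually want to inspect this object. The sketch below shows one way to do that; the function name summarize_delete_response is hypothetical, and the nested shape of the error object is an assumption based on the "reason" key described earlier:

```python
import json


def summarize_delete_response(body):
    """Turn a parsed delete response into a short human-readable summary."""
    if "error" in body:
        error = body["error"]
        # Assumed shape: the message is nested under "error" -> "reason".
        reason = error.get("reason", "unknown reason") if isinstance(error, dict) else str(error)
        return "failed: " + reason
    result = body.get("result", "unknown")
    return f'{result}: {body.get("_index")}/{body.get("_type")}/{body.get("_id")}'


success = json.loads(
    '{"_index": "cars", "_type": "volvo", "_id": "4", "_version": 2, "result": "deleted"}'
)
failure = {"error": {"reason": "no such index"}, "status": 404}

print(summarize_delete_response(success))  # deleted: cars/volvo/4
print(summarize_delete_response(failure))  # failed: no such index
```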

Optimistic Concurrency Control

Elasticsearch is distributed by nature. When you create, delete, or update a document, the new version of that document must be replicated to the replica shards held on other nodes in the cluster. These replication requests are delivered in parallel, so they may be received out of sequence at their destination. Optimistic concurrency control ensures that an older version of a document in Elasticsearch does not overwrite a newer version of the document.

Optimistic concurrency control works by assigning a sequence number to a document every time an operation is performed on it. The sequence number is incremented by one with each subsequent operation. With this system in place, Elasticsearch can make sure that a document is never overwritten by a version with a smaller sequence number. In the case of a delete operation, a conflict detected through optimistic concurrency control returns a status code of 409 with a version_conflict_engine_exception error. Please refer to Elastic’s documentation on optimistic concurrency control for more information.
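
To make the idea concrete, here is a toy in-memory model of a sequence-number check. It is not Elasticsearch code: the ToyIndex class, the VersionConflict exception (standing in for the 409 response), and the sample documents are all invented for this sketch.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 conflict response."""


class ToyIndex:
    """In-memory model: every write operation takes the next sequence number."""

    def __init__(self):
        self._docs = {}         # doc_id -> (seq_no, source)
        self._next_seq_no = 0

    def _take_seq_no(self):
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        return seq_no

    def index(self, doc_id, source):
        seq_no = self._take_seq_no()
        self._docs[doc_id] = (seq_no, source)
        return seq_no

    def delete(self, doc_id, expected_seq_no):
        current_seq_no, _ = self._docs[doc_id]
        # Reject the delete if the document changed since the caller last saw it.
        if current_seq_no != expected_seq_no:
            raise VersionConflict(
                f"expected seq_no {expected_seq_no}, found {current_seq_no}"
            )
        del self._docs[doc_id]
        return self._take_seq_no()


cars = ToyIndex()
first = cars.index("4", {"model": "S60"})
second = cars.index("4", {"model": "S90"})    # another writer updates the document
try:
    cars.delete("4", expected_seq_no=first)   # stale view of the document
except VersionConflict as conflict:
    print("conflict:", conflict)
print("deleted with seq_no", cars.delete("4", expected_seq_no=second))
```

The first delete is rejected because it was based on an outdated view of the document; the second succeeds because it names the latest sequence number.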

Versioning

Versioning also helps prevent conflicts due to concurrency. Documents stored in Elasticsearch have an associated version number. When the document is first indexed, it is assigned the version number “1”. Every time the document is changed in any way through an update or delete command, the version number is incremented. The version number can be specified in a delete request to ensure that the document being deleted hasn’t changed at all in the meantime. For additional information about versioning, please see Elastic’s blog – Elasticsearch Versioning Support.
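
As a rough illustration of the read-then-delete pattern, the sketch below builds a delete URL that carries the version the caller last read. The helper name conditional_delete_url and the base URL are assumptions; only the URL is constructed, and nothing is sent:

```python
from urllib.parse import urlencode


def conditional_delete_url(document, base_url="http://localhost:9200"):
    """Build a delete URL that names the version the caller last read."""
    path = f'/{document["_index"]}/{document["_type"]}/{document["_id"]}'
    query = urlencode({"version": document["_version"]})
    return f"{base_url}{path}?{query}"


# Metadata as it might look after fetching the document.
last_read = {"_index": "cars", "_type": "volvo", "_id": "4", "_version": 2}
print(conditional_delete_url(last_read))
# http://localhost:9200/cars/volvo/4?version=2
```

If the document has been modified since it was read, its stored version no longer matches and the delete is rejected with a conflict instead of removing newer data.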

Routing

  • When a document is indexed in Elasticsearch, it gets stored in an individual shard. How does Elasticsearch know which shard to assign, or route, a new document to? A routing value can be assigned to a document which provides control over the routing pattern. Documents with the same routing value will be mapped to the same shard. When a document has been given a routing value, that value needs to be specified when deleting the document:
DELETE /twitter/tweet/1?routing=kimchy
  • The command shown above will delete a stored tweet with an id of “1”, routed by the value “kimchy”. If the command does not include the correct routing value, the tweet will not be deleted. If routing is marked as required in the mapping and no routing value is supplied at all, a RoutingMissingException is thrown instead. To learn more about routing, see Elastic’s blog – Customizing Your Document Routing.
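
The reason a missing routing value matters can be shown with a small model. In the sketch below, the shard is chosen by hashing the routing value (or the document ID when no routing is given) and taking the remainder by the number of primary shards. The function shard_for is hypothetical, and zlib.crc32 is only a stand-in for the hash function Elasticsearch actually uses:

```python
import zlib


def shard_for(doc_id, number_of_primary_shards, routing=None):
    """Pick a shard from the routing value, falling back to the document ID."""
    routing_value = routing if routing is not None else doc_id
    # crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


indexed_on = shard_for("1", 5, routing="kimchy")
print("indexed on shard", indexed_on)
print("delete with routing looks at shard", shard_for("1", 5, routing="kimchy"))
print("delete without routing looks at shard", shard_for("1", 5))
```

A delete that repeats the routing value always lands on the shard where the document was stored. A delete without it is routed by the ID alone and may look at a different shard, where the document does not exist.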

Parent

  • Users can also set a parent parameter, which serves a purpose similar to a routing parameter: it routes the request to the shard that holds the parent.
  • When a parent document is deleted, its child documents are not automatically deleted as well.
  • The parent id must be specified in order to delete a child document. If no parent id is provided, the request will throw a RoutingMissingException.

Distributed

As described earlier in the discussion on optimistic concurrency control, Elasticsearch is distributed. This means that delete operations, along with other operations, will be hashed into an individual shard id. The operation is then directed to the primary shard that resides in that id group, and it’s replicated to any other shard replicas in that same id group.

Wait For Active Shards

  • When performing a delete, you can help ensure consistency by setting the wait_for_active_shards parameter. This parameter will require a certain number of active shard copies to exist before beginning to execute the delete operation.
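
The check this parameter implies can be modeled in a few lines. This is a simplified sketch, not Elasticsearch's implementation; the function names are invented, and it assumes the parameter accepts either a positive number or the string "all" (meaning the primary plus every replica):

```python
def required_copies(wait_for_active_shards, number_of_replicas):
    """Translate the parameter value into a required number of shard copies."""
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    if wait_for_active_shards == "all":
        return total_copies
    required = int(wait_for_active_shards)
    if not 1 <= required <= total_copies:
        raise ValueError(f"value must be between 1 and {total_copies}, or 'all'")
    return required


def may_proceed(active_copies, wait_for_active_shards, number_of_replicas):
    """Return True when enough shard copies are active to start the delete."""
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


# One replica configured, but only the primary is currently active.
print(may_proceed(1, 1, number_of_replicas=1))      # True
print(may_proceed(1, "all", number_of_replicas=1))  # False
```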

Refresh Option

  • Using the ?refresh option in a cURL request or in the URL of an API call allows users to control when the changes made by the API call become visible in search. Please note that this option should not be confused with the unrelated “_refresh” API. Setting the option to wait_for makes the API call wait until the change in question is visible to search before it returns:
?refresh=wait_for

Several Elasticsearch APIs, including the Update, Delete, Index, and Bulk APIs, support the ?refresh option to control when the changes made by the request become visible to search. The possible values for this option are an empty string or true, wait_for, and false (the default).

There may be certain situations where it’s important to wait for a change to become visible in search; however, in most cases, the quickest and simplest choice is to omit the refresh parameter from the URL or to specify ?refresh=false.
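
A small helper can guard against typos in this option before a request is sent. The sketch below validates the value against the options listed above and appends it to a delete URL; the function name and base URL are assumptions made for illustration:

```python
from urllib.parse import urlencode

ALLOWED_REFRESH = {"true", "wait_for", "false"}


def delete_url_with_refresh(path, refresh="false", base_url="http://localhost:9200"):
    """Append a validated refresh option to a delete URL."""
    if refresh == "":
        refresh = "true"  # an empty string is treated the same as true
    if refresh not in ALLOWED_REFRESH:
        raise ValueError(f"unsupported refresh value: {refresh!r}")
    return f"{base_url}{path}?{urlencode({'refresh': refresh})}"


print(delete_url_with_refresh("/cars/volvo/4", "wait_for"))
# http://localhost:9200/cars/volvo/4?refresh=wait_for
```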

Timeout

  • In certain instances, the shard assigned to handle a delete may be unavailable when a delete request is made. This may occur when a shard is being relocated or recovering from a store. The delete operation will wait up to one minute for that shard to become available before returning an error. To override this behavior, a timeout parameter can be set to specify how long the delete operation should wait. In the following example, the operation will wait five minutes:
DELETE /cars/volvo/1?timeout=5m
  • The cURL version of this request will look like the following:
curl -X DELETE "localhost:9200/cars/volvo/1?timeout=5m"
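
When a delete is sent from a script, the client's own socket timeout should be longer than the timeout given to Elasticsearch; otherwise the client may give up first. The sketch below converts a value such as 5m into seconds for that purpose. The to_seconds helper is hypothetical and handles only a few common units:

```python
import re

# Seconds per unit; only a subset of units is handled in this sketch.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1, "ms": 0.001}


def to_seconds(value):
    """Convert a time value such as '5m' or '30s' into seconds."""
    match = re.fullmatch(r"(\d+)(ms|d|h|m|s)", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


server_timeout = "5m"
# Give the client a little slack beyond the server-side wait.
client_timeout = to_seconds(server_timeout) + 10
print(client_timeout)  # 310
```

The resulting number could then be passed as the timeout argument to urllib.request.urlopen, or to whichever HTTP client is in use.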

Conclusion

It’s clear that there are a number of options and parameters that can be specified when making a delete request in Elasticsearch. For the best results, it’s important to understand the distributed and concurrent nature of Elasticsearch’s functionality before performing a delete. With this knowledge in mind, you’ll be able to execute a delete while maintaining the integrity of your database.
