How to Sort an Analyzed Text Field in Elasticsearch

Introduction

In Elasticsearch, you can define how string data is processed at index time by giving its field the type keyword or text. What’s the difference between these two types? A value stored in a keyword field is indexed as is, while a value stored in a text field is analyzed before indexing. Put simply, the analyzer breaks the value of a text field into separate terms, making it easy to search for those individual terms. Let’s say we had the string "Elasticsearch makes life easy". If we stored it in a keyword field, it would be indexed just as it is, but if we stored it in a text field, the default standard analyzer would break it into the lowercased terms “elasticsearch”, “makes”, “life”, and “easy”.
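To make the difference concrete without a running cluster, here is a rough local approximation in shell. It is not Elasticsearch's analyzer: it only splits on spaces and lowercases, which happens to match what the standard analyzer produces for this simple sentence. The function names keyword_terms and text_terms are hypothetical. To see the terms a real analyzer produces, Elasticsearch's _analyze API can be used instead.

```shell
#!/bin/sh
# Rough approximation only: this is NOT the Elasticsearch standard analyzer.
# It splits on spaces and lowercases, which is enough for simple sentences.

# keyword: the whole value is indexed as one term, unchanged
keyword_terms() {
  printf '%s\n' "$1"
}

# text: the value is broken into separate, lowercased terms (one per line)
text_terms() {
  printf '%s\n' "$1" | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]'
}

keyword_terms "Elasticsearch makes life easy"
text_terms "Elasticsearch makes life easy"
```

The keyword version prints the sentence on a single line, while the text version prints four lowercased terms, one per line.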

Analyzing a text field before indexing is helpful for searching, because it lets you match individual words within a value, but it can make sorting problematic. Is it possible to sort the values of a text field alphabetically by their original strings? Fortunately, Elasticsearch makes this task simple to accomplish. In this tutorial, we’ll show you how to sort an analyzed text field in Elasticsearch. If you’re already familiar with basic sorting operations and prefer to dive into the sample code, feel free to skip to Just the Code.

Step 1: Create a subfield of type keyword

The main problem with sorting an analyzed text field is that the index no longer holds the original string as a single value; it holds only the individual terms. Our first step, then, is to keep a copy of the original string that we can sort on. We’ll do this with a subfield that indexes the original text unchanged.

We’re going to use a sample index called store for this example. It contains a type called products, which lists all of the store’s products. (The products type reflects Elasticsearch 6.x and earlier; mapping types are deprecated in 7.x and removed in 8.0, so on newer versions you’d drop the products level from the mappings and the _type from requests.) We’ll keep the dataset simple by including just a handful of products with a few fields: id, name, price, quantity, and department. The table below shows the dataset:

id  name                       price  quantity  department
1   Multi-Grain Cereal         4.99   4         Packaged Foods
2   1lb Ground Beef            3.99   29        Meat and Seafood
3   Dozen Apples               2.49   12        Produce
4   Chocolate Bar              1.29   2         Packaged Foods, Checkout
5   1 Gallon Milk              3.29   16        Dairy
6   0.5lb Jumbo Shrimp         5.29   12        Meat and Seafood
7   Wheat Bread                1.29   5         Bakery
8   Pepperoni Pizza            2.99   5         Frozen
9   12 Pack Cola               5.29   6         Packaged Foods
10  Lime Juice                 0.99   20        Produce
11  12 Pack Cherry Cola        5.59   5         Packaged Foods
12  1 Gallon Soy Milk          3.39   10        Dairy
13  1 Gallon Vanilla Soy Milk  3.49   9         Dairy
14  1 Gallon Orange Juice      3.29   4         Juice
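The bulk import step later in this tutorial reads a file named demo_store_db.json, but its contents aren’t shown. The shell script below is a hypothetical reconstruction that writes the table rows in the bulk API’s newline-delimited format: an action line naming the index, type, and _id, followed by the document itself. It assumes Elasticsearch 6.x (hence "_type": "products") and, to mirror the _source shown in the search response later on, stores id as a string and department as an array; your original file may differ.

```shell
#!/bin/sh
# Hypothetical reconstruction of demo_store_db.json (the original file isn't shown).
# Assumes Elasticsearch 6.x, so each action line carries "_type": "products".
OUT=demo_store_db.json
: > "$OUT"   # create or empty the output file

# Each input row: id|name|price|quantity|departments (comma-separated departments)
while IFS='|' read -r id name price qty depts; do
  # "Packaged Foods, Checkout" -> Packaged Foods","Checkout (wrapped in ["..."] below)
  dept_json=$(printf '%s' "$depts" | sed 's/, */","/g')
  # Bulk action line: which index, type, and _id the next line belongs to
  printf '{"index":{"_index":"store","_type":"products","_id":"%s"}}\n' "$id" >> "$OUT"
  # Document source line
  printf '{"id":"%s","name":"%s","price":%s,"quantity":%s,"department":["%s"]}\n' \
    "$id" "$name" "$price" "$qty" "$dept_json" >> "$OUT"
done <<'EOF'
1|Multi-Grain Cereal|4.99|4|Packaged Foods
2|1lb Ground Beef|3.99|29|Meat and Seafood
3|Dozen Apples|2.49|12|Produce
4|Chocolate Bar|1.29|2|Packaged Foods, Checkout
5|1 Gallon Milk|3.29|16|Dairy
6|0.5lb Jumbo Shrimp|5.29|12|Meat and Seafood
7|Wheat Bread|1.29|5|Bakery
8|Pepperoni Pizza|2.99|5|Frozen
9|12 Pack Cola|5.29|6|Packaged Foods
10|Lime Juice|0.99|20|Produce
11|12 Pack Cherry Cola|5.59|5|Packaged Foods
12|1 Gallon Soy Milk|3.39|10|Dairy
13|1 Gallon Vanilla Soy Milk|3.49|9|Dairy
14|1 Gallon Orange Juice|3.29|4|Juice
EOF
```

The resulting file has 28 lines (two per product) and ends with a newline, which the bulk API requires.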

The following code shows the mapping:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'

You can see in the mapping that the name field is of type text, which means a value like “1 Gallon Vanilla Soy Milk” gets analyzed and broken down into individual, lowercased terms: “1”, “gallon”, “vanilla”, “soy”, and “milk”. Because of this, we can’t sort the values alphabetically by name: Elasticsearch rejects a sort on a text field by default, since fielddata is disabled on text fields, and even with fielddata enabled the sort would be based on individual terms rather than on the whole string. To remedy this, we’ll add a keyword subfield that indexes the original, unanalyzed string. The simplest way to do that here is to delete the index, recreate it with the updated mapping, and re-import the data.
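To see why sorting on the analyzed terms goes wrong, consider what would happen if fielddata were enabled on name: for an ascending sort on a field with several terms per document, Elasticsearch picks each document’s lowest term by default. The shell sketch below approximates that locally for four product names from the table. It is not Elasticsearch: it splits on spaces and lowercases, and the helper names min_term and sort_by_min_term are hypothetical.

```shell
#!/bin/sh
# Rough approximation (NOT Elasticsearch) of an ascending sort on an
# analyzed text field, where each value is represented by its lowest term.

# Lowest term of a value: lowercase, split on spaces, sort byte-wise, keep the first
min_term() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n' | LC_ALL=C sort | head -n 1
}

# Print the given values (one per line) ordered by their lowest term
sort_by_min_term() {
  for value in "$@"; do
    printf '%s\t%s\n' "$(min_term "$value")" "$value"
  done | LC_ALL=C sort | cut -f 2
}

sort_by_min_term "Chocolate Bar" "Dozen Apples" "Lime Juice" "Wheat Bread"
```

This prints Dozen Apples, Chocolate Bar, Wheat Bread, Lime Juice (ordered by the terms apples, bar, bread, juice) instead of the alphabetical Chocolate Bar, Dozen Apples, Lime Juice, Wheat Bread, which is why a whole-value keyword copy is needed for sorting.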

Run this curl command to delete the index:

curl -H "Content-Type: application/json" -XDELETE 127.0.0.1:9200/store

Then run the following curl command to recreate the index with the updated mapping:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'

After that’s done, we’ll re-import our data:

curl -H "Content-Type: application/json" -XPUT "127.0.0.1:9200/_bulk?pretty" --data-binary @demo_store_db.json

Notice that our new subfield, named "raw", has the type "keyword". This is important, because keyword values are not analyzed and are indexed as is. Because raw is a multi-field of name, Elasticsearch automatically indexes each document’s name value into name.raw as well; the documents themselves don’t need an extra field.

Step 2: Sort by the subfield

Now that our new subfield is in place and we’ve re-imported our data, we can sort by it. We’ll use the _search API, specifying "name.raw" as the field to sort by.

Note: In our example, we assume that Elasticsearch is running locally on the default port, so our curl commands target "127.0.0.1:9200". If you’re running Elasticsearch on a different server, replace that with your own host and port, for example "YOURDOMAIN.com:9200".

The curl command we use is shown below:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "name.raw": "asc" }
  ]
}
'

The response (trimmed) looks like this:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "store",
        "_type" : "products",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "id" : "6",
          "name" : "0.5lb Jumbo Shrimp",
          "price" : 5.29,
          "quantity" : 12,
          "department" : [
            "Meat and Seafood"
          ]
        },
        "sort" : [
          "0.5lb Jumbo Shrimp"
        ]
      },
      { ... "name" : "1 Gallon Milk" ... },
      { ... "name" : "1 Gallon Orange Juice" ... },
      { ... "name" : "1 Gallon Soy Milk" ... },
      { ... "name" : "1 Gallon Vanilla Soy Milk" ... },
      { ... "name" : "12 Pack Cherry Cola" ... },
      { ... "name" : "12 Pack Cola" ... },
      { ... "name" : "1lb Ground Beef" ... },
      { ... "name" : "Chocolate Bar" ... },
      { ... "name" : "Dozen Apples" ... }
    ]
  }
}

The output above has been trimmed, but it’s clear that the results are sorted alphabetically by name. Keep in mind that a search returns 10 hits by default, which is why only 10 of the 14 products appear; add the "size" parameter to your query to specify how many results you’d like to get back.
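The order may look odd in places; for example, 1lb Ground Beef lands after 12 Pack Cola. That’s because a keyword field without a normalizer sorts by the raw bytes of each value: for ASCII text, a space sorts before a digit, a digit before a letter, and uppercase letters before lowercase ones. As a rough local check, and not a call to Elasticsearch, a byte-wise sort in the C locale of the 14 names from the table reproduces the ten hits shown above. The output file name sorted_names.txt is arbitrary.

```shell
#!/bin/sh
# Local check only (no Elasticsearch involved): sort the product names
# byte-wise in the C locale, approximating how a keyword field without a
# normalizer orders ASCII values.
LC_ALL=C sort > sorted_names.txt <<'EOF'
Multi-Grain Cereal
1lb Ground Beef
Dozen Apples
Chocolate Bar
1 Gallon Milk
0.5lb Jumbo Shrimp
Wheat Bread
Pepperoni Pizza
12 Pack Cola
Lime Juice
12 Pack Cherry Cola
1 Gallon Soy Milk
1 Gallon Vanilla Soy Milk
1 Gallon Orange Juice
EOF

# The first ten lines correspond to the ten hits a default-sized search returns
head -n 10 sorted_names.txt
```

The first ten lines match the names in the trimmed response, in the same order; the remaining four (Lime Juice, Multi-Grain Cereal, Pepperoni Pizza, Wheat Bread) are what a larger "size" would be expected to add after them.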

Conclusion

Analyzed fields in Elasticsearch allow for broader searching on individual terms, but they can make sorting tricky. Fortunately, it’s easy to solve this problem by adding a keyword subfield that indexes the original string. You can’t change the type of an existing field in place, and although a new multi-field can be added to an existing field with the put mapping API, documents that are already indexed won’t have values in it until they’re reindexed; for a small dataset like this one, recreating the index and re-importing the data is the simplest route. With the step-by-step instructions in this tutorial, you should have no trouble sorting an analyzed text field in Elasticsearch.

Just the Code

If you’re already familiar with the concepts described in this tutorial, here’s all the code you need to sort an analyzed text field in Elasticsearch:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
curl -H "Content-Type: application/json" -XPUT "127.0.0.1:9200/_bulk?pretty" --data-binary @demo_store_db.json
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "name.raw": "asc" }
  ]
}
'
