How to Sort an Analyzed Text Field in Elasticsearch
Introduction
In Elasticsearch, you can define how your string data is processed at index time by setting its type to either keyword or text. What's the difference between these two types? When you store data in a field of type keyword, it's indexed as is; data stored in a field of type text, however, is analyzed before indexing. Put simply, the analyzer breaks the value of a text field down into its separate terms, making it easy to search for those individual terms. Let's say we had the string "Elasticsearch makes life easy". Stored in a keyword field, it would be indexed exactly as it is; stored in a text field, it would be broken down into individual terms: "elasticsearch", "makes", "life", and "easy" (the default standard analyzer also lowercases each term).
While analyzing a text field before indexing is helpful for searching because it allows partial matching, it makes sorting problematic. Is it still possible to sort the values of a text field alphabetically by their original strings? Fortunately, Elasticsearch makes this task simple to accomplish. In this tutorial, we'll show you how to sort an analyzed text field in Elasticsearch. If you're already familiar with basic sorting operations and prefer to dive into the sample code, feel free to skip to Just the Code.
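Before we dive in: if you want to see for yourself how a string gets tokenized, one quick check is the _analyze API. The request below assumes the default standard analyzer, doesn't require any index, and should return the lowercased terms listed above:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/_analyze?pretty" -d '
{
  "analyzer": "standard",
  "text": "Elasticsearch makes life easy"
}
'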
Step 1: Create a subfield of type keyword
The main problem with sorting an analyzed text field is that we lose the original string representation. So our first step will be to save a copy of the original string that we can use for sorting. We'll do this with a subfield that stores the original, unanalyzed text.
We're going to use a sample index called store for this example. Our store index will contain a type called products that lists all of the store's products. We'll keep the dataset simple: just a handful of products with a small number of fields (id, name, price, quantity, and department). The table below shows the dataset, and a sketch of the matching bulk-import JSON follows it:
id | name | price | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
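We'll load this data in a later step using the bulk API and a file called demo_store_db.json. That file isn't reproduced in full in this tutorial, but in bulk format its first few lines would look something like the sketch below. The _id values and the string id / array department formats are assumptions inferred from the search output shown later on:
{ "index" : { "_index" : "store", "_type" : "products", "_id" : "1" } }
{ "id" : "1", "name" : "Multi-Grain Cereal", "price" : 4.99, "quantity" : 4, "department" : [ "Packaged Foods" ] }
{ "index" : { "_index" : "store", "_type" : "products", "_id" : "2" } }
{ "id" : "2", "name" : "1lb Ground Beef", "price" : 3.99, "quantity" : 29, "department" : [ "Meat and Seafood" ] }
Each document takes two lines (an action line, then the document itself), and the bulk API requires the file to end with a newline character.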
The following code shows the mapping:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
You can see in the mapping that the name field is of type text, which means a value like "1 Gallon Vanilla Soy Milk" gets analyzed and broken down into individual, lowercased terms: "1", "gallon", "vanilla", "soy", and "milk". Unfortunately, this means we can't sort the values alphabetically by their original strings; in fact, text fields have fielddata disabled by default, so a sort on name is rejected outright. To remedy this, we'll create a keyword subfield that holds a copy of the original string. We'll have to delete the index and recreate it with the updated mapping to accomplish this, so we'll also need to re-import the data.
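If you'd like to see the problem for yourself before changing anything, you could try sorting directly on the analyzed field. With the mapping above, a request like the one below is rejected with a fielddata-related error instead of returning sorted results:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query": { "match_all": {} },
  "sort": [ { "name": "asc" } ]
}
'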
Run this curl command to delete the index:
curl -H "Content-Type: application/json" -XDELETE 127.0.0.1:9200/store
Then run the following curl command to recreate the index with the updated mapping:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
After that’s done, we’ll re-import our data:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @demo_store_db.json
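As an optional sanity check, you can confirm that all fourteen documents were indexed by querying the _count API; the response should report a count of 14:
curl -XGET "127.0.0.1:9200/store/_count?pretty"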
Notice that our new subfield, named "raw", has the type "keyword". This is important, because values of type "keyword" are not analyzed and are indexed as is. We take advantage of this: every time a document is indexed, an untouched copy of the original string is stored under name.raw, which gives us something to sort on.
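If you want to verify that the subfield exists before querying, one option is to pull back the index mapping and look for "raw" nested under "name":
curl -XGET "127.0.0.1:9200/store/_mapping?pretty"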
Step 2: Sort by the subfield
Now that our new subfield is in place and we've re-imported our data, we can easily sort by the subfield. We'll use the _search API to accomplish this task, specifying "name.raw" as the field to sort by. The curl command and its (trimmed) output are shown below:
Note: In our example, we assume that Elasticsearch is running locally on the default port, so our curl commands target "127.0.0.1:9200". If you're running Elasticsearch on a different server, adjust the curl syntax accordingly, for example "YOURDOMAIN.com:9200".
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query":
  {
    "match_all": {}
  },
  "sort":
  [
    {
      "name.raw": "asc"
    }
  ]
}
'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "store",
        "_type" : "products",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "id" : "6",
          "name" : "0.5lb Jumbo Shrimp",
          "price" : 5.29,
          "quantity" : 12,
          "department" : [ "Meat and Seafood" ]
        },
        "sort" : [ "0.5lb Jumbo Shrimp" ]
      },
      { "name" : "1 Gallon Milk" },
      { "name" : "1 Gallon Orange Juice" },
      { "name" : "1 Gallon Soy Milk" },
      { "name" : "1 Gallon Vanilla Soy Milk" },
      { "name" : "12 Pack Cherry Cola" },
      { "name" : "12 Pack Cola" },
      { "name" : "1lb Ground Beef" },
      { "name" : "Chocolate Bar" },
      { "name" : "Dozen Apples" }
    ]
  }
}
The output you see above has been trimmed down a bit, but it's still clear that the results have been sorted alphabetically. Keep in mind that the default number of results returned is 10, which is why not all fourteen products show up; you can add the "size" parameter to your query to specify how many results you'd like to get back.
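For example, a query like the one below would return all of our products in sorted order; the size value of 20 is arbitrary and simply needs to be at least 14 for this dataset:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "size": 20,
  "query": { "match_all": {} },
  "sort": [ { "name.raw": "asc" } ]
}
'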
Conclusion
Analyzed fields in Elasticsearch allow for broader searching through partial matches, but they can make sorting a tricky task. Fortunately, it's easy to solve this problem by creating a keyword subfield that holds a copy of the original string. While Elasticsearch won't let you change the type of an existing field in place, it's not difficult to recreate the index and simply re-import the data. With the step-by-step instructions in this tutorial, you should have no trouble sorting an analyzed text field in Elasticsearch.
Just the Code
If you’re already familiar with the concepts described in this tutorial, here’s all the code you need to sort an analyzed text field in Elasticsearch:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @demo_store_db.json
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query":
  {
    "match_all": {}
  },
  "sort":
  [
    {
      "name.raw": "asc"
    }
  ]
}
'