How to Sort an Analyzed Text Field in Elasticsearch
Introduction
In Elasticsearch, you can define how your string data is processed at index time by setting its type to either keyword or text. What's the difference between these two types? When you store data in a field of type keyword, it's indexed as is; data stored in a field of type text, however, is analyzed before indexing. Put simply, the analyzer breaks the value of a text field down into its separate terms, making it easy to search for those individual terms. Let's say we had the string "Elasticsearch makes life easy". Stored in a keyword field, it would be indexed exactly as it is; stored in a text field, it would be broken down into individual terms: "elasticsearch", "makes", "life", and "easy" (the default standard analyzer also lowercases each term).
While analyzing a text field before indexing is helpful for searching because it allows partial matching, it makes sorting problematic. Is it still possible to sort the values of a text field alphabetically by their original strings? Fortunately, Elasticsearch makes this task simple to accomplish. In this tutorial, we'll show you how to sort an analyzed text field in Elasticsearch. If you're already familiar with basic sorting operations and prefer to dive into the sample code, feel free to skip to Just the Code.
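Before we dive in: if you want to see for yourself how a string gets tokenized, one quick check is the _analyze API. The request below assumes the default standard analyzer, doesn't require any index, and should return the lowercased terms listed above:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/_analyze?pretty" -d '
{
  "analyzer": "standard",
  "text": "Elasticsearch makes life easy"
}
'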
Step 1: Create a subfield of type keyword
The main problem with sorting an analyzed text field is that we lose the original string representation. So our first step will be to save a copy of the original string that we can use for sorting. We'll do this with a subfield that stores the original, unanalyzed text.
We're going to use a sample index called store for this example. Our store index will contain a type called products that lists all of the store's products. We'll keep the dataset simple: just a handful of products with a small number of fields (id, name, price, quantity, and department). The table below shows the dataset, and a sketch of the matching bulk-import JSON follows it:
id | name | price | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
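We'll load this data in a later step using the bulk API and a file called demo_store_db.json. That file isn't reproduced in full in this tutorial, but in bulk format its first few lines would look something like the sketch below. The _id values and the string id / array department formats are assumptions inferred from the search output shown later on:
{ "index" : { "_index" : "store", "_type" : "products", "_id" : "1" } }
{ "id" : "1", "name" : "Multi-Grain Cereal", "price" : 4.99, "quantity" : 4, "department" : [ "Packaged Foods" ] }
{ "index" : { "_index" : "store", "_type" : "products", "_id" : "2" } }
{ "id" : "2", "name" : "1lb Ground Beef", "price" : 3.99, "quantity" : 29, "department" : [ "Meat and Seafood" ] }
Each document takes two lines (an action line, then the document itself), and the bulk API requires the file to end with a newline character.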
The following code shows the mapping:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
You can see in the mapping that the name field is of type text, which means a value like "1 Gallon Vanilla Soy Milk" gets analyzed and broken down into individual, lowercased terms: "1", "gallon", "vanilla", "soy", and "milk". Unfortunately, this means we can't sort the values alphabetically by their original strings; in fact, text fields have fielddata disabled by default, so a sort on name is rejected outright. To remedy this, we'll create a keyword subfield that holds a copy of the original string. We'll have to delete the index and recreate it with the updated mapping to accomplish this, so we'll also need to re-import the data.
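If you'd like to see the problem for yourself before changing anything, you could try sorting directly on the analyzed field. With the mapping above, a request like the one below is rejected with a fielddata-related error instead of returning sorted results:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query": { "match_all": {} },
  "sort": [ { "name": "asc" } ]
}
'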
Run this curl command to delete the index:
curl -H "Content-Type: application/json" -XDELETE 127.0.0.1:9200/store
Then run the following curl command to recreate the index with the updated mapping:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
After that’s done, we’ll re-import our data:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @demo_store_db.json
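As an optional sanity check, you can confirm that all fourteen documents were indexed by querying the _count API; the response should report a count of 14:
curl -XGET "127.0.0.1:9200/store/_count?pretty"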
Notice that our new subfield, named "raw", has the type "keyword". This is important, because values of type "keyword" are not analyzed and are indexed as is. We take advantage of this: every time a document is indexed, an untouched copy of the original string is stored under name.raw, which gives us something to sort on.
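If you want to verify that the subfield exists before querying, one option is to pull back the index mapping and look for "raw" nested under "name":
curl -XGET "127.0.0.1:9200/store/_mapping?pretty"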
Step 2: Sort by the subfield
Now that our new subfield is in place and we've re-imported our data, we can easily sort by the subfield. We'll use the _search API to accomplish this task, specifying "name.raw" as the field to sort by. The curl command and its (trimmed) output are shown below:
Note: In our example, we assume that Elasticsearch is running locally on the default port, so our curl commands target "127.0.0.1:9200". If you're running Elasticsearch on a different server, adjust the curl syntax accordingly, for example "YOURDOMAIN.com:9200".
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query":
  {
    "match_all": {}
  },
  "sort":
  [
    {
      "name.raw": "asc"
    }
  ]
}
'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "store",
        "_type" : "products",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "id" : "6",
          "name" : "0.5lb Jumbo Shrimp",
          "price" : 5.29,
          "quantity" : 12,
          "department" : [ "Meat and Seafood" ]
        },
        "sort" : [ "0.5lb Jumbo Shrimp" ]
      },
      { "name" : "1 Gallon Milk" },
      { "name" : "1 Gallon Orange Juice" },
      { "name" : "1 Gallon Soy Milk" },
      { "name" : "1 Gallon Vanilla Soy Milk" },
      { "name" : "12 Pack Cherry Cola" },
      { "name" : "12 Pack Cola" },
      { "name" : "1lb Ground Beef" },
      { "name" : "Chocolate Bar" },
      { "name" : "Dozen Apples" }
    ]
  }
}
The output you see above has been trimmed down a bit, but it's still clear that the results have been sorted alphabetically. Keep in mind that the default number of results returned is 10, which is why not all fourteen products show up; you can add the "size" parameter to your query to specify how many results you'd like to get back.
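For example, a query like the one below would return all of our products in sorted order; the size value of 20 is arbitrary and simply needs to be at least 14 for this dataset:
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "size": 20,
  "query": { "match_all": {} },
  "sort": [ { "name.raw": "asc" } ]
}
'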
Conclusion
Analyzed fields in Elasticsearch allow for broader searching through partial matches, but they can make sorting a tricky task. Fortunately, it's easy to solve this problem by creating a keyword subfield that holds a copy of the original string. While Elasticsearch won't let you change the type of an existing field in place, it's not difficult to recreate the index and simply re-import the data. With the step-by-step instructions in this tutorial, you should have no trouble sorting an analyzed text field in Elasticsearch.
Just the Code
If you’re already familiar with the concepts described in this tutorial, here’s all the code you need to sort an analyzed text field in Elasticsearch:
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties" : {
        "name": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @demo_store_db.json
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/_search?pretty" -d '
{
  "query":
  {
    "match_all": {}
  },
  "sort":
  [
    {
      "name.raw": "asc"
    }
  ]
}
'