How to Sort an Analyzed Text Field in Elasticsearch using NodeJS
Introduction
In Elasticsearch, you can define how your string data is processed upon indexing by setting its type to either `keyword` or `text`. What’s the difference between these two types? When you store data in a field that has a `keyword` type, it’s indexed as is; however, data stored in a field with a `text` type is analyzed before indexing. To put it simply, the analyzer breaks down the value of a `text` field into its separate terms, making it easy to search for those individual terms. Let’s say we had a string: "Elasticsearch makes life easy". If we stored it as a `keyword` field, it would be indexed just as it is, but if we stored it as a `text` field, it would be broken down into its individual terms, which the default standard analyzer also lowercases: “elasticsearch”, “makes”, “life”, and “easy”.
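To make the distinction concrete, here is a toy sketch in plain JavaScript. The `toyAnalyze` and `toyKeyword` functions are hypothetical stand-ins, not part of Elasticsearch: the first only splits on whitespace and lowercases, whereas real analyzers do considerably more, so treat this as an illustration rather than the actual implementation.

```javascript
// Toy illustration only: a hypothetical stand-in for text analysis.
// It splits on whitespace and lowercases; real Elasticsearch analyzers do more.
function toyAnalyze(value) {
  return value
    .split(/\s+/)
    .filter(function (term) { return term.length > 0; })
    .map(function (term) { return term.toLowerCase(); });
}

// A keyword-style field keeps the value as a single, unmodified term.
function toyKeyword(value) {
  return [value];
}

var sentence = "Elasticsearch makes life easy";
console.log(toyAnalyze(sentence)); // four separate terms
console.log(toyKeyword(sentence)); // one term: the whole string
```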
While analyzing a `text` field before indexing can be helpful for searching because it allows matching on individual terms, it can make sorting a bit problematic. Is it possible to alphabetically sort the values of a `text` field by their original text strings? Fortunately, Elasticsearch makes this task simple to accomplish. In this tutorial, we’ll show you how to sort an analyzed text field in Elasticsearch using NodeJS (JavaScript). If you’re already familiar with basic sorting operations and prefer to dive into the sample code, feel free to skip to Just the Code.
Step 1: Create a subfield of type keyword
The main problem with sorting an analyzed `text` field is that we lose the original string representation. In our example, our first step will be to save an original copy of the string that we can use for sorting. We’ll do this by using a subfield that will store the original text.
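The idea of keeping two representations of one value can be sketched in plain JavaScript. This is only an in-memory model under simplifying assumptions: `indexName` is a hypothetical function, and its whitespace-and-lowercase splitting is a stand-in for real analysis.

```javascript
// Hypothetical in-memory model of a field that keeps a "raw" copy for sorting.
function indexName(name) {
  return {
    terms: name.toLowerCase().split(/\s+/), // toy analysis, used for searching
    raw: name                               // untouched copy, used for sorting
  };
}

var names = ["Wheat Bread", "Dozen Apples", "Chocolate Bar"];
var indexed = names.map(indexName);

// Sort by the untouched copy using plain string comparison.
var sorted = indexed
  .slice()
  .sort(function (a, b) { return a.raw < b.raw ? -1 : a.raw > b.raw ? 1 : 0; })
  .map(function (entry) { return entry.raw; });

console.log(sorted);
```

The terms remain available for matching individual words, while the ordering depends only on the untouched copy.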
Let’s look at an example that uses an index called `store`, which represents a small grocery store. This `store` index contains a type called `products`, which lists the store’s products. To keep things simple, our example dataset will only contain a handful of products with just the following fields: id, name, price, quantity, and department. The table below shows the dataset:
id | name | price | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
Here is the JSON we used to define the mapping of our index:

```json
{
    "mappings": {
        "products": {
            "properties": {
                "name": { "type": "text" },
                "price": { "type": "double" },
                "quantity": { "type": "integer" },
                "department": { "type": "keyword" }
            }
        }
    }
}
```
You can see in the mapping that the `name` field is of type `text`, which means values like “1 Gallon Vanilla Soy Milk” get analyzed and broken down into their individual, lowercased terms: “1”, “gallon”, “vanilla”, “soy”, and “milk”. Unfortunately, this means we wouldn’t be able to sort the values alphabetically by the original string; to remedy this, we’ll create our subfield and store a copy of our original string in it. In this tutorial, we’ll delete our index and recreate it with the new mapping, so we’ll also need to re-import the data.
Here is some sample JavaScript code to delete the index.

```javascript
var elasticsearch = require("elasticsearch");

// Connect to the local Elasticsearch instance
var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

// Delete the existing "store" index
client.indices.delete({
  index: 'store',
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```
And here is the JavaScript we used to recreate the index with the new mapping containing the original string.

```javascript
/* Create index mapping */
client.indices.create({
  index: "store",
  body: {
    "mappings": {
      "products": {
        "properties" : {
          "name": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          },
          "price": { "type": "double"},
          "quantity": { "type": "integer"},
          "department": { "type": "keyword"}
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```
Notice that our new subfield, named `"raw"`, has the type `"keyword"`. This is important, because values of type `"keyword"` are not analyzed and are indexed as is. We’ll take advantage of this and store a copy of the original string in this field.
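If you have several text fields that need a sortable copy, the same mapping change can be expressed as a small function. The sketch below is plain JavaScript with no Elasticsearch calls; `addRawSubfield` is a hypothetical helper name, and the subfield name `raw` simply follows the convention used in this tutorial.

```javascript
// Hypothetical helper: returns a copy of a "properties" mapping in which the
// given field gains a keyword subfield called "raw". The input is not modified.
function addRawSubfield(properties, fieldName) {
  var updated = Object.assign({}, properties);
  var existingFields = properties[fieldName].fields; // may be undefined
  updated[fieldName] = Object.assign({}, properties[fieldName], {
    fields: Object.assign({}, existingFields, { raw: { type: "keyword" } })
  });
  return updated;
}

var original = {
  name: { type: "text" },
  price: { type: "double" },
  quantity: { type: "integer" },
  department: { type: "keyword" }
};

var withRaw = addRawSubfield(original, "name");
console.log(JSON.stringify(withRaw.name));
console.log(JSON.stringify(original.name)); // unchanged
```

The returned object has the same shape as the `properties` section of the mapping shown above.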
Then we re-import our data:
```javascript
/* Bulk Import from JSON */
client.bulk({
  body: [
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "1" } },
    { "id": "1", "name" : "Multi-Grain Cereal", "price": 4.99, "quantity": 4 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "2" } },
    { "id": "2", "name" : "1lb Ground Beef", "price": 3.99, "quantity": 29 , "department":["Meat and Seafood"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "3" } },
    { "id": "3", "name" : "Dozen Apples", "price": 2.49, "quantity": 12 , "department":["Produce"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "4" } },
    { "id": "4", "name" : "Chocolate Bar", "price": 1.29, "quantity": 2 , "department":["Packaged Foods", "Checkout"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "5" } },
    { "id": "5", "name" : "1 Gallon Milk", "price": 3.29, "quantity": 16 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "6" } },
    { "id": "6", "name" : "0.5lb Jumbo Shrimp", "price": 5.29, "quantity": 12 , "department":["Meat and Seafood"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "7" } },
    { "id": "7", "name" : "Wheat Bread", "price": 1.29, "quantity": 5 , "department":["Bakery"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "8" } },
    { "id": "8", "name" : "Pepperoni Pizza", "price": 2.99, "quantity": 5 , "department":["Frozen"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "9" } },
    { "id": "9", "name" : "12 Pack Cola", "price": 5.29, "quantity": 6 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "10" } },
    { "id": "10", "name" : "Lime Juice", "price": 0.99, "quantity": 20 , "department":["Produce"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "11" } },
    { "id": "11", "name" : "12 Pack Cherry Cola", "price": 5.59, "quantity": 5 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "12" } },
    { "id": "12", "name" : "1 Gallon Soy Milk", "price": 3.39, "quantity": 10 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "13" } },
    { "id": "13", "name" : "1 Gallon Vanilla Soy Milk", "price": 3.49, "quantity": 9 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "14" } },
    { "id": "14", "name" : "1 Gallon Orange Juice", "price": 3.29, "quantity": 4 , "department":["Juice"] }
  ]
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```
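The bulk body alternates between an action line and the document it applies to, so it can also be generated from an array of products instead of being written out by hand. The following is a minimal sketch; `buildBulkBody` is a hypothetical helper, and it assumes every product carries an `id` property to use as the document `_id`, as in the dataset above.

```javascript
// Hypothetical helper: turns an array of product objects into the alternating
// action/document array used as the bulk body.
function buildBulkBody(indexName, typeName, products) {
  var body = [];
  products.forEach(function (product) {
    // Action line: create a document with this product's id
    body.push({ create: { _index: indexName, _type: typeName, _id: product.id } });
    // Source line: the document itself
    body.push(product);
  });
  return body;
}

var sampleProducts = [
  { id: "1", name: "Multi-Grain Cereal", price: 4.99, quantity: 4, department: ["Packaged Foods"] },
  { id: "2", name: "1lb Ground Beef", price: 3.99, quantity: 29, department: ["Meat and Seafood"] }
];

var bulkBody = buildBulkBody("store", "products", sampleProducts);
console.log(JSON.stringify(bulkBody, null, 4));
```

The resulting array could then be passed as the `body` of the `client.bulk` call in place of the hand-written list.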
Step 2: Sort by the subfield
Now that our new subfield is in place and we’ve re-imported our data, we can easily sort by the subfield. We’ll use the `_search` API to accomplish this task, specifying `"name.raw"` as the field to sort by. The JavaScript we use to sort is shown below:
Note: In our example, we assume that Elasticsearch is running locally on the default port, at `"http://localhost:9200"`. If you’re running Elasticsearch on a different server, you’ll need to adjust the `hosts` value accordingly.
```javascript
/* Sort by Raw Text Field */
client.search({
  size: 20,
  index: 'store',
  type: 'products',
  body: {
    query: {
      match_all: {}
    },
    // Sort on the unanalyzed keyword subfield, ascending
    sort: [{"name.raw": "asc"}]
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```
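If you want to switch between ascending and descending order, the search parameters can be assembled by a small function. This is a sketch only: `buildSortedSearch` is a hypothetical helper, and the `size`, `index`, and `type` values are copied from the example above.

```javascript
// Hypothetical helper: builds the parameter object for client.search,
// sorting on the given field in the given direction ("asc" or "desc").
function buildSortedSearch(field, order) {
  if (order !== "asc" && order !== "desc") {
    throw new Error('order must be "asc" or "desc"');
  }
  var sortClause = {};
  sortClause[field] = order;
  return {
    size: 20,
    index: "store",
    type: "products",
    body: {
      query: { match_all: {} },
      sort: [sortClause]
    }
  };
}

var descending = buildSortedSearch("name.raw", "desc");
console.log(JSON.stringify(descending.body.sort));
```

Passing the returned object to `client.search` with `"desc"` would request the same products in reverse order.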
Response (whitespace condensed):

```
Successful query!
{
    "took": 1,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
        "total": 14,
        "max_score": null,
        "hits": [
            { "_index": "store", "_type": "products", "_id": "6", "_score": null,
              "_source": { "id": "6", "name": "0.5lb Jumbo Shrimp", "price": 5.29, "quantity": 12, "department": [ "Meat and Seafood" ] },
              "sort": [ "0.5lb Jumbo Shrimp" ] },
            { "_index": "store", "_type": "products", "_id": "5", "_score": null,
              "_source": { "id": "5", "name": "1 Gallon Milk", "price": 3.29, "quantity": 16, "department": [ "Dairy" ] },
              "sort": [ "1 Gallon Milk" ] },
            { "_index": "store", "_type": "products", "_id": "14", "_score": null,
              "_source": { "id": "14", "name": "1 Gallon Orange Juice", "price": 3.29, "quantity": 4, "department": [ "Juice" ] },
              "sort": [ "1 Gallon Orange Juice" ] },
            { "_index": "store", "_type": "products", "_id": "12", "_score": null,
              "_source": { "id": "12", "name": "1 Gallon Soy Milk", "price": 3.39, "quantity": 10, "department": [ "Dairy" ] },
              "sort": [ "1 Gallon Soy Milk" ] },
            { "_index": "store", "_type": "products", "_id": "13", "_score": null,
              "_source": { "id": "13", "name": "1 Gallon Vanilla Soy Milk", "price": 3.49, "quantity": 9, "department": [ "Dairy" ] },
              "sort": [ "1 Gallon Vanilla Soy Milk" ] },
            { "_index": "store", "_type": "products", "_id": "11", "_score": null,
              "_source": { "id": "11", "name": "12 Pack Cherry Cola", "price": 5.59, "quantity": 5, "department": [ "Packaged Foods" ] },
              "sort": [ "12 Pack Cherry Cola" ] },
            { "_index": "store", "_type": "products", "_id": "9", "_score": null,
              "_source": { "id": "9", "name": "12 Pack Cola", "price": 5.29, "quantity": 6, "department": [ "Packaged Foods" ] },
              "sort": [ "12 Pack Cola" ] },
            { "_index": "store", "_type": "products", "_id": "2", "_score": null,
              "_source": { "id": "2", "name": "1lb Ground Beef", "price": 3.99, "quantity": 29, "department": [ "Meat and Seafood" ] },
              "sort": [ "1lb Ground Beef" ] },
            { "_index": "store", "_type": "products", "_id": "4", "_score": null,
              "_source": { "id": "4", "name": "Chocolate Bar", "price": 1.29, "quantity": 2, "department": [ "Packaged Foods", "Checkout" ] },
              "sort": [ "Chocolate Bar" ] },
            { "_index": "store", "_type": "products", "_id": "3", "_score": null,
              "_source": { "id": "3", "name": "Dozen Apples", "price": 2.49, "quantity": 12, "department": [ "Produce" ] },
              "sort": [ "Dozen Apples" ] },
            { "_index": "store", "_type": "products", "_id": "10", "_score": null,
              "_source": { "id": "10", "name": "Lime Juice", "price": 0.99, "quantity": 20, "department": [ "Produce" ] },
              "sort": [ "Lime Juice" ] },
            { "_index": "store", "_type": "products", "_id": "1", "_score": null,
              "_source": { "id": "1", "name": "Multi-Grain Cereal", "price": 4.99, "quantity": 4, "department": [ "Packaged Foods" ] },
              "sort": [ "Multi-Grain Cereal" ] },
            { "_index": "store", "_type": "products", "_id": "8", "_score": null,
              "_source": { "id": "8", "name": "Pepperoni Pizza", "price": 2.99, "quantity": 5, "department": [ "Frozen" ] },
              "sort": [ "Pepperoni Pizza" ] },
            { "_index": "store", "_type": "products", "_id": "7", "_score": null,
              "_source": { "id": "7", "name": "Wheat Bread", "price": 1.29, "quantity": 5, "department": [ "Bakery" ] },
              "sort": [ "Wheat Bread" ] }
        ]
    }
}
```
The `sort` value attached to each hit makes it clear that the results have been sorted alphabetically by the `name.raw` subfield we created.
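Rather than reading through the full response, you can pull out just the names and check their order in code. The sketch below uses a trimmed-down stand-in for the response; `namesFromResponse` and `isAscending` are hypothetical helpers, and the check assumes plain character-by-character string comparison, which is consistent with the order shown above (for example, "12 Pack Cola" comes before "1lb Ground Beef" because "2" sorts before "l").

```javascript
// Hypothetical helper: pulls the product names out of a search response,
// in the order Elasticsearch returned them.
function namesFromResponse(resp) {
  return resp.hits.hits.map(function (hit) { return hit._source.name; });
}

// True when each string is <= the one after it under plain string comparison.
function isAscending(values) {
  for (var i = 1; i < values.length; i++) {
    if (values[i - 1] > values[i]) { return false; }
  }
  return true;
}

// Trimmed-down stand-in for the response above (a few of the hits only).
var sampleResp = {
  hits: { hits: [
    { _source: { name: "0.5lb Jumbo Shrimp" } },
    { _source: { name: "1 Gallon Milk" } },
    { _source: { name: "12 Pack Cola" } },
    { _source: { name: "1lb Ground Beef" } },
    { _source: { name: "Chocolate Bar" } }
  ] }
};

var returnedNames = namesFromResponse(sampleResp);
console.log(returnedNames);
console.log(isAscending(returnedNames));
```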
Conclusion
Analyzed fields in Elasticsearch allow for broader searching on individual terms, but they can make sorting a tricky task. Fortunately, it’s easy to solve this problem by creating a subfield that holds a copy of the original string. Documents that are already indexed don’t pick up a newly added subfield on their own, so for a small dataset like this one the simplest route is to recreate the index with the new mapping and re-import the data. With the step-by-step instructions in this tutorial, you should have no trouble sorting an analyzed text field in Elasticsearch.
Just the Code
If you’re already familiar with the concepts described in this tutorial, here’s all the code you need to sort an analyzed text field in Elasticsearch:
```javascript
var elasticsearch = require("elasticsearch");

// Connect to the local Elasticsearch instance
var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Create index mapping */
client.indices.create({
  index: "store",
  body: {
    "mappings": {
      "products": {
        "properties" : {
          "name": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          },
          "price": { "type": "double"},
          "quantity": { "type": "integer"},
          "department": { "type": "keyword"}
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```

```javascript
/* Sort by Raw Text Field */
client.search({
  size: 20,
  index: 'store',
  type: 'products',
  body: {
    query: {
      match_all: {}
    },
    sort: [{"name.raw": "asc"}]
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```