How to Sort an Analyzed Text Field in Elasticsearch using NodeJS

Introduction

In Elasticsearch, you can define how your string data is processed upon indexing by setting its type to either keyword or text. What’s the difference between these two types? When you store data in a field that has a keyword type, it’s indexed as is; however, data stored in a field with a text type is analyzed before indexing. To put it simply, the analyzer breaks down the value of a text field into its separate terms, making it easy to search for those individual terms. Let’s say we had a string: "Elasticsearch makes life easy". If we stored it as a keyword field, it would be indexed just as it is, but if we stored it as a text field, it would be broken down into its individual terms, which the default standard analyzer also lowercases: “elasticsearch”, “makes”, “life”, and “easy”.
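
To make the difference concrete, here is a rough sketch in plain JavaScript. The analyzeAsText and indexAsKeyword helpers are hypothetical names invented for this illustration; they only approximate what Elasticsearch does. The real standard analyzer follows Unicode word-boundary rules that this simple split does not fully reproduce.

```javascript
// Rough approximation of text analysis: split on anything that is not a
// letter or digit, drop empty pieces, and lowercase each term.
// NOTE: hypothetical helper, not the real Elasticsearch analyzer.
function analyzeAsText(value) {
  return value
    .split(/[^A-Za-z0-9]+/)
    .filter(function (term) { return term.length > 0; })
    .map(function (term) { return term.toLowerCase(); });
}

// A keyword field keeps the whole value as a single, unmodified term.
function indexAsKeyword(value) {
  return [value];
}

console.log(analyzeAsText("Elasticsearch makes life easy"));
console.log(indexAsKeyword("Elasticsearch makes life easy"));
```

The first call yields four separate terms, while the second keeps the string whole, which is what makes a keyword value usable for sorting.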

While analyzing a text field before indexing can be helpful for searching because it lets a query match individual words, it can make sorting a bit problematic. Is it possible to alphabetically sort the values of a text field by their original text strings? Fortunately, Elasticsearch makes this task simple to accomplish. In this tutorial, we’ll show you how to sort an analyzed text field in Elasticsearch using NodeJS (JavaScript). If you’re already familiar with basic sorting operations and prefer to dive into the sample code, feel free to skip to Just the Code.

Step 1: Create a subfield of type keyword

The main problem with sorting an analyzed text field is that we lose the original string representation. In our example, our first step will be to save an original copy of the string that we can use for sorting. We’ll do this by using a subfield that will store the original text.

Let’s look at an example that uses an index called store, which represents a small grocery store. This store index contains a type called products, which lists the store’s products. To keep things simple, our example dataset will only contain a handful of products with just the following fields: id, name, price, quantity, and department. The table below shows the dataset:

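| id | name | price | quantity | department |
|----|------|-------|----------|------------|
| 1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
| 2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
| 3 | Dozen Apples | 2.49 | 12 | Produce |
| 4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
| 5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
| 6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
| 7 | Wheat Bread | 1.29 | 5 | Bakery |
| 8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
| 9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
| 10 | Lime Juice | 0.99 | 20 | Produce |
| 11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
| 12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
| 13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
| 14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |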
141 Gallon Orange Juice3.294Juice

Here is the JSON we used to define the mapping of our index:

{
    "mappings": {
        "products": {
            "properties" : {
                "name": { "type": "text"},
                "price": { "type": "double"},
                "quantity": { "type": "integer"},
                "department": { "type": "keyword"}
            }
        }
    }
}

You can see in the mapping that the name field is of type text, which means values like “1 Gallon Vanilla Soy Milk” get analyzed and broken down into their individual terms: “1”, “gallon”, “vanilla”, “soy”, and “milk”. Unfortunately, this means the original string is no longer available as a single value to sort on; to remedy this, we’ll create a subfield and store a copy of the original string in it. We’ll have to delete our index and recreate it with the new mapping to accomplish this task, so we’ll also need to re-import the data.

Here is some sample JavaScript code to delete the index:

var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

client.indices.delete({
  index: 'store',
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});

And here is the JavaScript we used to recreate the index with the new mapping, which adds a subfield to hold the original string:

/* Create index mapping */
client.indices.create({
    index: "store",
    body: {
        "mappings": {
          "products": {
            "properties" : {
              "name": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  }
                }
              },
              "price": { "type": "double"},
              "quantity": { "type": "integer"},
              "department": { "type": "keyword"}
            }
          }
        }
    }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});

Notice that our new subfield, named "raw", has the type "keyword". This is important, because values of type "keyword" are not analyzed and are indexed as is. We’ll take advantage of this and store a copy of the original string in this field.
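
If you have several mappings to adjust, the same change can be expressed as a small function. The sketch below is only an illustration: withKeywordSubfield is a hypothetical helper (not part of the Elasticsearch client) that copies a "properties" object and adds a keyword subfield, named "raw" by default, to one text field.

```javascript
// Hypothetical helper: returns a copy of a mapping "properties" object in
// which the named text field gains a keyword subfield (default name "raw").
// The input object is left unchanged.
function withKeywordSubfield(properties, fieldName, subfieldName) {
  var name = subfieldName || "raw";
  var field = properties[fieldName];
  if (!field || field.type !== "text") {
    throw new Error('"' + fieldName + '" is not a text field in this mapping');
  }

  // Keep any subfields that already exist and add the new one.
  var fields = Object.assign({}, field.fields);
  fields[name] = { type: "keyword" };

  var updated = Object.assign({}, properties);
  updated[fieldName] = Object.assign({}, field, { fields: fields });
  return updated;
}

var original = {
  name: { type: "text" },
  price: { type: "double" },
  quantity: { type: "integer" },
  department: { type: "keyword" }
};

var properties = withKeywordSubfield(original, "name");
console.log(JSON.stringify(properties.name, null, 4));
```

The resulting object could be used as the "properties" value in the client.indices.create call shown earlier.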

Then we re-import our data:

/* Bulk Import from JSON */
client.bulk({
  body: [
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "1" } },
    { "id": "1", "name" : "Multi-Grain Cereal", "price": 4.99, "quantity": 4 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "2" } },
    { "id": "2", "name" : "1lb Ground Beef",  "price": 3.99, "quantity": 29 , "department":["Meat and Seafood"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "3" } },
    { "id": "3", "name" : "Dozen Apples",  "price": 2.49, "quantity": 12 , "department":["Produce"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "4" } },
    { "id": "4", "name" : "Chocolate Bar",  "price": 1.29, "quantity": 2 , "department":["Packaged Foods", "Checkout"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "5" } },
    { "id": "5", "name" : "1 Gallon Milk",  "price": 3.29, "quantity": 16 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "6" } },
    { "id": "6", "name" : "0.5lb Jumbo Shrimp",  "price": 5.29, "quantity": 12 , "department":["Meat and Seafood"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "7" } },
    { "id": "7", "name" : "Wheat Bread",  "price": 1.29, "quantity": 5 , "department":["Bakery"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "8" } },
    { "id": "8", "name" : "Pepperoni Pizza",  "price": 2.99, "quantity": 5 , "department":["Frozen"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "9" } },
    { "id": "9", "name" : "12 Pack Cola",  "price": 5.29, "quantity": 6 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "10" } },
    { "id": "10", "name" : "Lime Juice",  "price": 0.99, "quantity": 20 , "department":["Produce"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "11" } },
    { "id": "11", "name" : "12 Pack Cherry Cola",  "price": 5.59, "quantity": 5 , "department":["Packaged Foods"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "12" } },
    { "id": "12", "name" : "1 Gallon Soy Milk",  "price": 3.39, "quantity": 10 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "13" } },
    { "id": "13", "name" : "1 Gallon Vanilla Soy Milk",  "price": 3.49, "quantity": 9 , "department":["Dairy"] },
    { "create" : { "_index" : "store", "_type" : "products", "_id" : "14" } },
    { "id": "14", "name" : "1 Gallon Orange Juice",  "price": 3.29, "quantity": 4 , "department":["Juice"] }
  ]
}).then(function(resp) {
    console.log("Successful query!");
    console.log(JSON.stringify(resp, null, 4));
  }, function(err) {
    console.trace(err.message);
  });
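
Writing the bulk body by hand gets tedious as the dataset grows, and a bulk request can succeed as a whole even when individual items fail (the response then carries errors: true and a per-item error object). The following sketch addresses both points under stated assumptions: buildBulkBody and findBulkFailures are hypothetical helper names, and the products array is a shortened stand-in for the full dataset.

```javascript
// Hypothetical helper: turn an array of documents into the flat list of
// action/document pairs that the bulk API expects.
function buildBulkBody(index, type, docs) {
  var body = [];
  docs.forEach(function (doc) {
    body.push({ create: { _index: index, _type: type, _id: doc.id } });
    body.push(doc);
  });
  return body;
}

// Hypothetical helper: list the items of a bulk response that failed.
// Assumes the usual bulk response shape { errors: Boolean, items: [...] },
// where each item is keyed by its action name (for example "create").
function findBulkFailures(resp) {
  if (!resp.errors) {
    return [];
  }
  return resp.items
    .map(function (item) {
      var action = Object.keys(item)[0];
      return item[action];
    })
    .filter(function (result) { return Boolean(result && result.error); });
}

// Shortened stand-in for the full product list.
var products = [
  { id: "1", name: "Multi-Grain Cereal", price: 4.99, quantity: 4, department: ["Packaged Foods"] },
  { id: "2", name: "1lb Ground Beef", price: 3.99, quantity: 29, department: ["Meat and Seafood"] }
];

var body = buildBulkBody("store", "products", products);
console.log(JSON.stringify(body, null, 4));
```

The body array could then be passed to client.bulk({ body: body }), and findBulkFailures(resp) called inside the success callback before logging that the import worked.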

Step 2: Sort by the subfield

Now that our new subfield is in place and we’ve re-imported our data, we can easily sort by the subfield. We’ll use the _search API to accomplish this task, specifying "name.raw" as the field to sort by.

Note: In our example, we assume that Elasticsearch is running locally on the default port at "http://localhost:9200". If you’re running Elasticsearch on a different server, you’ll need to adjust the hosts setting accordingly.

The JavaScript we use to sort is shown below:

/* Sort by Raw Text Field */
client.search({
  size: 20,
  index: 'store',
  type: 'products',
  body: {
        query: {
            match_all: {}
        },
        sort: [{"name.raw": "asc"}]

    }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});

Response:

Successful query!
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 14,
        "max_score": null,
        "hits": [
            {
                "_index": "store",
                "_type": "products",
                "_id": "6",
                "_score": null,
                "_source": {
                    "id": "6",
                    "name": "0.5lb Jumbo Shrimp",
                    "price": 5.29,
                    "quantity": 12,
                    "department": [
                        "Meat and Seafood"
                    ]
                },
                "sort": [
                    "0.5lb Jumbo Shrimp"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "5",
                "_score": null,
                "_source": {
                    "id": "5",
                    "name": "1 Gallon Milk",
                    "price": 3.29,
                    "quantity": 16,
                    "department": [
                        "Dairy"
                    ]
                },
                "sort": [
                    "1 Gallon Milk"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "14",
                "_score": null,
                "_source": {
                    "id": "14",
                    "name": "1 Gallon Orange Juice",
                    "price": 3.29,
                    "quantity": 4,
                    "department": [
                        "Juice"
                    ]
                },
                "sort": [
                    "1 Gallon Orange Juice"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "12",
                "_score": null,
                "_source": {
                    "id": "12",
                    "name": "1 Gallon Soy Milk",
                    "price": 3.39,
                    "quantity": 10,
                    "department": [
                        "Dairy"
                    ]
                },
                "sort": [
                    "1 Gallon Soy Milk"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "13",
                "_score": null,
                "_source": {
                    "id": "13",
                    "name": "1 Gallon Vanilla Soy Milk",
                    "price": 3.49,
                    "quantity": 9,
                    "department": [
                        "Dairy"
                    ]
                },
                "sort": [
                    "1 Gallon Vanilla Soy Milk"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "11",
                "_score": null,
                "_source": {
                    "id": "11",
                    "name": "12 Pack Cherry Cola",
                    "price": 5.59,
                    "quantity": 5,
                    "department": [
                        "Packaged Foods"
                    ]
                },
                "sort": [
                    "12 Pack Cherry Cola"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "9",
                "_score": null,
                "_source": {
                    "id": "9",
                    "name": "12 Pack Cola",
                    "price": 5.29,
                    "quantity": 6,
                    "department": [
                        "Packaged Foods"
                    ]
                },
                "sort": [
                    "12 Pack Cola"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "2",
                "_score": null,
                "_source": {
                    "id": "2",
                    "name": "1lb Ground Beef",
                    "price": 3.99,
                    "quantity": 29,
                    "department": [
                        "Meat and Seafood"
                    ]
                },
                "sort": [
                    "1lb Ground Beef"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "4",
                "_score": null,
                "_source": {
                    "id": "4",
                    "name": "Chocolate Bar",
                    "price": 1.29,
                    "quantity": 2,
                    "department": [
                        "Packaged Foods",
                        "Checkout"
                    ]
                },
                "sort": [
                    "Chocolate Bar"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "3",
                "_score": null,
                "_source": {
                    "id": "3",
                    "name": "Dozen Apples",
                    "price": 2.49,
                    "quantity": 12,
                    "department": [
                        "Produce"
                    ]
                },
                "sort": [
                    "Dozen Apples"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "10",
                "_score": null,
                "_source": {
                    "id": "10",
                    "name": "Lime Juice",
                    "price": 0.99,
                    "quantity": 20,
                    "department": [
                        "Produce"
                    ]
                },
                "sort": [
                    "Lime Juice"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "1",
                "_score": null,
                "_source": {
                    "id": "1",
                    "name": "Multi-Grain Cereal",
                    "price": 4.99,
                    "quantity": 4,
                    "department": [
                        "Packaged Foods"
                    ]
                },
                "sort": [
                    "Multi-Grain Cereal"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "8",
                "_score": null,
                "_source": {
                    "id": "8",
                    "name": "Pepperoni Pizza",
                    "price": 2.99,
                    "quantity": 5,
                    "department": [
                        "Frozen"
                    ]
                },
                "sort": [
                    "Pepperoni Pizza"
                ]
            },
            {
                "_index": "store",
                "_type": "products",
                "_id": "7",
                "_score": null,
                "_source": {
                    "id": "7",
                    "name": "Wheat Bread",
                    "price": 1.29,
                    "quantity": 5,
                    "department": [
                        "Bakery"
                    ]
                },
                "sort": [
                    "Wheat Bread"
                ]
            }
        ]
    }
}

The results are clearly sorted alphabetically by the raw name subfield we created.
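
Note that the order is character by character rather than numeric, which is why “12 Pack Cola” comes before “1lb Ground Beef”. The sketch below checks this locally. It rests on an assumption worth stating: for ASCII-only strings like these product names, JavaScript’s default string sort gives the same order as the keyword sort in the response; names with non-ASCII characters may behave differently. The namesFromHits helper and the trimmed resp object are illustrative only.

```javascript
// Hypothetical helper: pull the product names out of a search response,
// in the order Elasticsearch returned them.
function namesFromHits(resp) {
  return resp.hits.hits.map(function (hit) { return hit._source.name; });
}

// Trimmed-down stand-in for the response above (only the fields used here).
var resp = {
  hits: {
    hits: [
      { _source: { name: "1 Gallon Milk" } },
      { _source: { name: "12 Pack Cola" } },
      { _source: { name: "1lb Ground Beef" } },
      { _source: { name: "Chocolate Bar" } }
    ]
  }
};

var returned = namesFromHits(resp);

// The default JavaScript string sort compares UTF-16 code units, so the
// space (32) sorts before "2" (50), which sorts before "l" (108).
var locallySorted = returned.slice().sort();

// prints true: the returned order matches the local character-based sort
console.log(JSON.stringify(returned) === JSON.stringify(locallySorted));
```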

Conclusion

Analyzed fields in Elasticsearch allow for broader searching on individual terms, but they can make sorting a tricky task. Fortunately, it’s easy to solve this problem by creating a subfield that holds a copy of the original string. While you can’t change how an existing field is mapped in place, it’s not difficult to recreate the index with the new mapping and re-import the data. With the step-by-step instructions in this tutorial, you should have no trouble sorting an analyzed text field in Elasticsearch.

Just the Code

If you’re already familiar with the concepts described in this tutorial, here’s all the code you need to sort an analyzed text field in Elasticsearch:

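var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Create index mapping */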
client.indices.create({
    index: "store",
    body: {
        "mappings": {
          "products": {
            "properties" : {
              "name": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  }
                }
              },
              "price": { "type": "double"},
              "quantity": { "type": "integer"},
              "department": { "type": "keyword"}
            }
          }
        }
    }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});

/* Sort by Raw Text Field */
client.search({
  size: 20,
  index: 'store',
  type: 'products',
  body: {
        query: {
            match_all: {}
        },
        sort: [{"name.raw": "asc"}]

    }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
