When to use the keyword type vs text datatype in Elasticsearch
When you’re working with data in Elasticsearch, it’s important to understand your options for storing and handling string values. Elasticsearch has two core datatypes that can store string data: text and keyword. It’s easy to get these two types confused, but this tutorial will help set the record straight. In this article, we’ll look at some important differences between these types and discuss when to use a keyword vs a text datatype in Elasticsearch.
Keyword vs Text – Full vs. Partial Matches
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not. In other words, text fields are broken down into their individual terms at indexing to allow for partial matching, while keyword fields are indexed as is. For example, a text field containing the value “Roosters crow everyday” would have each of its terms indexed separately (with the default analyzer, lowercased as “roosters”, “crow”, and “everyday”), and a query on any of those terms would return this string. However, if the same string were stored as a keyword type, it would not be broken down: only a search for the exact string “Roosters crow everyday” would return it as a result. One consequence of this analysis is that text fields can’t be sorted alphabetically by default, since sorting on them requires explicitly enabling fielddata. A keyword field, on the other hand, can be sorted alphabetically in the typical fashion.
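To make the analyzed vs. not-analyzed distinction concrete, here is a toy model in plain Python. It is only a sketch: the `analyze` function (lowercase, then split on whitespace) is a simplified stand-in for Elasticsearch's default analyzer, and the helper names are hypothetical rather than part of any Elasticsearch API.

```python
def analyze(value):
    """Toy analyzer (assumption): lowercase the value and split on whitespace."""
    return value.lower().split()


def build_index(docs, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, value in docs.items():
        # A text-like field stores one entry per term; a keyword-like
        # field stores the whole value as a single term.
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


docs = {1: "Roosters crow everyday"}
text_index = build_index(docs, analyzed=True)
keyword_index = build_index(docs, analyzed=False)


def search_text(query):
    """Analyze the query the same way as the indexed value, then match any term."""
    hits = set()
    for term in analyze(query):
        hits |= text_index.get(term, set())
    return hits


def search_keyword(query):
    """Match only when the query equals the stored value exactly."""
    return keyword_index.get(query, set())


print(search_text("crow"))                       # {1}
print(search_keyword("crow"))                    # set()
print(search_keyword("Roosters crow everyday"))  # {1}
```

The text-like index finds the document from a single word, while the keyword-like index only responds to the complete original string.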
Both of these datatypes can prove valuable depending on the situation. Let’s look at some common use cases for each:
Use Cases for text
One useful application of the text datatype is for product descriptions. Imagine a user searching for pajamas: chances are, they’ll simply use “pajamas” as their search term, and you’ll want your results to include all products that have the word “pajamas” somewhere in their description.
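As a rough illustration of the pajamas search, the sketch below builds the body of a full-text `match` query in Python. The field name "product_description" is borrowed from the demo index discussed later in this article, and `match_query` is a hypothetical helper; the code only prints the JSON and does not contact a cluster.

```python
import json


def match_query(field, text):
    """Build a full-text `match` query body for an analyzed (text) field."""
    return {"query": {"match": {field: text}}}


# Hypothetical search: find products whose description mentions "pajamas".
body = match_query("product_description", "pajamas")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to the index's `_search` endpoint.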
Use Cases for keyword
The keyword datatype can come in handy for cases where a user will be querying for exact matches. A good example would be a “state” field. A user will search for “North Carolina” but not for the word “North” by itself. Email addresses are also good candidates for the keyword datatype for similar reasons.
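For exact matches like these, a `term` query is the usual fit, since it compares the query value against the stored value without analyzing it. The following is a minimal sketch that builds such a query body in Python; `exact_match_query` is a hypothetical helper, and the "state" field name is taken from the example above.

```python
import json


def exact_match_query(field, value):
    """Build a `term` query body, which matches the stored value exactly."""
    return {"query": {"term": {field: value}}}


# Matches documents whose "state" is exactly "North Carolina".
full = exact_match_query("state", "North Carolina")

# A partial value is simply a different exact value, so against a keyword
# field it would not match documents that store "North Carolina".
partial = exact_match_query("state", "North")

print(json.dumps(full, indent=2))
```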
Creating an Index with keyword and text Fields
Now that we’ve discussed the differences between the text and keyword datatypes, let’s look at how an index containing fields of these types would be set up. Our index will be called "demo_index", and it will have two fields: the "state" field will have the keyword datatype, and the "product_description" field will have the text datatype.
The keyword and text datatypes haven’t always been part of Elasticsearch. Originally, Elasticsearch provided just a single string datatype, and users could set an option called index to either analyzed or not_analyzed in their mapping to specify whether they wanted a string to be broken down into its individual terms upon indexing or simply indexed as is. However, this construct sometimes led to confusion, as some options available for a string type only made sense for one of the two use cases. For this reason, Elastic rolled out the keyword and text datatypes when they released Elasticsearch 5.0. Any backward compatibility with the old string datatype was removed with the release of Elasticsearch 6.0, so it is no longer possible to create an index that uses that datatype.
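Returning to the "demo_index" described above, the following sketch builds a mapping with a keyword "state" field and a text "product_description" field. It assumes the typeless mapping format used by Elasticsearch 7 and later (earlier versions expected a mapping type name under "mappings"), and it only prints the request rather than sending it. The function name is hypothetical.

```python
import json

INDEX_NAME = "demo_index"


def demo_index_body():
    """Build the create-index body (assumes the Elasticsearch 7+ typeless format)."""
    return {
        "mappings": {
            "properties": {
                # Indexed as is: exact matching and sorting.
                "state": {"type": "keyword"},
                # Analyzed into terms: full-text (partial) matching.
                "product_description": {"type": "text"},
            }
        }
    }


body = demo_index_body()
print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
```

The printed JSON could then be submitted with whichever HTTP tool or client library you prefer.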
When you’re creating a new index in Elasticsearch, it’s important to understand your data and choose your datatypes with care. Before creating the mapping for an index, it’s helpful to know how users might be searching for data in a specific field; this is especially true when you’re dealing with string data where partial matching may be needed. With the explanations provided in this tutorial, you’ll know when to use a keyword vs a text datatype in Elasticsearch.