When to use the keyword type vs text datatype in Elasticsearch
Introduction
When you’re working with data in Elasticsearch, it’s important to understand your options for storing and handling string values. Elasticsearch has two core datatypes that can store string data: text and keyword. It’s easy to get these two types confused, but this tutorial will help set the record straight. In this article, we’ll look at some important differences between these types and discuss when to use a keyword vs a text datatype in Elasticsearch.
Keyword vs Text – Full vs. Partial Matches
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not. In other words, text fields are broken down into their individual terms at indexing so that a query can match on any one of those terms, while keyword fields are indexed as is. For example, with the default standard analyzer, a text field containing the value “Roosters crow everyday” would be indexed as three separate lowercased terms: “roosters”, “crow”, and “everyday”; a query on any of those terms would return this string. However, if the same string were stored as a keyword type, it would not get broken down. Only a search for the exact string “Roosters crow everyday” would return it as a result. One consequence of this analysis is that text fields can’t be used for sorting by default. A keyword field, on the other hand, can be sorted alphabetically in the typical fashion.
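To make the difference concrete, here is a small Python sketch. It is a toy model only, not how Elasticsearch is implemented: analyze_text is a hypothetical stand-in that loosely mimics the default standard analyzer by splitting a string into words and lowercasing them, and the index and match helpers are invented purely for illustration.

```python
import re


def analyze_text(value):
    """Toy stand-in for the standard analyzer: split into words, lowercase each."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def index_as_text(value):
    # A text field stores each analyzed term separately.
    return set(analyze_text(value))


def index_as_keyword(value):
    # A keyword field stores the whole value as one unchanged term.
    return {value}


def matches_text(indexed_terms, query):
    # A match-style query analyzes the query string too,
    # then checks whether any resulting term was indexed.
    return any(term in indexed_terms for term in analyze_text(query))


def matches_keyword(indexed_terms, query):
    # An exact lookup compares the query against the stored value as is.
    return query in indexed_terms


document = "Roosters crow everyday"
text_terms = index_as_text(document)
keyword_terms = index_as_keyword(document)

print(sorted(text_terms))                        # ['crow', 'everyday', 'roosters']
print(matches_text(text_terms, "crow"))          # True
print(matches_keyword(keyword_terms, "crow"))    # False
print(matches_keyword(keyword_terms, document))  # True
```

In this sketch, a single word finds the document only when the value was indexed as text, while the keyword version answers only to the complete, unmodified string.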
Both of these datatypes can prove valuable depending on the situation. Let’s look at some common use cases for each:
Use Cases for the text Datatype
One useful application of the text datatype is for product descriptions. Imagine a user searching for pajamas: chances are, they’ll simply use “pajamas” as their search term, and you’ll want your results to include all products that have the word “pajamas” somewhere in their description.
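A search like this is typically expressed as a match query. The Python sketch below only builds and prints the JSON request body; it assumes a text field named product_description in an index named demo_index (the names used for the sample index later in this article), and build_match_query is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_match_query(field, search_terms):
    """Build a match query body; Elasticsearch analyzes search_terms before matching."""
    return {"query": {"match": {field: search_terms}}}


body = build_match_query("product_description", "pajamas")

# This body would be sent with a search request such as: GET demo_index/_search
print(json.dumps(body, indent=2))
```

Because the match query analyzes the search string the same way the text field was analyzed at index time, a search for “Pajamas” or “pajamas” should find the same products.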
Use Cases for the keyword Datatype
The keyword datatype can come in handy for cases where a user will be querying for exact matches. A good example would be a “state” field. A user will search for “North Carolina” but not for the word “North” by itself. Email addresses are also good candidates for the keyword datatype for similar reasons.
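Exact-match lookups like this are usually written as a term query, and keyword fields can also be used in a sort clause. The Python sketch below only builds the JSON request bodies; the state field name comes from the sample index later in this article, and both helper functions are hypothetical names chosen for illustration.

```python
import json


def build_term_query(field, exact_value):
    """Build a term query body; exact_value is compared as is, without analysis."""
    return {"query": {"term": {field: exact_value}}}


def build_sorted_search(sort_field):
    """Build a match_all search body sorted in ascending order on a keyword field."""
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": "asc"}}],
    }


term_body = build_term_query("state", "North Carolina")
sort_body = build_sorted_search("state")

print(json.dumps(term_body, indent=2))
print(json.dumps(sort_body, indent=2))
```

Note that the term query is case-sensitive against a plain keyword field, so “north carolina” would not match a value stored as “North Carolina”.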
Creating an Index with text and keyword Datatypes
Now that we’ve discussed the differences between the text and keyword datatypes, let’s look at some sample code that will show how to create an index containing fields of these types. Our index will be called "demo_index", and it will have two fields: "state" and "product_description". The "state" field will have the keyword datatype, and the "product_description" field will have the text datatype:
PUT demo_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "state": {
          "type": "keyword"
        },
        "product_description": {
          "type": "text"
        }
      }
    }
  }
}
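The request above nests the fields under a _doc mapping type, which matches the Elasticsearch 6.x syntax. From Elasticsearch 7.0 onward, index creation requests are typeless by default, so the properties object sits directly under mappings. The Python sketch below builds that typeless body for the same two fields; build_typeless_mapping is a hypothetical helper name, and the sketch only prints the JSON rather than sending it to a cluster.

```python
import json


def build_typeless_mapping():
    """Build the mapping body without the "_doc" level, for Elasticsearch 7.0 and later."""
    return {
        "mappings": {
            "properties": {
                "state": {"type": "keyword"},
                "product_description": {"type": "text"},
            }
        }
    }


body = build_typeless_mapping()

# This body would be sent with: PUT demo_index
print(json.dumps(body, indent=2))
```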
The keyword and text datatypes haven’t always been part of Elasticsearch. Originally, Elasticsearch provided just a single string datatype, and users could set an option called index to either analyzed or not_analyzed in their mapping to specify whether they wanted a string to be broken down into its individual terms upon indexing or simply indexed as is. However, this construct sometimes led to confusion, as some options available for a string type only made sense for one of the two use cases. For this reason, Elastic rolled out the keyword and text datatypes when they released Elasticsearch 5.0. Backward compatibility with the old string datatype was removed with the release of Elasticsearch 6.0, so it is no longer possible to create an index that uses that datatype.
Conclusion
When you’re creating a new index in Elasticsearch, it’s important to understand your data and choose your datatypes with care. Before creating the mapping for an index, it’s helpful to know how users might be searching for data in a specific field; this is especially true when you’re dealing with string data where partial matching may be needed. With the explanations provided in this tutorial, you’ll know when to use a keyword vs a text datatype in Elasticsearch.