How to Get Unique Values for a Field in Elasticsearch
Introduction
No matter what type of data you’re working with in Elasticsearch, there will probably be times when you want to find all the unique values for a given field. For example, let’s say you’re maintaining an index of all the artists your users listen to, along with the genre for each of these artists. You may want to compile a list of the genres each user listens to, but you don’t want your results to list every artist as well. Fortunately, finding these unique values in Elasticsearch is an easy task. In this tutorial, you’ll learn how to use aggregation to get the unique values for a field in Elasticsearch. If you’re already comfortable with the concept of aggregation and would prefer to skip the explanation, feel free to jump to Just the Code.
Aggregations by Example
Aggregation is a powerful tool in Elasticsearch that allows you to calculate a field’s minimum, maximum, average, and much more; for now, we’re going to focus on its ability to determine unique values for a field. Let’s look at an example of how you can get the unique values for a field in Elasticsearch. For this example, we will use an index named store, which represents a small grocery store. Our store index will have a type named products, which lists the store’s products. We’ll keep our dataset simple by only including a handful of products with just a small number of fields: id, name, price, quantity, and department. The following table shows the data we have in our index:
id | name | price | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
The curl command to create the mapping would look like this:
```
curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
```
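This tutorial doesn’t show the indexing step itself, but as a sketch, the sample products could be loaded with Elasticsearch’s _bulk API. Note that a product belonging to two departments, like the chocolate bar, is indexed with an array value for department. The first few rows of the table would look something like this (--data-binary preserves the newlines that the bulk format requires):

```shell
curl -H "Content-Type: application/json" -XPOST "127.0.0.1:9200/store/products/_bulk?pretty" --data-binary '
{ "index": { "_id": "1" } }
{ "name": "Multi-Grain Cereal", "price": 4.99, "quantity": 4, "department": ["Packaged Foods"] }
{ "index": { "_id": "2" } }
{ "name": "1lb Ground Beef", "price": 3.99, "quantity": 29, "department": ["Meat and Seafood"] }
{ "index": { "_id": "4" } }
{ "name": "Chocolate Bar", "price": 1.29, "quantity": 2, "department": ["Packaged Foods", "Checkout"] }
'
```

The remaining rows follow the same pattern; each document is preceded by an index action line giving its id.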
Now that we’ve set up our dataset and mapping, we can use aggregation to find all the unique departments in the store. Let’s look at the code below to see how it’s done. If the code seems a bit complex, don’t worry; the explanations that follow will clarify what’s going on:
```
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": { "field": "department" }
    }
  }
}
'
```

The response looks like this:

```
{
  "took" : 80,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "departments" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "Packaged Foods", "doc_count" : 4 },
        { "key" : "Dairy", "doc_count" : 3 },
        { "key" : "Meat and Seafood", "doc_count" : 2 },
        { "key" : "Produce", "doc_count" : 2 },
        { "key" : "Bakery", "doc_count" : 1 },
        { "key" : "Checkout", "doc_count" : 1 },
        { "key" : "Frozen", "doc_count" : 1 },
        { "key" : "Juice", "doc_count" : 1 }
      ]
    }
  }
}
```
If you look at the results, you can think of the products for each department as being sorted into “buckets”, with a bucket for each department. Each bucket contains a doc_count, which tells us how many products were in that department.
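If you’re consuming the response in application code, the unique values are simply the key of each bucket. As a minimal sketch in Python, using an abridged copy of the response shown earlier (in practice this dictionary would come from an Elasticsearch client or a parsed curl response):

```python
# Abridged aggregation response from the example above; in a real
# application this would be the parsed JSON returned by Elasticsearch.
response = {
    "aggregations": {
        "departments": {
            "buckets": [
                {"key": "Packaged Foods", "doc_count": 4},
                {"key": "Dairy", "doc_count": 3},
                {"key": "Meat and Seafood", "doc_count": 2},
            ]
        }
    }
}

# Each bucket's "key" is one unique value of the department field.
buckets = response["aggregations"]["departments"]["buckets"]
unique_departments = [bucket["key"] for bucket in buckets]
print(unique_departments)  # ['Packaged Foods', 'Dairy', 'Meat and Seafood']
```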
How did we make this happen? We accomplished this by creating an aggregation using aggs. In our example, we named the aggregation departments and specified the field we wanted to aggregate by: department. Note that the curl command contains the URL parameter size=0. If we omitted this parameter, the query would also return the matching documents themselves instead of just the aggregate information we want.
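One detail worth knowing: separately from the size=0 URL parameter, the terms aggregation itself returns only the top ten buckets by default. If a field has more unique values than that, you can raise the limit with a size setting inside the terms block, for example:

```
{
  "aggs": {
    "departments": {
      "terms": {
        "field": "department",
        "size": 100
      }
    }
  }
}
```

For fields with very high cardinality, the doc_count_error_upper_bound value in the response can help you judge whether the returned buckets are exact.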
Conclusion
There are many situations where you may want to find the unique values for a field in Elasticsearch; without the right tools, it can be difficult to wade through a large dataset. This tutorial explained how easy it is to use aggregation to accomplish this task. With this step-by-step guide, you can harness the power of aggregation to gain new insights into your data.
Just the Code
If you’re already familiar with the concept of aggregation, here’s all the code you’ll need to find the unique values of a field in Elasticsearch:
```
curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": { "field": "department" }
    }
  }
}
'
```