How to Get Unique Values for a Field in Elasticsearch


Introduction

No matter what type of data you’re working with in Elasticsearch, there will probably be times when you want to find all the unique values for a given field. For example, say you maintain an index of the artists your users listen to, along with the genre of each artist. You may want to compile a list of the genres each user listens to without every individual artist cluttering the results. Fortunately, Elasticsearch makes this easy. In this tutorial, you’ll learn how to use a terms aggregation to get the unique values for a field. If you’re already comfortable with aggregations and would prefer to skip the explanation, feel free to jump to Just the Code.

Aggregations by Example

Aggregations are a powerful tool in Elasticsearch that let you calculate a field’s minimum, maximum, average, and much more; here, we’ll focus on using them to find the unique values of a field. For this example, we’ll use an index named store, which represents a small grocery store. The store index has a mapping type named products that lists the store’s products. (Mapping types are deprecated as of Elasticsearch 7.0 and removed in 8.0, so the commands in this tutorial assume an Elasticsearch 6.x cluster.) To keep the dataset simple, we include only a handful of products with a small number of fields: id, name, price, quantity, and department. The following table shows the data in our index:

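id | name                      | price | quantity | department
-- | ------------------------- | ----- | -------- | ------------------------
1  | Multi-Grain Cereal        | 4.99  | 4        | Packaged Foods
2  | 1lb Ground Beef           | 3.99  | 29       | Meat and Seafood
3  | Dozen Apples              | 2.49  | 12       | Produce
4  | Chocolate Bar             | 1.29  | 2        | Packaged Foods, Checkout
5  | 1 Gallon Milk             | 3.29  | 16       | Dairy
6  | 0.5lb Jumbo Shrimp        | 5.29  | 12       | Meat and Seafood
7  | Wheat Bread               | 1.29  | 5        | Bakery
8  | Pepperoni Pizza           | 2.99  | 5        | Frozen
9  | 12 Pack Cola              | 5.29  | 6        | Packaged Foods
10 | Lime Juice                | 0.99  | 20       | Produce
11 | 12 Pack Cherry Cola       | 5.59  | 5        | Packaged Foods
12 | 1 Gallon Soy Milk         | 3.39  | 10       | Dairy
13 | 1 Gallon Vanilla Soy Milk | 3.49  | 9        | Dairy
14 | 1 Gallon Orange Juice     | 3.29  | 4        | Juice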

The cURL command to create the index and its mapping looks like this:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
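The mapping only defines the fields; the tutorial doesn’t show how the 14 products get into the index. One option is the Bulk API, which takes newline-delimited JSON: an action line followed by the document source for each product. The Python sketch below generates that body from the table using only the standard library. The names PRODUCTS and build_bulk_body are illustrative, and the "_type" field in each action line assumes the same 6.x-style products mapping type used above.

```python
import json

# Rows copied from the product table: (id, name, price, quantity, departments).
# departments is always a list; the Chocolate Bar has two entries.
PRODUCTS = [
    (1, "Multi-Grain Cereal", 4.99, 4, ["Packaged Foods"]),
    (2, "1lb Ground Beef", 3.99, 29, ["Meat and Seafood"]),
    (3, "Dozen Apples", 2.49, 12, ["Produce"]),
    (4, "Chocolate Bar", 1.29, 2, ["Packaged Foods", "Checkout"]),
    (5, "1 Gallon Milk", 3.29, 16, ["Dairy"]),
    (6, "0.5lb Jumbo Shrimp", 5.29, 12, ["Meat and Seafood"]),
    (7, "Wheat Bread", 1.29, 5, ["Bakery"]),
    (8, "Pepperoni Pizza", 2.99, 5, ["Frozen"]),
    (9, "12 Pack Cola", 5.29, 6, ["Packaged Foods"]),
    (10, "Lime Juice", 0.99, 20, ["Produce"]),
    (11, "12 Pack Cherry Cola", 5.59, 5, ["Packaged Foods"]),
    (12, "1 Gallon Soy Milk", 3.39, 10, ["Dairy"]),
    (13, "1 Gallon Vanilla Soy Milk", 3.49, 9, ["Dairy"]),
    (14, "1 Gallon Orange Juice", 3.29, 4, ["Juice"]),
]


def build_bulk_body(index="store", doc_type="products"):
    """Return an NDJSON body for the _bulk endpoint: an action line and a
    source line per product, terminated by a final newline."""
    lines = []
    for pid, name, price, quantity, departments in PRODUCTS:
        # Action line: where to index the next document (6.x still uses _type).
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": str(pid)}}
        ))
        # Source line: the document itself. A one-element list aggregates
        # the same way a single string value would.
        lines.append(json.dumps({
            "name": name,
            "price": price,
            "quantity": quantity,
            "department": departments,
        }))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(), end="")
```

If the script is saved as make_bulk.py (a hypothetical file name), `python make_bulk.py > products.ndjson` writes the body to a file, which can then be sent with `curl -H "Content-Type: application/x-ndjson" -XPOST "127.0.0.1:9200/_bulk" --data-binary @products.ndjson`. Use --data-binary rather than -d here, because -d strips the newlines that the Bulk API relies on.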

Now that we’ve set up our dataset and mapping, we can use an aggregation to find all the unique departments in the store. Let’s look at the code below to see how it’s done. If the code seems a bit complex, don’t worry: the explanation that follows will clarify what’s going on.

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": {
        "field": "department"
      }
    }
  }
}
'

Elasticsearch returns the following response:

{
  "took" : 80,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "departments" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Packaged Foods",
          "doc_count" : 4
        },
        {
          "key" : "Dairy",
          "doc_count" : 3
        },
        {
          "key" : "Meat and Seafood",
          "doc_count" : 2
        },
        {
          "key" : "Produce",
          "doc_count" : 2
        },
        {
          "key" : "Bakery",
          "doc_count" : 1
        },
        {
          "key" : "Checkout",
          "doc_count" : 1
        },
        {
          "key" : "Frozen",
          "doc_count" : 1
        },
        {
          "key" : "Juice",
          "doc_count" : 1
        }
      ]
    }
  }
}

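In the results, you can think of the products as being sorted into “buckets”, one bucket per department, listed from the largest to the smallest. Each bucket’s key is one unique value of the department field, and its doc_count tells us how many products belong to that department. Notice that the doc_count values add up to 15 even though the index holds only 14 products: the Chocolate Bar’s department field holds two values, Packaged Foods and Checkout, so it is counted in both buckets.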
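To make the bucket idea concrete, here is a rough Python emulation of what the terms aggregation computed over the sample data: every distinct department value gets a bucket, and each product adds one to the bucket of every value it holds. This only illustrates the counting; it is not how Elasticsearch works internally (it reads indexed values and, on a multi-shard index, merges per-shard results). The sort order and the size=10 default are chosen to mirror the response above; the names DEPARTMENTS and terms_buckets are our own.

```python
from collections import Counter

# department values for products 1-14, copied from the table
DEPARTMENTS = [
    ["Packaged Foods"], ["Meat and Seafood"], ["Produce"],
    ["Packaged Foods", "Checkout"], ["Dairy"], ["Meat and Seafood"],
    ["Bakery"], ["Frozen"], ["Packaged Foods"], ["Produce"],
    ["Packaged Foods"], ["Dairy"], ["Dairy"], ["Juice"],
]


def terms_buckets(values_per_doc, size=10):
    """Count documents per distinct value and return the top `size`
    buckets as (key, doc_count) pairs, by count desc, then key asc."""
    counts = Counter()
    for values in values_per_doc:
        # set() so a document counts at most once per distinct value
        counts.update(set(values))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


for key, doc_count in terms_buckets(DEPARTMENTS):
    print(f"{key}: {doc_count}")
```

Running it prints the same eight departments, with the same counts and in the same order, as the aggregation response.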

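How did we make this happen? We created an aggregation using the aggs key. In our example, we named the aggregation departments (the name is our choice, and Elasticsearch echoes it back in the response) and used a terms aggregation on the department field. A terms aggregation needs a field that stores exact values, which is why department is mapped as a keyword rather than text; aggregating on a text field fails by default. Note that the cURL command includes the parameter size=0. Without it, the response would also contain the matching documents themselves (the first 10 hits by default) alongside the aggregation results; size=0 suppresses the hits so that only the aggregate information comes back. Also keep in mind that a terms aggregation returns only the 10 most frequent values by default. Our store has just eight departments, but for a field with more distinct values you’ll need to set a larger size inside the terms block.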
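If you run this query from scripts, it can help to build the request body programmatically. The sketch below (the function name unique_values_query and its defaults are our own, not an Elasticsearch API) produces the same aggregation as the cURL example, moves size=0 from the URL into the body, which Elasticsearch also accepts, and exposes the terms aggregation’s own size setting, which caps how many buckets are returned.

```python
import json


def unique_values_query(field, agg_name="unique_values", max_terms=10):
    """Build a _search body that returns no hits, only a terms
    aggregation listing up to `max_terms` distinct values of `field`."""
    return {
        "size": 0,  # same effect as ?size=0 in the URL: skip the hits
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,      # should be a keyword field, not text
                    "size": max_terms,   # maximum number of buckets to return
                }
            }
        },
    }


print(json.dumps(unique_values_query("department", "departments", 100), indent=2))
```

Setting max_terms well above the expected number of departments, as in the print call, makes it unlikely that any value is silently dropped from the list.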

Conclusion

There are many situations where you may want to find the unique values for a field in Elasticsearch, and without the right tools it can be tedious to wade through a large dataset. This tutorial showed how a terms aggregation handles the task in a single request. With these steps, you can use aggregations to gain new insights into your data.

Just the Code

If you’re already familiar with aggregations, here’s all the code you’ll need to find the unique values of a field in Elasticsearch:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": {
        "field": "department"
      }
    }
  }
}
'
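If you would rather run the query from Python than from the shell, the following standard-library sketch sends the same kind of aggregation and reduces the response to a plain list of bucket keys. It assumes a local 6.x node at 127.0.0.1:9200 with the store index and products type from this tutorial; the function names are illustrative. It uses POST, which the _search endpoint accepts just like a GET with a body.

```python
import json
import urllib.request


def make_search_request(field, host="http://127.0.0.1:9200",
                        path="/store/products/_search"):
    """Build (but do not send) a request for the unique values of `field`."""
    body = {"size": 0, "aggs": {"values": {"terms": {"field": field}}}}
    return urllib.request.Request(
        host + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bucket_keys(response, agg_name="values"):
    """Pull the bucket keys (the unique values) out of a parsed response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


def fetch_unique_values(field):
    """Send the request to a running cluster and return the unique values."""
    with urllib.request.urlopen(make_search_request(field)) as resp:
        return bucket_keys(json.load(resp))
```

Against the sample index, fetch_unique_values("department") should return the eight department names in the same order as the buckets in the earlier response. Splitting request building, parsing, and sending into separate functions keeps the first two testable without a cluster.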
