How to Get Unique Values for a Field in Elasticsearch

Written by Data Pilot

April 07, 2019


Introduction

No matter what type of data you’re working with in Elasticsearch, there will probably be times when you want to find all the unique values for a given field. For example, let’s say you’re maintaining an index of all the artists your users listen to, along with the genre for each of these artists. You may want to compile a list of the genres each user listens to, but you don’t want your results to list every artist as well. Fortunately, finding these unique values in Elasticsearch is an easy task. In this tutorial, you’ll learn how to use aggregation to get the unique values for a field in Elasticsearch. If you’re already comfortable with the concept of aggregation and would prefer to skip the explanation, feel free to jump to Just the Code.

Aggregations by Example

Aggregation is a powerful tool in Elasticsearch that lets you calculate a field’s minimum, maximum, average, and much more; for now, we’ll focus on its ability to find the unique values for a field. For this example, we will use an index named store, which represents a small grocery store. Our store index will have a type named products, which lists the store’s products. We’ll keep our dataset simple by including only a handful of products with a small number of fields: id, name, price, quantity, and department. The following table shows the data in our index:

id	name	price	quantity	department
1	Multi-Grain Cereal	4.99	4	Packaged Foods
2	1lb Ground Beef	3.99	29	Meat and Seafood
3	Dozen Apples	2.49	12	Produce
4	Chocolate Bar	1.29	2	Packaged Foods, Checkout
5	1 Gallon Milk	3.29	16	Dairy
6	0.5lb Jumbo Shrimp	5.29	12	Meat and Seafood
7	Wheat Bread	1.29	5	Bakery
8	Pepperoni Pizza	2.99	5	Frozen
9	12 Pack Cola	5.29	6	Packaged Foods
10	Lime Juice	0.99	20	Produce
11	12 Pack Cherry Cola	5.59	5	Packaged Foods
12	1 Gallon Soy Milk	3.39	10	Dairy
13	1 Gallon Vanilla Soy Milk	3.49	9	Dairy
14	1 Gallon Orange Juice	3.29	4	Juice

The curl command to create the mapping would look like this:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
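
The mapping by itself does not put any products in the index. One possible way to load the rows from the table above is the bulk API; the following is a minimal sketch, not necessarily how the original dataset was loaded. It assumes an Elasticsearch 6.x node at 127.0.0.1:9200 (the same address used throughout this tutorial), uses a hypothetical temporary file name, stores each id from the table as the document's _id, and writes the Chocolate Bar's two departments as a JSON array.

```shell
# Assumption: the store index from the mapping step already exists.
BULK_FILE="${TMPDIR:-/tmp}/products.ndjson"   # hypothetical file name

# Each document takes two lines: an action line with the _id, then the source.
cat > "$BULK_FILE" <<'EOF'
{"index":{"_id":"1"}}
{"name":"Multi-Grain Cereal","price":4.99,"quantity":4,"department":"Packaged Foods"}
{"index":{"_id":"2"}}
{"name":"1lb Ground Beef","price":3.99,"quantity":29,"department":"Meat and Seafood"}
{"index":{"_id":"3"}}
{"name":"Dozen Apples","price":2.49,"quantity":12,"department":"Produce"}
{"index":{"_id":"4"}}
{"name":"Chocolate Bar","price":1.29,"quantity":2,"department":["Packaged Foods","Checkout"]}
{"index":{"_id":"5"}}
{"name":"1 Gallon Milk","price":3.29,"quantity":16,"department":"Dairy"}
{"index":{"_id":"6"}}
{"name":"0.5lb Jumbo Shrimp","price":5.29,"quantity":12,"department":"Meat and Seafood"}
{"index":{"_id":"7"}}
{"name":"Wheat Bread","price":1.29,"quantity":5,"department":"Bakery"}
{"index":{"_id":"8"}}
{"name":"Pepperoni Pizza","price":2.99,"quantity":5,"department":"Frozen"}
{"index":{"_id":"9"}}
{"name":"12 Pack Cola","price":5.29,"quantity":6,"department":"Packaged Foods"}
{"index":{"_id":"10"}}
{"name":"Lime Juice","price":0.99,"quantity":20,"department":"Produce"}
{"index":{"_id":"11"}}
{"name":"12 Pack Cherry Cola","price":5.59,"quantity":5,"department":"Packaged Foods"}
{"index":{"_id":"12"}}
{"name":"1 Gallon Soy Milk","price":3.39,"quantity":10,"department":"Dairy"}
{"index":{"_id":"13"}}
{"name":"1 Gallon Vanilla Soy Milk","price":3.49,"quantity":9,"department":"Dairy"}
{"index":{"_id":"14"}}
{"name":"1 Gallon Orange Juice","price":3.29,"quantity":4,"department":"Juice"}
EOF

# --data-binary preserves the newlines that the bulk API requires.
curl -s -H "Content-Type: application/x-ndjson" \
  -XPOST "127.0.0.1:9200/store/products/_bulk?pretty" \
  --data-binary "@$BULK_FILE" \
  || echo "Bulk request failed: is Elasticsearch reachable?" >&2
```

Because department is mapped as keyword, each value in the Chocolate Bar's array is indexed as its own exact term, so the product can be counted under both Packaged Foods and Checkout.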

Now that we’ve set up our dataset and mapping, we can use aggregation to find all the unique departments in the store. Let’s look at the code below to see how it’s done. If the code seems a bit complex, don’t worry; the explanation that follows will clarify what’s going on:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": {
        "field": "department"
      }
    }
  }
}
'

Elasticsearch returns a response like the following:

{
  "took" : 80,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "departments" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Packaged Foods",
          "doc_count" : 4
        },
        {
          "key" : "Dairy",
          "doc_count" : 3
        },
        {
          "key" : "Meat and Seafood",
          "doc_count" : 2
        },
        {
          "key" : "Produce",
          "doc_count" : 2
        },
        {
          "key" : "Bakery",
          "doc_count" : 1
        },
        {
          "key" : "Checkout",
          "doc_count" : 1
        },
        {
          "key" : "Frozen",
          "doc_count" : 1
        },
        {
          "key" : "Juice",
          "doc_count" : 1
        }
      ]
    }
  }
}

If you look at the results, you can think of the products for each department as being sorted into “buckets”, with a bucket for each department. Each bucket contains a doc_count which tells us how many products were in that department.
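
If all you need is the list of unique values, the bucket keys can be pulled out of the response on the command line. The sketch below is one possible approach: bucket_keys is a hypothetical helper name, and it assumes the response is pretty-printed (requested with pretty) so that each key sits on its own line. A dedicated JSON tool such as jq would be more robust if it is installed.

```shell
# Print the "key" of every bucket in a pretty-printed search response read from stdin.
# Assumption: one "key" : "..." pair per line, as produced by the pretty parameter.
bucket_keys() {
  sed -n 's/^[[:space:]]*"key"[[:space:]]*:[[:space:]]*"\(.*\)",\{0,1\}[[:space:]]*$/\1/p'
}

# Run the same aggregation as above and keep only the department names.
curl -s -H "Content-Type: application/json" \
  -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": { "field": "department" }
    }
  }
}
' | bucket_keys
```

With a response like the one shown above, this should print one department name per line. Keys that contain escaped characters are printed as they appear in the JSON, without unescaping.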

How did we make this happen? We created an aggregation using aggs. In our example, we named the aggregation departments, chose the terms aggregation type, and specified the field we wanted to aggregate by: department. Note that the cURL command includes the parameter size=0. If we omitted this parameter, the response would also include matching product documents (the first ten by default) alongside the aggregation results, which we don’t need here.
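
One detail to keep in mind: a terms aggregation returns only the ten most common values by default, which is enough for our eight departments but not for fields with more distinct values. The sketch below raises that limit with the size option inside terms (separate from the size=0 URL parameter) and adds a cardinality aggregation, which returns an approximate count of distinct values. The limit of 100 and the name department_count are arbitrary choices for illustration.

```shell
# Request body: list up to 100 distinct departments and count them.
AGG_BODY='
{
  "aggs": {
    "departments": {
      "terms": { "field": "department", "size": 100 }
    },
    "department_count": {
      "cardinality": { "field": "department" }
    }
  }
}'

# size=0 in the URL still suppresses the document hits themselves.
curl -s -H "Content-Type: application/json" \
  -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" \
  -d "$AGG_BODY" \
  || echo "Search request failed: is Elasticsearch reachable?" >&2
```

If the sum_other_doc_count value in the response is greater than zero, some values did not fit in the returned buckets and the limit may need to be higher.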

Conclusion

There are many situations where you may want to find the unique values for a field in Elasticsearch. Without the right tools, it can be difficult to wade through a large dataset. This tutorial showed how to use a terms aggregation to accomplish this task. With this step-by-step guide, you can use aggregations to gain new insights into your data.

Just the Code

If you’re already familiar with the concept of aggregation, here’s all the code you’ll need to find the unique values of a field in Elasticsearch:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "departments": {
      "terms": {
        "field": "department"
      }
    }
  }
}
'
