How to Create a Histogram Using Aggregation in Elasticsearch

Written by Data Pilot

April 07, 2019

Elasticsearch

Introduction

One of many ways you can analyze and visualize your data in Elasticsearch is by creating a histogram. If you’re not familiar with the term, a histogram is a type of data visualization that looks a lot like a bar chart, with columns plotted across a graph. Each column represents a value or a range of values. The height of each column tells you the size of the group defined for that column. For example, let’s imagine we had a database that contained songs. A histogram could be created to tell you how many songs you had indexed for each genre: rock, blues, country, and more. Each column depicted on the histogram would represent one of those genres, and the height of each column would show the number of songs in that genre.

In this tutorial, you’ll learn how to create a histogram using aggregation in Elasticsearch. We’ll go through the process step by step, but if you’re already familiar with aggregation and would prefer to skip directly to the example code, feel free to jump to Just the Code.

Using the Histogram Aggregation

Let’s take a look at how aggregation can be used to produce histogram data in Elasticsearch. In our example, we’ll use an index named store, which represents a small grocery store. This index will contain a type called products, which lists all of the products in the store. (Mapping types exist only in older Elasticsearch versions; they were deprecated in 7.0 and later removed.) We’ll keep this example simple with just a handful of items in our dataset and just a few fields for each item: id, name, price, quantity, and department. The dataset is shown in the table below:

id	name	price	quantity	department
1	Multi-Grain Cereal	4.99	4	Packaged Foods
2	1lb Ground Beef	3.99	29	Meat and Seafood
3	Dozen Apples	2.49	12	Produce
4	Chocolate Bar	1.29	2	Packaged Foods	Checkout
5	1 Gallon Milk	3.29	16	Dairy
6	0.5lb Jumbo Shrimp	5.29	12	Meat and Seafood
7	Wheat Bread	1.29	5	Bakery
8	Pepperoni Pizza	2.99	5	Frozen
9	12 Pack Cola	5.29	6	Packaged Foods
10	Lime Juice	0.99	20	Produce
11	12 Pack Cherry Cola	5.59	5	Packaged Foods
12	1 Gallon Soy Milk	3.39	10	Dairy
13	1 Gallon Vanilla Soy Milk	3.49	9	Dairy
14	1 Gallon Orange Juice	3.29	4	Juice

The following code contains the mapping:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
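
The tutorial does not show the step that loads the table into the index. One possible way to do it is the bulk API, which takes newline-delimited JSON: an action line followed by a document line for each product. The sketch below only builds that request body from the table; the names PRODUCTS, build_bulk_body, and bulk_body are illustrative, and for row 4, where the table lists two departments, it assumes the first one.

```python
import json

# Rows copied from the table above: (id, name, price, quantity, department).
# Assumption: row 4 lists two departments in the table; only the first is kept.
PRODUCTS = [
    (1, "Multi-Grain Cereal", 4.99, 4, "Packaged Foods"),
    (2, "1lb Ground Beef", 3.99, 29, "Meat and Seafood"),
    (3, "Dozen Apples", 2.49, 12, "Produce"),
    (4, "Chocolate Bar", 1.29, 2, "Packaged Foods"),
    (5, "1 Gallon Milk", 3.29, 16, "Dairy"),
    (6, "0.5lb Jumbo Shrimp", 5.29, 12, "Meat and Seafood"),
    (7, "Wheat Bread", 1.29, 5, "Bakery"),
    (8, "Pepperoni Pizza", 2.99, 5, "Frozen"),
    (9, "12 Pack Cola", 5.29, 6, "Packaged Foods"),
    (10, "Lime Juice", 0.99, 20, "Produce"),
    (11, "12 Pack Cherry Cola", 5.59, 5, "Packaged Foods"),
    (12, "1 Gallon Soy Milk", 3.39, 10, "Dairy"),
    (13, "1 Gallon Vanilla Soy Milk", 3.49, 9, "Dairy"),
    (14, "1 Gallon Orange Juice", 3.29, 4, "Juice"),
]


def build_bulk_body(rows):
    """Return an NDJSON string: one action line plus one document line per row."""
    lines = []
    for doc_id, name, price, quantity, department in rows:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Document line: the fields declared in the mapping.
        lines.append(json.dumps({
            "name": name,
            "price": price,
            "quantity": quantity,
            "department": department,
        }))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(PRODUCTS)
print(bulk_body, end="")
```

Saving the printed output to a file (products.ndjson is a hypothetical name) and sending it with curl's --data-binary option to 127.0.0.1:9200/store/products/_bulk, with the Content-Type header set to application/x-ndjson, should load all fourteen documents, assuming the same Elasticsearch version as the rest of this tutorial.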

Now that we’ve set up our sample dataset, let’s imagine that we want a histogram of our products based on their price. We’d like to see how many products are priced from $0.00 up to (but not including) $1.00, how many from $1.00 up to (but not including) $2.00, and so forth. The following request accomplishes this:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "histogram_by_dollar": {
      "histogram": {
        "field": "price",
        "interval": 1.00
      }
    }
  }
}
'

The response looks like this:

{
  "took" : 70,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_by_dollar" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 1
        },
        {
          "key" : 1.0,
          "doc_count" : 2
        },
        {
          "key" : 2.0,
          "doc_count" : 2
        },
        {
          "key" : 3.0,
          "doc_count" : 5
        },
        {
          "key" : 4.0,
          "doc_count" : 1
        },
        {
          "key" : 5.0,
          "doc_count" : 3
        }
      ]
    }
  }
}
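
The aggregation returns bucket counts rather than a picture, so something still has to draw the columns. As a rough illustration, the sketch below turns the buckets from the response into a text chart with one row per bucket. The names buckets, render_bars, and chart are made up for this example, and the interval of 1.0 is taken from the request.

```python
# Buckets copied from the "aggregations" section of the response above.
buckets = [
    {"key": 0.0, "doc_count": 1},
    {"key": 1.0, "doc_count": 2},
    {"key": 2.0, "doc_count": 2},
    {"key": 3.0, "doc_count": 5},
    {"key": 4.0, "doc_count": 1},
    {"key": 5.0, "doc_count": 3},
]


def render_bars(buckets, interval=1.0):
    """Return one text row per bucket: a price range label and a bar of '#'."""
    rows = []
    for bucket in buckets:
        low = bucket["key"]          # the key is the lower bound of the bucket
        high = low + interval        # the upper bound is exclusive
        label = f"${low:.2f} to <${high:.2f}"
        bar = "#" * bucket["doc_count"]
        rows.append(f"{label:<18}{bar} ({bucket['doc_count']})")
    return "\n".join(rows)


chart = render_bars(buckets)
print(chart)
```

In a real application the same list of buckets could instead be handed to a charting library or viewed in a visualization tool.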

Let’s take a closer look at what we just did. We created an aggregation named histogram_by_dollar inside aggs, and we defined a few things:

* We defined histogram as the type of aggregation we’re doing.
* The histogram is based on the price field. This means that the data will be grouped into “buckets” based on price.
* The interval is defined as 1.0. This means that each “bucket” spans $1.00, which lets us see how many products are priced from $0.00 up to (but not including) $1.00, from $1.00 up to (but not including) $2.00, and so on.

The output for this aggregation looks exactly as we expected: the aggregation created “buckets” based on price, and the doc_count tells us how many products fall in each bucket. The key is the lower bound of each bucket, so the bucket with key 3.0 holds the five products priced from $3.00 up to (but not including) $4.00.
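
To make the bucketing rule concrete, here is a small sketch that reproduces the counts locally without a cluster. It assumes the rule described in the Elasticsearch documentation for the histogram aggregation, where each value is rounded down to the nearest multiple of the interval: floor((value - offset) / interval) * interval + offset. The names PRICES, bucket_key, and histogram are illustrative only.

```python
import math
from collections import Counter

# Prices copied from the product table, in id order.
PRICES = [4.99, 3.99, 2.49, 1.29, 3.29, 5.29, 1.29,
          2.99, 5.29, 0.99, 5.59, 3.39, 3.49, 3.29]


def bucket_key(value, interval, offset=0.0):
    """Round a value down to the lower bound of its bucket."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval):
    """Count values per bucket key and return (key, count) pairs sorted by key."""
    counts = Counter(bucket_key(v, interval) for v in values)
    return sorted(counts.items())


result = histogram(PRICES, 1.0)
for key, count in result:
    print(key, count)
```

Because each value is rounded down, a product priced at exactly $1.00 would land in the bucket with key 1.0, not in the bucket with key 0.0.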

Conclusion

If you want to visualize your data and show how it’s distributed among groups or ranges, a histogram can be an excellent choice. In this tutorial, we showed you how to use aggregation to calculate histogram data in Elasticsearch. With these step-by-step instructions, you can harness the power of aggregation and get a bird’s-eye view of large datasets.

Just the Code

If you’re already familiar with the concept of aggregation, here’s all the code you need to calculate histogram data in Elasticsearch:

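curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "histogram_by_dollar": {
      "histogram": {
        "field": "price",
        "interval": 1.00
      }
    }
  }
}
'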
