How to Create a Histogram Using Aggregation in Elasticsearch

Introduction

One of many ways you can analyze and visualize your data in Elasticsearch is by creating a histogram. If you’re not familiar with the term, a histogram is a type of data visualization that looks a lot like a bar chart, with columns plotted across a graph. Each column represents a value or a range of values. The height of each column tells you the size of the group defined for that column. For example, let’s imagine we had a database that contained songs. A histogram could be created to tell you how many songs you had indexed for each genre: rock, blues, country, and more. Each column depicted on the histogram would represent one of those genres, and the height of each column would show the number of songs in that genre.

In this tutorial, you’ll learn how to create a histogram using aggregation in Elasticsearch. We’ll go through the process step by step, but if you’re already familiar with aggregation and would prefer to skip directly to the example code, feel free to jump to Just the Code.

Using the Histogram Aggregation

Let’s take a look at how aggregation can be used to build a histogram in Elasticsearch. In our example, we’ll use an index named store, which represents a small grocery store. This index will contain a type called products which lists all of the products in the store. We’ll keep this example simple with just a handful of items in our dataset and just a few fields for each item: id, name, price, quantity, and department. The dataset is shown below:

id | name                       | price | quantity | department
1  | Multi-Grain Cereal         | 4.99  | 4        | Packaged Foods
2  | 1lb Ground Beef            | 3.99  | 29       | Meat and Seafood
3  | Dozen Apples               | 2.49  | 12       | Produce
4  | Chocolate Bar              | 1.29  | 2        | Packaged FoodsCheckout
5  | 1 Gallon Milk              | 3.29  | 16       | Dairy
6  | 0.5lb Jumbo Shrimp         | 5.29  | 12       | Meat and Seafood
7  | Wheat Bread                | 1.29  | 5        | Bakery
8  | Pepperoni Pizza            | 2.99  | 5        | Frozen
9  | 12 Pack Cola               | 5.29  | 6        | Packaged Foods
10 | Lime Juice                 | 0.99  | 20       | Produce
11 | 12 Pack Cherry Cola        | 5.59  | 5        | Packaged Foods
12 | 1 Gallon Soy Milk          | 3.39  | 10       | Dairy
13 | 1 Gallon Vanilla Soy Milk  | 3.49  | 9        | Dairy
14 | 1 Gallon Orange Juice      | 3.29  | 4        | Juice

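The following code creates the store index along with its mapping. Note that it declares a mapping type (products), which works in Elasticsearch 6.x; mapping types were deprecated in Elasticsearch 7.0 and later removed.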

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/store -d '
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
'
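
The mapping alone does not put any documents into the index. One way to load the sample rows is the _bulk API; the sketch below is only an illustration under a few assumptions: the file name products.ndjson and the helper load_products are hypothetical, the cluster is the same 127.0.0.1:9200 used above, and the department field is left out to keep the lines short (it can be added to each document in the same way).

```shell
# Write the 14 sample products as newline-delimited JSON for the _bulk API.
# Each document takes two lines: an action line, then the document itself.
cat > products.ndjson <<'EOF'
{"index":{"_id":"1"}}
{"name":"Multi-Grain Cereal","price":4.99,"quantity":4}
{"index":{"_id":"2"}}
{"name":"1lb Ground Beef","price":3.99,"quantity":29}
{"index":{"_id":"3"}}
{"name":"Dozen Apples","price":2.49,"quantity":12}
{"index":{"_id":"4"}}
{"name":"Chocolate Bar","price":1.29,"quantity":2}
{"index":{"_id":"5"}}
{"name":"1 Gallon Milk","price":3.29,"quantity":16}
{"index":{"_id":"6"}}
{"name":"0.5lb Jumbo Shrimp","price":5.29,"quantity":12}
{"index":{"_id":"7"}}
{"name":"Wheat Bread","price":1.29,"quantity":5}
{"index":{"_id":"8"}}
{"name":"Pepperoni Pizza","price":2.99,"quantity":5}
{"index":{"_id":"9"}}
{"name":"12 Pack Cola","price":5.29,"quantity":6}
{"index":{"_id":"10"}}
{"name":"Lime Juice","price":0.99,"quantity":20}
{"index":{"_id":"11"}}
{"name":"12 Pack Cherry Cola","price":5.59,"quantity":5}
{"index":{"_id":"12"}}
{"name":"1 Gallon Soy Milk","price":3.39,"quantity":10}
{"index":{"_id":"13"}}
{"name":"1 Gallon Vanilla Soy Milk","price":3.49,"quantity":9}
{"index":{"_id":"14"}}
{"name":"1 Gallon Orange Juice","price":3.29,"quantity":4}
EOF

# Hypothetical helper: send a bulk file to the store index.
# --data-binary keeps the newlines that the _bulk API requires.
load_products() {
  curl -s -H "Content-Type: application/x-ndjson" \
       -XPOST "127.0.0.1:9200/store/products/_bulk?pretty" \
       --data-binary @"$1"
}
```

Once the index exists, running load_products products.ndjson should index all 14 documents in a single request.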

Now that we’ve set up our sample dataset, let’s imagine that we wanted to use a histogram to depict all of our products based on their price. We’d like to see how many products are priced below $1.00, how many are priced from $1.00 up to (but not including) $2.00, and so forth. The following code will accomplish the task for us:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "histogram_by_dollar": {
      "histogram": {
        "field": "price",
        "interval": 1.00
      }
    }
  }
}
'

{
  "took" : 70,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_by_dollar" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 1
        },
        {
          "key" : 1.0,
          "doc_count" : 2
        },
        {
          "key" : 2.0,
          "doc_count" : 2
        },
        {
          "key" : 3.0,
          "doc_count" : 5
        },
        {
          "key" : 4.0,
          "doc_count" : 1
        },
        {
          "key" : 5.0,
          "doc_count" : 3
        }
      ]
    }
  }
}

Let’s take a closer look at what we just did here. We created an aggregation called histogram_by_dollar inside aggs, and we defined a few things:

* We defined histogram as the type of aggregation we’re doing.
* The histogram is based on the price field. This means that the data will be grouped into “buckets” based on price.
* The interval is defined as 1.0. This means that each “bucket” spans $1.00: the first covers prices from $0.00 up to (but not including) $1.00, the next covers $1.00 up to (but not including) $2.00, and so on.
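
To make the bucketing rule concrete, here is a small local sketch that reproduces the same counts without Elasticsearch. It assumes the rule described in the Elasticsearch documentation for the histogram aggregation, bucket key = floor(value / interval) * interval (with the default offset of 0). The prices are copied from the sample dataset, and the variable names are only for illustration.

```shell
# Prices of the 14 sample products, in id order.
prices="4.99 3.99 2.49 1.29 3.29 5.29 1.29 2.99 5.29 0.99 5.59 3.39 3.49 3.29"
interval=1

histogram=$(
  printf '%s\n' "$prices" | tr ' ' '\n' |
  LC_ALL=C awk -v interval="$interval" '
    {
      # int() truncates toward zero, which equals floor for non-negative prices.
      key = int($1 / interval) * interval
      count[key]++
    }
    END {
      # Print "key doc_count" pairs; order is fixed by sort below.
      for (k in count) printf "%.1f %d\n", k, count[k]
    }
  ' | sort -n
)

printf '%s\n' "$histogram"
```

Each output line is a bucket key followed by its document count, which can be compared with the key and doc_count values in the response.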

The output for this aggregation looks exactly as we expected: the aggregation created “buckets” based on price, and the doc_count tells us how many products fall in each bucket. The key identifies each bucket and is the lower bound of its range, so the bucket with key 3.0 holds the five products priced from $3.00 up to (but not including) $4.00.
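
The aggregation only returns the numbers; drawing the columns is left to whatever tool consumes the response. As a rough sketch of that last step, the snippet below turns the key and doc_count pairs from the response shown above into a text chart. The pairs are copied in by hand here, so parsing the JSON is outside its scope, and the variable names are only for illustration.

```shell
# "key doc_count" pairs copied from the aggregation response.
buckets="0.0 1
1.0 2
2.0 2
3.0 5
4.0 1
5.0 3"

chart=$(
  printf '%s\n' "$buckets" |
  LC_ALL=C awk '{
    # Build a bar with one "#" per document in the bucket.
    bar = ""
    for (i = 0; i < $2; i++) bar = bar "#"
    printf "%.1f | %s\n", $1, bar
  }'
)

printf '%s\n' "$chart"
```

Each row is one bucket, labeled with its lower bound, and the length of the bar is the number of products in that price range.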

Conclusion

If you want to visualize your data and show how it’s distributed among groups or ranges, a histogram can be an excellent choice. In this tutorial, we showed you how to use aggregation to calculate histogram data in Elasticsearch. With these step-by-step instructions, you can harness the power of aggregation and get a bird’s-eye view of large datasets.

Just the Code

If you’re already familiar with the concept of aggregation, here’s all the code you need to calculate histogram data in Elasticsearch:

curl -H "Content-Type: application/json" -XGET "127.0.0.1:9200/store/products/_search?size=0&pretty" -d '
{
  "aggs": {
    "histogram_by_dollar": {
      "histogram": {
        "field": "price",
        "interval": 1.00
      }
    }
  }
}
'
