How to Create a Histogram with Elasticsearch Aggregations using NodeJS
Introduction
Elasticsearch aggregations allow you to compute averages, sums, maximums, and minimums. They also let you create histograms from your data. Let’s start with what a histogram is in case you’ve forgotten. A histogram is a bar graph that shows how many values fall into each range of a numeric field. For example, if you had a database of songs, a histogram might tell you how many songs run under two minutes, two to four minutes, and four to six minutes. The ranges would be on the x-axis and the number of occurrences would be on the y-axis. If this is the type of calculation you need, we’ll show you how to do it step-by-step in JavaScript. But if you’d rather just see the example code, skip ahead to the Just the Code section.
Prerequisites
Before we show you how to create the histogram data, it’s important to make sure a few prerequisites are in place. There are only a few system requirements for this task:
NodeJS needs to be installed
Elasticsearch needs to be installed and running
In our example, Elasticsearch is installed locally and uses the default port of 9200. If your Elasticsearch installation is running on a different server, you’ll need to modify your JavaScript accordingly.
The elasticsearch npm module needs to be installed
A simple `npm install elasticsearch` should work in most cases.
Use the histogram Aggregation
We love to show by example. In this example we use an index called `store` that represents a small grocery store. The `store` index contains the type `products`, which lists all of the store’s products. To keep it simple, our dataset has only a small number of products with just a few fields: id, name, price, quantity, and department. Here is the data we used to create our dataset:
id | name | price | quantity | department | |
---|---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods | |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood | |
3 | Dozen Apples | 2.49 | 12 | Produce | |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods | Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy | |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood | |
7 | Wheat Bread | 1.29 | 5 | Bakery | |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen | |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods | |
10 | Lime Juice | 0.99 | 20 | Produce | |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods | |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy | |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy | |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice | |
And here is the JSON we used to make the mapping:

```json
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
```
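The article does not show how the rows in the table were loaded into the index. One possible approach, sketched here as an assumption rather than the method actually used, is to build the alternating action/document array that the elasticsearch module's `bulk` method accepts. The helper name `buildBulkBody` is hypothetical, and the `_type` field assumes an Elasticsearch version that still supports mapping types, as the article's mapping does.

```javascript
// Sketch only: buildBulkBody is a hypothetical helper, not part of the
// elasticsearch module. Index and type names mirror the article.
const INDEX_NAME = "store";
const TYPE_NAME = "products";

function buildBulkBody(products) {
  const body = [];
  products.forEach(function (product) {
    // Action entry: tells Elasticsearch where to index the next document.
    body.push({ index: { _index: INDEX_NAME, _type: TYPE_NAME, _id: product.id } });
    // Source entry: the document itself, with the id moved into the action.
    body.push({
      name: product.name,
      price: product.price,
      quantity: product.quantity,
      department: product.department
    });
  });
  return body;
}

// Two rows taken from the table above.
const sampleProducts = [
  { id: 1, name: "Multi-Grain Cereal", price: 4.99, quantity: 4, department: "Packaged Foods" },
  { id: 10, name: "Lime Juice", price: 0.99, quantity: 20, department: "Produce" }
];

const bulkBody = buildBulkBody(sampleProducts);
console.log(JSON.stringify(bulkBody, null, 2));

// With a connected client, the array could then be sent with:
// client.bulk({ body: bulkBody }).then(...)
```

Building the body separately from the network call keeps the data preparation easy to inspect before anything is written to the index.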
Now let’s say we wanted a histogram of all our products based on price. How many products are priced from $0.00 up to, but not including, $1.00? What about from $1.00 up to $2.00? This is what a histogram would tell us. Now let’s solve that problem to demonstrate:
File index.js:

```js
var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Histogram Aggregation */
client.search({
  size: 0,
  index: 'store',
  type: 'products',
  body: {
    aggs: {
      histogram_by_dollar: {
        histogram: {
          field: "price",
          interval: 1.00
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```
There are a few steps to unpack, so let’s go through them one at a time:
First, we require the elasticsearch library, which provides the functions that make it easy to access Elasticsearch.
Next, we create a variable, `client`, which creates and stores our connection to Elasticsearch. From this point on we use this client for all our interactions with Elasticsearch.
We then call the `search` function on `client` to create a query with an aggregator.
We specify the index and type to perform the search on. The important part comes in the body, where we define the aggregator using the `aggs` keyword. We give our aggregator the name `histogram_by_dollar`; the name can be anything, but choose something that makes sense. We specify the type of aggregator as `histogram`.
Lastly, we specify the field the aggregator should evaluate, `price`, and the width of each bucket, an `interval` of 1.0. Each bucket includes its lower bound and excludes its upper bound, so the aggregation tells us how many products are priced from $0 up to (but not including) $1, from $1 up to $2, and so on.
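To make the bucketing concrete, here is a minimal local sketch of how a value is assigned to a bucket for a given interval: the bucket key is the value rounded down to the nearest multiple of the interval. This only illustrates the arithmetic, since Elasticsearch does the real work on the server. The function names `bucketKey` and `localHistogram` are hypothetical, and the sample prices are a handful of values taken from the table.

```javascript
// Local illustration only; Elasticsearch computes the real buckets server-side.
// bucketKey and localHistogram are hypothetical helper names.
function bucketKey(value, interval) {
  // Round down to the nearest multiple of the interval.
  return Math.floor(value / interval) * interval;
}

function localHistogram(values, interval) {
  const counts = {};
  values.forEach(function (value) {
    const key = bucketKey(value, interval);
    counts[key] = (counts[key] || 0) + 1;
  });
  // Sort keys numerically and return buckets shaped like the aggregation's.
  return Object.keys(counts)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (key) { return { key: key, doc_count: counts[key] }; });
}

// A few prices from the table: Lime Juice, Chocolate Bar, Wheat Bread,
// Dozen Apples, Pepperoni Pizza.
const samplePrices = [0.99, 1.29, 1.29, 2.49, 2.99];
const localBuckets = localHistogram(samplePrices, 1.0);
console.log(localBuckets);
```

Note that a price of exactly 1.00 would land in the bucket with key 1, not key 0, because the lower bound is inclusive. Unlike this sketch, Elasticsearch also returns empty buckets between the lowest and highest keys by default.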
Note: We used `JSON.stringify` to pretty-print our results. We also set `size: 0` because we don’t want any specific products back, only the aggregation data.
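If you run this kind of query against more than one field, it may help to build the search parameters in a small function. The sketch below is one way to do that, not part of the original example: `buildHistogramSearch` is a hypothetical helper, and it also sets the histogram option `min_doc_count`, which controls whether buckets with too few documents are returned. By default empty buckets are included; passing 1 leaves them out.

```javascript
// buildHistogramSearch is a hypothetical helper, not part of the
// elasticsearch module. Index, type, and aggregation names mirror the article.
function buildHistogramSearch(field, interval, minDocCount) {
  return {
    index: "store",
    type: "products",
    size: 0, // return aggregation data only, no product documents
    body: {
      aggs: {
        histogram_by_dollar: {
          histogram: {
            field: field,
            interval: interval,
            // Buckets with fewer documents than this are omitted.
            min_doc_count: minDocCount
          }
        }
      }
    }
  };
}

const searchParams = buildHistogramSearch("price", 1.0, 1);
console.log(JSON.stringify(searchParams, null, 4));

// With a connected client, the parameters could then be passed as:
// client.search(searchParams).then(...)
```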
We run our NodeJS app and get this response:
```
$ node index.js
Successful query!
{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 13,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "histogram_by_dollar": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 2
                },
                {
                    "key": 2,
                    "doc_count": 2
                },
                {
                    "key": 3,
                    "doc_count": 5
                },
                {
                    "key": 4,
                    "doc_count": 0
                },
                {
                    "key": 5,
                    "doc_count": 4
                }
            ]
        }
    }
}
```
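The response has the structure we expected. Our aggregator created buckets, and the `doc_count` tells us how many products fall in each bucket. The `key` value tells us which bucket we are looking at: it is the bucket’s lower bound, so the bucket with key 3 covers prices from $3.00 up to, but not including, $4.00.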
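As a small illustration of reading the buckets, the sketch below turns each `key` and `doc_count` pair into a labeled line with a text bar. The bucket values are copied from the response above; the helper name `describeBuckets` is hypothetical and exists only for display.

```javascript
// Buckets copied from the aggregation response shown above.
const buckets = [
  { key: 1, doc_count: 2 },
  { key: 2, doc_count: 2 },
  { key: 3, doc_count: 5 },
  { key: 4, doc_count: 0 },
  { key: 5, doc_count: 4 }
];

// describeBuckets is a hypothetical helper for display purposes.
// Each key is the lower bound of its bucket; the upper bound is key + interval.
function describeBuckets(bucketList, interval) {
  return bucketList.map(function (bucket) {
    const low = bucket.key.toFixed(2);
    const high = (bucket.key + interval).toFixed(2);
    const bar = "#".repeat(bucket.doc_count);
    return "$" + low + " to under $" + high + ": " + bar + " (" + bucket.doc_count + ")";
  });
}

const histogramLines = describeBuckets(buckets, 1.0);
histogramLines.forEach(function (line) {
  console.log(line);
});
```

In a real application the `buckets` array would come from `resp.aggregations.histogram_by_dollar.buckets` inside the `then` callback.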
Conclusion
In this tutorial we demonstrated how to use Elasticsearch aggregations to calculate histogram data based on a specific field and interval. This is a common use case. It’s very good for getting a bird’s eye view of your data. There are many other things you can do with aggregation and you can consult the Elasticsearch documentation to learn more about it. The documentation is also useful if you need help with syntax. We hope you found this tutorial on histogram aggregations helpful and can apply it to your specific application. If you have questions or this didn’t work for you please reach out to us so we can help. Thank you.
Just the Code
If you’re already comfortable with NodeJS and aggregations, here’s all the code we used to demonstrate how to create a histogram in Elasticsearch using NodeJS.
```js
var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Histogram Aggregation */
client.search({
  size: 0,
  index: 'store',
  type: 'products',
  body: {
    aggs: {
      histogram_by_dollar: {
        histogram: {
          field: "price",
          interval: 1.00
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query!");
  console.log(JSON.stringify(resp, null, 4));
}, function(err) {
  console.trace(err.message);
});
```