How to Use Percentile Aggregation in Elasticsearch with NodeJS
Introduction
A computation that comes up frequently when you are trying to understand the distribution of your data is the percentile aggregation. Percentile aggregations are simple in Elasticsearch, and this step-by-step tutorial walks through one that you can use as a basis for your own aggregation. We will interact with Elasticsearch using JavaScript running on NodeJS to perform the percentile aggregation. If you’d rather just see the example code, jump to the Just the Code section at the end.
Note: The exact code will vary depending on your environment, but this should give you an idea of how it is done.
What is a Percentile Aggregation
Percentile aggregations are helpful for finding outliers and for understanding the distribution of your data. A percentile is the value below which a given percentage of the values in a dataset fall. For example, if you graduated in the top 10% of your class, you ranked at or above the 90th percentile. Likewise, the 99th percentile is the value that is greater than 99% of the other values in the dataset.
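To make the definition concrete, here is a minimal sketch that computes an exact percentile by linear interpolation between ranks. The `percentile` function is a hypothetical helper written for illustration, not part of Elasticsearch or its client. Elasticsearch computes percentiles with an approximation algorithm, so its results can differ from this simple method.

```javascript
// Illustrative only: exact percentile using linear interpolation between ranks.
// This is NOT how Elasticsearch computes percentiles internally.
function percentile(values, p) {
  if (values.length === 0) {
    throw new Error("values must not be empty");
  }
  if (p < 0 || p > 100) {
    throw new RangeError("p must be between 0 and 100");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  // Fractional position of the requested percentile in the sorted array.
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  const weight = rank - lower;
  return sorted[lower] + (sorted[upper] - sorted[lower]) * weight;
}

// The 14 prices used in the example dataset later in this tutorial.
const prices = [4.99, 3.99, 2.49, 1.29, 3.29, 5.29, 1.29,
                2.99, 5.29, 0.99, 5.59, 3.39, 3.49, 3.29];

console.log("median:", percentile(prices, 50));
console.log("99th percentile:", percentile(prices, 99));
```

With these prices, the median falls halfway between the two middle values, $3.29 and $3.39, which is about $3.34.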
Prerequisites
Before we show you how to run a percentile aggregation with Elasticsearch in JavaScript, it’s important to make sure a few prerequisites are in place. There are only a few system requirements for this task:
* NodeJS needs to be installed.
* The elasticsearch npm module needs to be installed. A simple `npm install elasticsearch` should work in most cases.
* Elasticsearch also needs to be installed and running.
* In our example, we have Elasticsearch installed locally using the default port of 9200. If your Elasticsearch installation is running on a different server, you’ll need to modify the host in your JavaScript code accordingly.
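If your Elasticsearch host differs between machines, one option is to read it from the environment rather than hard-coding it. The sketch below is only an illustration: `resolveHost` is a hypothetical helper and `ES_HOST` is an assumed environment variable name, neither of which is defined by Elasticsearch or the client library.

```javascript
// Hypothetical helper: ES_HOST is an assumed variable name chosen for this
// example. Falls back to the local default used throughout this tutorial.
function resolveHost(env) {
  return env.ES_HOST || "http://localhost:9200";
}

// Options object in the shape the tutorial passes to elasticsearch.Client.
const clientOptions = { hosts: [resolveHost(process.env)] };
console.log(clientOptions);

// With the elasticsearch module installed, the options would be used as:
//   var client = new elasticsearch.Client(clientOptions);
```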
Introduce our Example Data
Let’s look at an example that uses an index called `store`, which represents a small grocery store. This `store` index contains a type called `products`, which lists the store’s products. To keep things simple, our example dataset contains only a handful of products with just the following fields: id, name, price, quantity, and department. The table below shows the dataset:
id | name | price | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
Here is the JSON we used to define the mapping of our index:
```json
{
    "mappings": {
        "products": {
            "properties": {
                "name": { "type": "text" },
                "price": { "type": "double" },
                "quantity": { "type": "integer" },
                "department": { "type": "keyword" }
            }
        }
    }
}
```
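To load rows like the ones in the table above, one option is the client’s bulk API, which takes alternating action and document entries. The sketch below only builds that request body, so it runs without a server. `buildBulkBody` is a hypothetical helper name, and only the first three table rows are included for brevity. The commented-out call assumes the same `elasticsearch` npm module and an index that still uses the `products` type.

```javascript
// First three rows of the example table; the remaining rows have the same shape.
const products = [
  { id: 1, name: "Multi-Grain Cereal", price: 4.99, quantity: 4, department: "Packaged Foods" },
  { id: 2, name: "1lb Ground Beef", price: 3.99, quantity: 29, department: "Meat and Seafood" },
  { id: 3, name: "Dozen Apples", price: 2.49, quantity: 12, department: "Produce" }
];

// Hypothetical helper: turns documents into the alternating
// [action, document, action, document, ...] array the bulk API expects.
function buildBulkBody(index, type, docs) {
  const body = [];
  for (const doc of docs) {
    // Use the table's id as the document _id and keep the rest as the source.
    const { id, ...source } = doc;
    body.push({ index: { _index: index, _type: type, _id: id } });
    body.push(source);
  }
  return body;
}

const bulkBody = buildBulkBody("store", "products", products);
console.log(JSON.stringify(bulkBody, null, 2));

// With a connected client, the body might be sent like this:
//   client.bulk({ body: bulkBody }).then(function (resp) { /* check resp.errors */ });
```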
Use the Percentile Aggregation
Now let’s say we want to get a sense of the pricing at our store. We can use a percentile aggregation to see how our prices are distributed. The code to compute the percentiles of the price field is below. Let’s take a look, and then we’ll dissect the code afterwards.
File: percentile.js
```javascript
var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
    hosts: ["http://localhost:9200"]
});

/* Calculate the percentile aggregation */
client.search({
    size: 0,
    index: 'store',
    type: 'products',
    body: {
        "aggs" : {
            "price_by_percentile" : {
                "percentiles" : {
                    "field" : "price"
                }
            }
        }
    }
}).then(function(resp) {
    console.log("Successful query!");
    console.log(JSON.stringify(resp, null, 4));
}, function(err) {
    console.trace(err.message);
});
```
You can run this application using NodeJS with this command:
```
$ node percentile.js
```
What we get back is a success message, and we can verify that our aggregation returned the values we expected:
```
Successful query!
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 14,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "price_by_percentile": {
            "values": {
                "1.0": 0.99,
                "5.0": 1.05,
                "25.0": 2.49,
                "50.0": 3.34,
                "75.0": 4.99,
                "95.0": 5.529999999999999,
                "99.0": 5.59
            }
        }
    }
}
```
You can see that what’s returned is an easy-to-read breakdown of our prices, with our lowest price, $0.99, at the low end (1st percentile) and our highest price, $5.59, at the high end (99th percentile).
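In an application you will usually want the percentile values themselves rather than the full printed response. The sketch below shows one way to pull them out of a response shaped like the one above. `summarizePercentiles` is a hypothetical helper, and `sampleResp` is a trimmed copy of the output shown earlier so the example runs without a server.

```javascript
// Hypothetical helper: converts the "values" object of a percentiles
// aggregation into a sorted array of { percentile, value } rows.
function summarizePercentiles(resp, aggName) {
  const values = resp.aggregations[aggName].values;
  return Object.keys(values)
    .map(function (key) {
      return { percentile: Number(key), value: values[key] };
    })
    .sort(function (a, b) {
      return a.percentile - b.percentile;
    });
}

// Trimmed copy of the response printed by percentile.js.
const sampleResp = {
  aggregations: {
    price_by_percentile: {
      values: {
        "1.0": 0.99,
        "5.0": 1.05,
        "25.0": 2.49,
        "50.0": 3.34,
        "75.0": 4.99,
        "95.0": 5.529999999999999,
        "99.0": 5.59
      }
    }
  }
};

const rows = summarizePercentiles(sampleResp, "price_by_percentile");
for (const row of rows) {
  // toFixed(2) hides floating-point noise such as 5.529999999999999.
  console.log("p" + row.percentile + ": $" + row.value.toFixed(2));
}
```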
Now let’s go over the important parts of the code:
* We required the elasticsearch library.
* We created a client that established the connection to Elasticsearch.
* We then used the search function on the client to make our request with the following specifications:
  * We created an aggregator with the `"aggs"` keyword.
  * Then we gave it a name, `"price_by_percentile"`. This can be whatever name suits your application.
  * Next we used `"percentiles"` to set the type of aggregator so Elasticsearch knows what type of aggregation to perform.
  * Lastly, we specified which field we wanted to perform the percentile aggregation on, `"price"`.
Note: We used `size: 0` because otherwise this query would also return matching product documents (up to 10 by default). With this parameter set, all we see is the aggregate information.
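The pieces described above can be assembled by a small function, which makes it easy to reuse the request for other fields. The sketch below is an illustration under a few assumptions: `buildPercentileRequest` is a hypothetical helper, and the optional `percents` array uses the percentiles aggregation’s `percents` parameter to request specific percentiles instead of the default set.

```javascript
// Hypothetical helper: builds the same search request as percentile.js,
// with the aggregation name, field, and (optionally) percentiles as inputs.
function buildPercentileRequest(aggName, field, percents) {
  const percentiles = { field: field };
  if (percents) {
    // Only include "percents" when specific percentiles were requested.
    percentiles.percents = percents;
  }
  return {
    size: 0, // return only the aggregation, not the matching documents
    index: "store",
    type: "products",
    body: {
      aggs: {
        [aggName]: { percentiles: percentiles }
      }
    }
  };
}

const request = buildPercentileRequest("price_by_percentile", "price", [50, 90, 99]);
console.log(JSON.stringify(request, null, 4));

// With a connected client, the request would be sent as:
//   client.search(request).then(function (resp) { /* ... */ });
```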
Conclusion
In this tutorial we demonstrated how to run a percentile aggregation in Elasticsearch with JavaScript running on NodeJS. There are many other operations you can perform with aggregations, and you can consult the Elasticsearch documentation to learn more about them. The documentation is also useful if you need help with specific syntax.
We hope you found this tutorial helpful and that you can apply it to your specific application. If you have questions or this didn’t work for you, please reach out to us so we can help. Thank you.
Just the Code
If you’re already comfortable with aggregations, JavaScript, and NodeJS, here’s all the code we used to demonstrate how to do a percentile aggregation.
File: percentile.js
```javascript
var elasticsearch = require("elasticsearch");

var client = new elasticsearch.Client({
    hosts: ["http://localhost:9200"]
});

/* Calculate the percentile aggregation */
client.search({
    size: 0,
    index: 'store',
    type: 'products',
    body: {
        "aggs" : {
            "price_by_percentile" : {
                "percentiles" : {
                    "field" : "price"
                }
            }
        }
    }
}).then(function(resp) {
    console.log("Successful query!");
    console.log(JSON.stringify(resp, null, 4));
}, function(err) {
    console.trace(err.message);
});
```