How to Compute a Weighted Average with Aggregations in Elasticsearch using NodeJS
Introduction
When you’re determining the average for values in a dataset, sometimes it doesn’t make sense to treat all the values equally. For example, if you’re calculating student grades, doing inventory cost accounting, or computing average bond yields, it’s helpful to assign a relative importance to each value in your dataset. That’s where a weighted average comes into play. Elasticsearch makes this task simple through the use of aggregations. In this tutorial, we’ll provide step-by-step instructions for computing a weighted average with aggregations in Elasticsearch using NodeJS. If you’d prefer to skip the explanations and dive into the sample code, feel free to jump to Just the Code.
Prerequisites
Before we show you how to compute a weighted average with Elasticsearch in JavaScript, make sure a few prerequisites are in place. There are only a few system requirements for this task:
* NodeJS needs to be installed.
* The elasticsearch npm module needs to be installed. A simple `npm install elasticsearch` should work in most cases.
* Elasticsearch also needs to be installed and running. Your version must support the `weighted_avg` aggregation, which was added in Elasticsearch 6.4.
  * In our example, Elasticsearch is installed locally and uses the default port of 9200. If your Elasticsearch installation is running on a different server, you’ll need to change the host in the client configuration accordingly.
What is a weighted average?
Before we go through the math behind a weighted average, let’s look at a practical example to better understand its use.
Let’s say we know that the average height of men worldwide is 180cm and the average height of women is 170cm. You might conclude that the average height of all humans is 175cm, but that only holds if there are exactly as many men as women. How do we account for unequal numbers? With a weighted average. If, for example, there were more women than men, the 170cm figure should count for more in the average. If we knew the percentage of men and women, we could determine the average human height.
The math: in a standard average, every value has the same weight and contributes equally to the result. In a weighted average, however, each value can have a different weight. Here’s how to calculate the weighted average of a set of values, given a weight for each value:
Weighted Average = ∑(value * weight) / ∑(weight)
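As a minimal sketch, the formula translates directly into JavaScript. The `weightedAverage` function and the 40/60 headcounts below are hypothetical. They only mirror the height example and are not real population figures.

```javascript
// Weighted average = sum(value * weight) / sum(weight)
function weightedAverage(values, weights) {
  if (values.length !== weights.length) {
    throw new Error("values and weights must have the same length");
  }
  let weightedSum = 0; // numerator: sum of value * weight
  let totalWeight = 0; // denominator: sum of weights
  for (let i = 0; i < values.length; i++) {
    weightedSum += values[i] * weights[i];
    totalWeight += weights[i];
  }
  if (totalWeight === 0) {
    throw new Error("total weight must not be zero");
  }
  return weightedSum / totalWeight;
}

// Hypothetical population: 40 men averaging 180cm, 60 women averaging 170cm
const heights = [180, 170];
const headcounts = [40, 60];
console.log(weightedAverage(heights, headcounts)); // 174
```

With equal headcounts the result would be the plain average of 175. Giving the larger group more weight pulls the result toward 170.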
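Using the weighted_avg aggregation
For this example, we’ll use a small grocery store dataset stored in an index named `store` under the type `products`: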
id | name | price ($) | quantity | department |
---|---|---|---|---|
1 | Multi-Grain Cereal | 4.99 | 4 | Packaged Foods |
2 | 1lb Ground Beef | 3.99 | 29 | Meat and Seafood |
3 | Dozen Apples | 2.49 | 12 | Produce |
4 | Chocolate Bar | 1.29 | 2 | Packaged Foods, Checkout |
5 | 1 Gallon Milk | 3.29 | 16 | Dairy |
6 | 0.5lb Jumbo Shrimp | 5.29 | 12 | Meat and Seafood |
7 | Wheat Bread | 1.29 | 5 | Bakery |
8 | Pepperoni Pizza | 2.99 | 5 | Frozen |
9 | 12 Pack Cola | 5.29 | 6 | Packaged Foods |
10 | Lime Juice | 0.99 | 20 | Produce |
11 | 12 Pack Cherry Cola | 5.59 | 5 | Packaged Foods |
12 | 1 Gallon Soy Milk | 3.39 | 10 | Dairy |
13 | 1 Gallon Vanilla Soy Milk | 3.49 | 9 | Dairy |
14 | 1 Gallon Orange Juice | 3.29 | 4 | Juice |
And here is the JSON we used to create the mapping:
```json
{
  "mappings": {
    "products": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "department": { "type": "keyword" }
      }
    }
  }
}
```
In this example, we’ll compute the average price of a product in our small grocery store. However, products we stock in large quantities should count for more than products we stock in small quantities. That makes this a good use case for a weighted average, with the quantity of each product serving as its weight. Because there are two “Chocolate Bars”, their price of $1.29 counts twice as much as the price of a product with a quantity of one. Likewise, “Lime Juice”, at $0.99 with a quantity of 20, counts twenty times as much.
The calculation looks like this: ((4.99 x 4) + (3.99 x 29) + (2.49 x 12) + …) / (4 + 29 + 12 + …)
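To check the arithmetic, here is a minimal sketch that applies this calculation to every row of the table above. The `products` array copies the name, price, and quantity columns. Nothing here talks to Elasticsearch.

```javascript
// Rows from the product table: [name, price, quantity]
const products = [
  ["Multi-Grain Cereal", 4.99, 4],
  ["1lb Ground Beef", 3.99, 29],
  ["Dozen Apples", 2.49, 12],
  ["Chocolate Bar", 1.29, 2],
  ["1 Gallon Milk", 3.29, 16],
  ["0.5lb Jumbo Shrimp", 5.29, 12],
  ["Wheat Bread", 1.29, 5],
  ["Pepperoni Pizza", 2.99, 5],
  ["12 Pack Cola", 5.29, 6],
  ["Lime Juice", 0.99, 20],
  ["12 Pack Cherry Cola", 5.59, 5],
  ["1 Gallon Soy Milk", 3.39, 10],
  ["1 Gallon Vanilla Soy Milk", 3.49, 9],
  ["1 Gallon Orange Juice", 3.29, 4],
];

let weightedSum = 0;   // numerator: sum of price * quantity
let totalQuantity = 0; // denominator: sum of quantities
for (const [, price, quantity] of products) {
  weightedSum += price * quantity;
  totalQuantity += quantity;
}

const weightedAvgPrice = weightedSum / totalQuantity;
console.log(weightedSum.toFixed(2), totalQuantity, weightedAvgPrice.toFixed(2)); // 463.61 139 3.34
```

Over the 14 rows in the table, this works out to about $3.34. That differs from the $3.76 in the Elasticsearch response shown later. That response also reports 13 hits rather than 14, so the indexed documents appear to have differed somewhat from the table.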
Now let’s stop the math talk and get to the code:
File index.js:
```javascript
// Load the elasticsearch client library
var elasticsearch = require("elasticsearch");

// Create a client connected to the local Elasticsearch node on port 9200
var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Aggregate Weighted Average */
client.search({
  index: 'store',
  type: 'products',
  body: {
    aggs: {
      // Name of the aggregation; the result appears under resp.aggregations
      weighted_avg_price: {
        weighted_avg: {
          value: { field: "price" },     // field to average
          weight: { field: "quantity" }  // field used as each document's weight
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query! Here is the response:", resp);
}, function(err) {
  console.trace(err.message);
});
```
There are a few steps to dissect, so let’s go through them one at a time:
* First, we required the elasticsearch library, which provides the functions that make it easy to access Elasticsearch.
* Next, we created a variable, `var client`, which creates and stores our connection to Elasticsearch. From this point on, we’ll use this client for all our interactions with Elasticsearch.
* We then call the `search` function on `client` to create a query with an aggregation.
* We specify the index and type to search. The important part is the `body`, where we define the aggregation using the `aggs` keyword. We gave our aggregation the name `weighted_avg_price`. The name can be anything, but choose something that makes sense. We specify the aggregation type as `weighted_avg`.
* Lastly, we specify which field the aggregation should evaluate, `price`, and which field to use as the weight, `quantity`.
Let’s look at our result to see if it adds up:
```text
$ node index.js
Successful query! { took: 1,
  timed_out: false,
  _shards: { total: 5, successful: 5, skipped: 0, failed: 0 },
  hits:
   { total: 13,
     max_score: 1,
     hits:
      [ [Object],
        [Object],
        [Object],
        [Object],
        [Object],
        [Object],
        [Object],
        [Object],
        [Object],
        [Object] ] },
  aggregations: { weighted_avg_price: { value: 3.7631092436974796 } } }
```
The `weighted_avg_price` value returned was $3.76. If you’re not familiar with weighted averages, it’s a good exercise to verify the math by hand.
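In the `then` callback, the aggregation result is stored under `resp.aggregations`, keyed by the name you chose. Here is a minimal sketch that assumes the response shape printed above. `getAggValue` is a hypothetical helper, and the `resp` object is a trimmed copy of that output rather than a live query.

```javascript
// Hypothetical helper: return the value of a single-value aggregation
// (such as weighted_avg) from a search response
function getAggValue(resp, aggName) {
  const aggs = resp.aggregations || {};
  if (!(aggName in aggs)) {
    throw new Error("Aggregation not found in response: " + aggName);
  }
  return aggs[aggName].value;
}

// Trimmed copy of the response printed above (aggregations section only)
const resp = {
  aggregations: {
    weighted_avg_price: { value: 3.7631092436974796 }
  }
};

const avgPrice = getAggValue(resp, "weighted_avg_price");
console.log("$" + avgPrice.toFixed(2)); // $3.76
```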
Other weighted_avg Options
There are, of course, more options when computing a weighted average:
* You can specify a value to use if the value field is missing.
* You can specify a weight to use if the weight field is missing.
* You can use scripts to determine the value or weight when the fields don’t contain exactly the value or weight you need.
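Here is a minimal sketch of the first two options. Each one adds a `missing` setting alongside `field` in the `value` and `weight` objects. The fallback numbers (a price of 2.99 and a quantity of 1) are hypothetical placeholders, so choose defaults that make sense for your data.

```javascript
// Sketch of a weighted_avg body that uses the "missing" options.
// The fallback numbers are hypothetical placeholders.
const bodyWithDefaults = {
  aggs: {
    weighted_avg_price: {
      weighted_avg: {
        // Use 2.99 as the price for documents without a price field (hypothetical)
        value: { field: "price", missing: 2.99 },
        // Treat documents without a quantity as a single unit (hypothetical)
        weight: { field: "quantity", missing: 1 }
      }
    }
  }
};

console.log(JSON.stringify(bodyWithDefaults, null, 2));
```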
Consult the Elasticsearch documentation for all the options on aggregations.
Conclusion
In this tutorial, we demonstrated how to use Elasticsearch aggregations to calculate a weighted average. Remember to explore the many other aggregation options and combinations available. Consult the Elasticsearch documentation for more information on aggregations and their syntax; it is full of great examples.
We hope you found this tutorial helpful and that you can apply it to your own application. If you have questions, or if this didn’t work for you, please reach out to us so we can help. Thank you.
Just the Code
If you’re already comfortable with NodeJS and aggregations, here’s all the code we used to demonstrate how to find a weighted average with Elasticsearch and NodeJS.
```javascript
// Load the elasticsearch client library
var elasticsearch = require("elasticsearch");

// Create a client connected to the local Elasticsearch node on port 9200
var client = new elasticsearch.Client({
  hosts: ["http://localhost:9200"]
});

/* Aggregate Weighted Average */
client.search({
  index: 'store',
  type: 'products',
  body: {
    aggs: {
      // Name of the aggregation; the result appears under resp.aggregations
      weighted_avg_price: {
        weighted_avg: {
          value: { field: "price" },     // field to average
          weight: { field: "quantity" }  // field used as each document's weight
        }
      }
    }
  }
}).then(function(resp) {
  console.log("Successful query! Here is the response:", resp);
}, function(err) {
  console.trace(err.message);
});
```