How to Optimize a Query in MongoDB
Introduction
In this tutorial we’ll show you how to analyze the performance of a MongoDB query and then how to optimize it using indexing. We’ll be using a simple example with a small dataset for simplicity’s sake but the concept is the same with large datasets. In this day and age users will not put up with a laggy website and you need to do everything you can to make your website performant. One of those things is taking a look at all your database queries, figuring out which ones occur frequently, and how you can improve those queries to be most efficient. Please continue reading and we’ll show you an example of a query and how we’d go about optimizing it.
Prerequisites
You should have MongoDB installed and running.
If you have it installed, you can run a single instance with the command
mongod
.Although not required we recommend you have some experience using the command line.
If you’d like to follow along with our example, you’ll need to go through “How to Import a CSV into MongoDB” because we’ll be using the
products
collection created in that tutorial.You should have the
products
collection in thetest
database. Theproducts
collection should have 10 documents within it. Below was our collection at the end of that tutorial:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | >db.products.find() { "_id" : ObjectId("5cf80de8218efd0aa5e5ab60"), "ProductName" : "Paper Towels", "Price" : 1.25, "Quantity" : 8 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab61"), "ProductName" : "1 Gallon Soy Milk", "Price" : 2.5, "Quantity" : 12 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab62"), "ProductName" : "Salt and Vinegar Chips", "Price" : 2.25, "Quantity" : 22 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab63"), "ProductName" : "Red Wine", "Price" : 14, "Quantity" : 3 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab64"), "ProductName" : "Barbecue Chips", "Price" : 2.25, "Quantity" : 1 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab65"), "ProductName" : "Six Pack Soda", "Price" : 3, "Quantity" : 3 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab66"), "ProductName" : "1 Gallon Almond Milk", "Price" : 2.75, "Quantity" : 22 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab67"), "ProductName" : "Chocolate Bar", "Price" : 1, "Quantity" : 4 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab68"), "ProductName" : "Breath Mints", "Price" : 0.5, "Quantity" : 11 } { "_id" : ObjectId("5cf80de8218efd0aa5e5ab69"), "ProductName" : "Gatorade", "Price" : 2, "Quantity" : 21 } |
Analyzing the Query
We’ll be using the MongoDB shell to execute the query we’ll eventually optimize so let’s open the MongoDB shell:
1 | mongo |
Now let’s start by first executing the query normally:
1 | db.products.find({Quantity: 3}) |
We get back two results:
1 2 3 4 5 6 7 8 9 10 11 12 13 | >db.products.find({Quantity: 3}) { "_id" : ObjectId("5cf825cf218efd0aa5e5abd5"), "ProductName" : "Six Pack Soda", "Price" : 3, "Quantity" : 3 } { "_id" : ObjectId("5cf825cf218efd0aa5e5abd6"), "ProductName" : "Red Wine", "Price" : 14, "Quantity" : 3 } |
Now let’s run the same command but using the explain
method using the verbose flag like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | >db.products.find({Quantity: 3}).explain({verbose: true}) { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.products", "indexFilterSet" : false, "parsedQuery" : { "Quantity" : { "$eq" : 3 } }, "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "Price" : { "$eq" : 3 } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "executionStats" : { "executionSuccess" : true, "nReturned" : 1, "executionTimeMillis" : 0, "totalKeysExamined" : 0, "totalDocsExamined" : 10, "executionStages" : { "stage" : "COLLSCAN", "filter" : { "Quantity" : { "$eq" : 3 } }, "nReturned" : 1, "executionTimeMillisEstimate" : 0, "works" : 12, "advanced" : 1, "needTime" : 10, "needYield" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 1, "invalidates" : 0, "direction" : "forward", "docsExamined" : 10 }, "allPlansExecution" : [ ] }, "serverInfo" : { "host" : "Matts-MBP", "port" : 27017, "version" : "4.0.3", "gitVersion" : "7ea530946fa7880364d88c8d8b6026bbc9ffa48c" }, "ok" : 1 } > |
This explain command returns to us information about how efficient the query was. Now this is a lot of information so let’s pick one useful piece of data:
1 | "docsExamined" : 10 |
This means that the query had to check every document to see if the Quantity was 3. Well that may be ok for this tiny dataset but what happens when this dataset becomes 12 Terabytes, do you still want to scan every document? Let’s see a couple options for improving the performance, namely let’s see if we can get the docsExamined
number down.
Limit
If our query doesn’t need ALL the results, for example if you will only show the top ten results, then you can use limit function like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 | >db.products.find({Quantity: 3}).limit(1).explain({verbose:true}) { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.products", "indexFilterSet" : false, "parsedQuery" : { "Quantity" : { "$eq" : 3 } }, "winningPlan" : { "stage" : "LIMIT", "limitAmount" : 1, "inputStage" : { "stage" : "COLLSCAN", "filter" : { "Quantity" : { "$eq" : 3 } }, "direction" : "forward" } }, "rejectedPlans" : [ ] }, "executionStats" : { "executionSuccess" : true, "nReturned" : 1, "executionTimeMillis" : 0, "totalKeysExamined" : 0, "totalDocsExamined" : 1, "executionStages" : { "stage" : "LIMIT", "nReturned" : 1, "executionTimeMillisEstimate" : 0, "works" : 3, "advanced" : 1, "needTime" : 1, "needYield" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 1, "invalidates" : 0, "limitAmount" : 1, "inputStage" : { "stage" : "COLLSCAN", "filter" : { "Quantity" : { "$eq" : 3 } }, "nReturned" : 1, "executionTimeMillisEstimate" : 0, "works" : 2, "advanced" : 1, "needTime" : 1, "needYield" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 0, "invalidates" : 0, "direction" : "forward", "docsExamined" : 1 } }, "allPlansExecution" : [ ] }, "serverInfo" : { "host" : "Matts-MBP", "port" : 27017, "version" : "4.0.3", "gitVersion" : "7ea530946fa7880364d88c8d8b6026bbc9ffa48c" }, "ok" : 1 } |
Now totalDocsExamined
was 1, which means it found a result on the first document and because it was limited to finding one document it returned right away instead of searching the rest of the documents.
Use an Index
Another great option is to use an index on the Quantity field.
1 | > db.products.createIndex({Quantity:1}) |
You should see a response similar to below:
1 2 3 4 5 6 | { "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 } |
This creates on index on the Quantity
field and the 1 parameter will tell it to sort that index in ascending order.
Now let’s see if our performance improved:
1 | db.products.find({Quantity: 3}).explain({verbose: true}) |
The response will show us that "winningPlan"
or the method by which it found the documents was “IXSCAN” which indicates that it used our index instead of checking every document.
On larger datasets you should look at the executionTimeMillis
and verify that the query becomes more efficient.
Conclusion
In this tutorial we went through a couple different ways to optimize a query in MongoDB including limiting the query results and creating an index. We introduced the explain()
function which will give you vital information on how your query is performing. If you need help improving the efficiency of your database or want us to recommend a more efficient solution please don’t hesitate to reach out to us at Object Rocket.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started