How to Optimize a Query in MongoDB

Introduction

In this tutorial we’ll show you how to analyze the performance of a MongoDB query and then how to optimize it using indexing. We’ll be using a simple example with a small dataset for simplicity’s sake but the concept is the same with large datasets. In this day and age users will not put up with a laggy website and you need to do everything you can to make your website performant. One of those things is taking a look at all your database queries, figuring out which ones occur frequently, and how you can improve those queries to be most efficient. Please continue reading and we’ll show you an example of a query and how we’d go about optimizing it.

Prerequisites

  • You should have MongoDB installed and running.

  • If you have it installed, you can run a single instance with the command mongod.

  • Although not required we recommend you have some experience using the command line.

  • If you’d like to follow along with our example, you’ll need to go through “How to Import a CSV into MongoDB” because we’ll be using the products collection created in that tutorial.

  • You should have the products collection in the test database. The products collection should have 10 documents within it. Below was our collection at the end of that tutorial:

>db.products.find()
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab60"),
        "ProductName" : "Paper Towels",
        "Price" : 1.25,
        "Quantity" : 8
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab61"),
        "ProductName" : "1 Gallon Soy Milk",
        "Price" : 2.5,
        "Quantity" : 12
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab62"),
        "ProductName" : "Salt and Vinegar Chips",
        "Price" : 2.25,
        "Quantity" : 22
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab63"),
        "ProductName" : "Red Wine",
        "Price" : 14,
        "Quantity" : 3
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab64"),
        "ProductName" : "Barbecue Chips",
        "Price" : 2.25,
        "Quantity" : 1
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab65"),
        "ProductName" : "Six Pack Soda",
        "Price" : 3,
        "Quantity" : 3
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab66"),
        "ProductName" : "1 Gallon Almond Milk",
        "Price" : 2.75,
        "Quantity" : 22
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab67"),
        "ProductName" : "Chocolate Bar",
        "Price" : 1,
        "Quantity" : 4
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab68"),
        "ProductName" : "Breath Mints",
        "Price" : 0.5,
        "Quantity" : 11
}
{
        "_id" : ObjectId("5cf80de8218efd0aa5e5ab69"),
        "ProductName" : "Gatorade",
        "Price" : 2,
        "Quantity" : 21
}

Analyzing the Query

We’ll be using the MongoDB shell to execute the query we’ll eventually optimize so let’s open the MongoDB shell:

mongo

Now let’s start by first executing the query normally:

db.products.find({Quantity: 3})

We get back two results:

>db.products.find({Quantity: 3})
{
        "_id" : ObjectId("5cf825cf218efd0aa5e5abd5"),
        "ProductName" : "Six Pack Soda",
        "Price" : 3,
        "Quantity" : 3
}
{
        "_id" : ObjectId("5cf825cf218efd0aa5e5abd6"),
        "ProductName" : "Red Wine",
        "Price" : 14,
        "Quantity" : 3
}

Now let’s run the same command but using the explain method using the verbose flag like so:

>db.products.find({Quantity: 3}).explain({verbose: true})
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.products",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "Quantity" : {
                                "$eq" : 3
                        }
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "Price" : {
                                        "$eq" : 3
                                }
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 10,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "Quantity" : {
                                        "$eq" : 3
                                }
                        },
                        "nReturned" : 1,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 12,
                        "advanced" : 1,
                        "needTime" : 10,
                        "needYield" : 0,
                        "saveState" : 0,
                        "restoreState" : 0,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 10
                },
                "allPlansExecution" : [ ]
        },
        "serverInfo" : {
                "host" : "Matts-MBP",
                "port" : 27017,
                "version" : "4.0.3",
                "gitVersion" : "7ea530946fa7880364d88c8d8b6026bbc9ffa48c"
        },
        "ok" : 1
}
>

This explain command returns to us information about how efficient the query was. Now this is a lot of information so let’s pick one useful piece of data:

"docsExamined" : 10

This means that the query had to check every document to see if the Quantity was 3. Well that may be ok for this tiny dataset but what happens when this dataset becomes 12 Terabytes, do you still want to scan every document? Let’s see a couple options for improving the performance, namely let’s see if we can get the docsExamined number down.

Limit

If our query doesn’t need ALL the results, for example if you will only show the top ten results, then you can use limit function like so:

>db.products.find({Quantity: 3}).limit(1).explain({verbose:true})
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.products",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "Quantity" : {
                                "$eq" : 3
                        }
                },
                "winningPlan" : {
                        "stage" : "LIMIT",
                        "limitAmount" : 1,
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "Quantity" : {
                                                "$eq" : 3
                                        }
                                },
                                "direction" : "forward"
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 1,
                "executionStages" : {
                        "stage" : "LIMIT",
                        "nReturned" : 1,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 3,
                        "advanced" : 1,
                        "needTime" : 1,
                        "needYield" : 0,
                        "saveState" : 0,
                        "restoreState" : 0,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "limitAmount" : 1,
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "Quantity" : {
                                                "$eq" : 3
                                        }
                                },
                                "nReturned" : 1,
                                "executionTimeMillisEstimate" : 0,
                                "works" : 2,
                                "advanced" : 1,
                                "needTime" : 1,
                                "needYield" : 0,
                                "saveState" : 0,
                                "restoreState" : 0,
                                "isEOF" : 0,
                                "invalidates" : 0,
                                "direction" : "forward",
                                "docsExamined" : 1
                        }
                },
                "allPlansExecution" : [ ]
        },
        "serverInfo" : {
                "host" : "Matts-MBP",
                "port" : 27017,
                "version" : "4.0.3",
                "gitVersion" : "7ea530946fa7880364d88c8d8b6026bbc9ffa48c"
        },
        "ok" : 1
}

Now totalDocsExamined was 1, which means it found a result on the first document and because it was limited to finding one document it returned right away instead of searching the rest of the documents.

Use an Index

Another great option is to use an index on the Quantity field.

> db.products.createIndex({Quantity:1})

You should see a response similar to below:

{
        "createdCollectionAutomatically" : true,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

This creates on index on the Quantity field and the 1 parameter will tell it to sort that index in ascending order.

Now let’s see if our performance improved:

db.products.find({Quantity: 3}).explain({verbose: true})

The response will show us that "winningPlan" or the method by which it found the documents was “IXSCAN” which indicates that it used our index instead of checking every document.

On larger datasets you should look at the executionTimeMillis and verify that the query becomes more efficient.

Conclusion

In this tutorial we went through a couple different ways to optimize a query in MongoDB including limiting the query results and creating an index. We introduced the explain() function which will give you vital information on how your query is performing. If you need help improving the efficiency of your database or want us to recommend a more efficient solution please don’t hesitate to reach out to us at Object Rocket.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.