Use Python To Query MongoDB Documents In A Terminal Window


Introduction

If you’re storing data in MongoDB, you may want to query for documents from a Python script. Fortunately, this is a simple task to accomplish. In this article, we’ll show you how to pass MongoDB query strings to a Python script from a terminal window. The script will then return the MongoDB documents that match the query. Our example will look for documents like the following, which have the word “Object” in a string field:

{"_id":"5d36277e024f042ff4837ad5","string field":"Object Rocket","int field":"8764","bool field":false}

{"_id":"5d36279f024f042ff4837ad6","string field":"Object Rocket articles","int field":"1","bool field":true}

Screenshot of some documents in MongoDB Compass

Prerequisites

Let’s take a look at a few prerequisites that should be in place before we move on to the Python script:

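  • First, you’ll need to make sure the MongoDB server is running. Typing mongod in a terminal window starts the server process in the foreground; its startup log shows the PID of the process and the port on which it’s listening (27017 by default). If MongoDB is already running as a service, you can confirm that by connecting to it with the mongo or mongosh shell instead.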

  • It’s best to use Python 3 to run the example code in this article. Python 2.7 reached end of life on January 1, 2020, and PyMongo 4 and later no longer support it.

  • Be sure to install the PyMongo client library for MongoDB using the PIP package manager. You can install the library with the following command:

pip3 install pymongo

Terminal screenshot checking MongoDB status with mongod and PIP installing PyMongo

  • Finally, you’ll need to have a MongoDB database and collection with a few documents in it that can be queried in your script.
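
If you don’t have any documents to query yet, the last prerequisite can be met with a few lines of Python. The following is only a minimal sketch: the SAMPLE_DOCS list and the seed_collection() helper are hypothetical names introduced here, and the documents simply mirror the two shown in the introduction. The only PyMongo call it relies on is the collection’s insert_many() method.

```python
# Sample documents mirroring the two shown in the introduction.
# The "_id" values are left out so that MongoDB generates them on insert.
SAMPLE_DOCS = [
    {"string field": "Object Rocket", "int field": "8764", "bool field": False},
    {"string field": "Object Rocket articles", "int field": "1", "bool field": True},
]


def seed_collection(collection, docs=SAMPLE_DOCS):
    """Insert copies of the sample documents and return how many were inserted.

    `collection` is assumed to be a PyMongo Collection object
    (hypothetical helper; anything with an insert_many() method works).
    """
    # insert_many() adds an "_id" key to each dict it receives, so pass copies
    result = collection.insert_many([dict(doc) for doc in docs])
    return len(result.inserted_ids)
```

With a running server, you could call it as seed_collection(MongoClient('mongodb://localhost:27017')["SomeDatabase"]["Some Collection"]), substituting your own database and collection names.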

Create a Python script for the MongoDB queries and import PyMongo

At this point, we’re ready to move forward and dive into the code. We’ll first create a Python script using the touch command in a terminal window (for example, touch query.py). In our script, we’ll import Python’s sys module, which gives access to the arguments passed to the script, and we’ll also import PyMongo’s MongoClient class for the API calls:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# use sys for system arguments passed to script
import sys

# import the MongoClient class from the library
from pymongo import MongoClient

Allow the user to pass the database and collection names as system arguments

We’ll need to get the database and collection names from sys.argv, the list of arguments that gets passed to the script from the terminal window:

# The script name is the first arg [0]; the second [1] and third [2]
# args are used for the MongoDB database and collection names
try:
    DB_NAME = sys.argv[1]
    COL_NAME = sys.argv[2]
except IndexError as err:
    print ("IndexError - Script needs more arguments for MongoDB db and col name:", err)
    quit()

Create a collection instance from the system arguments passed to the Python script

Now that we’ve imported the appropriate packages and processed the arguments, let’s declare a MongoDB client instance. After that, we’ll instantiate a collection object for the API calls using elements from the sys.argv list that was passed to the script:

# create an instance of the MongoDB client for Python
# this example explicitly passes host parameters
client = MongoClient('mongodb://localhost:27017')

# pass the database and collection names to the client instance
db = client[DB_NAME]
col = db[COL_NAME]

print ("MongoDB collection:", col, "\n")

Declare the main() function and check the arguments

The rest of the script’s logic goes in a main() function, which runs when the script is executed from the terminal. It starts by checking the length of the arguments list to ensure that the correct number of arguments was passed:

def main():

    # declare variable for system arguments list
    sys_args = sys.argv

    # remove Python script name from args list
    sys_args.pop(0)

    # quit the script if there are not exactly 4 arguments
    if len(sys_args) != 4:
        print ("Four arguments needed. You provided:", sys_args)
        print ("You need to pass the database and collection name,")
        print ("and you need to pass the field name and the string to query for.")
        quit()

Get the field name and string query arguments

The first and second arguments will contain the database name and collection, so the third [2] and fourth [3] elements in the list can be used to hold the collection’s field name being queried and the string query itself:

    else:
        # MongoDB collection's field name is the 3rd element in args list
        FIELD_NAME = sys_args[2]

        # The string query is the 4th element in args list
        STR_QUERY = sys_args[3]
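
As an alternative to indexing into sys.argv by hand, the same four arguments could be read with the standard library’s argparse module, which also prints a usage message when an argument is missing. This is a sketch rather than part of the script above; the parse_args() function and the argument names db_name, col_name, field_name, and str_query are assumptions chosen to match the variables used in this article.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments the script expects."""
    parser = argparse.ArgumentParser(
        description="Query MongoDB documents with a regex on one field."
    )
    parser.add_argument("db_name", help="name of the MongoDB database")
    parser.add_argument("col_name", help="name of the collection")
    parser.add_argument("field_name", help="field to run the query against")
    parser.add_argument("str_query", help="string to search for")
    # when argv is None, argparse reads the arguments from sys.argv[1:]
    return parser.parse_args(argv)


# passing a list explicitly makes the function easy to try out
args = parse_args(["SomeDatabase", "Some Collection", "string field", "object"])
print(args.db_name, "|", args.col_name, "|", args.field_name, "|", args.str_query)
```

Because argparse exits with an error message when too few or too many arguments are passed, the manual length check becomes unnecessary in this variant.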

Create a new index for the MongoDB collection’s field if it doesn’t exist

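MongoDB’s $regex query operator can be resource-intensive, especially on a collection with a large amount of data, because without an index the server has to scan every document. One way to reduce the cost is to create an index on the field being queried. The index helps most when the pattern is case-sensitive and anchored to the start of the string (for example, ^Object); for other patterns MongoDB can still match against the index entries instead of the full documents, but it has to examine all of them. Our next step will be to see if an index for the string field exists; if it doesn’t, we’ll create one.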
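
To illustrate the kind of pattern that benefits most from an index, here is a small sketch that builds a prefix-anchored $regex filter. The build_prefix_query() helper is a hypothetical name, not something the script in this article defines; it uses re.escape() so that characters such as a period in the user’s input are matched literally instead of being read as regex syntax.

```python
import re


def build_prefix_query(field_name, text):
    """Return a filter matching values of field_name that start with text."""
    # "^" anchors the pattern to the start of the string;
    # re.escape() neutralizes any regex metacharacters in the user's input
    return {field_name: {"$regex": "^" + re.escape(text)}}


query = build_prefix_query("string field", "Object")
print(query)  # {'string field': {'$regex': '^Object'}}
```

The resulting dictionary can be passed to the collection’s find() method in the same way as the queries shown later in this article.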

Get all of the indexes in the MongoDB collection and iterate over them

We’ll use the collection instance’s index_information() method to get all of the indexes:

        # get all of the collection's indexes
        indexes = col.index_information()
        print ("\nindex_information() TYPE:", type(indexes))

Iterate over the indexes and look for the MongoDB field being queried

After that, we’ll iterate over all of the collection’s indexes. If the index for the field being queried doesn’t exist, then we’ll create it using PyMongo’s create_index() method:

        # default is to create an index
        create_index = True
        for key, val in indexes.items():

            # check if the field name for query is in an index
            if FIELD_NAME in key:
                create_index = False
                print ("found index:", key)

                # exit the loop if index was found
                break

        # create an index if the field name was not found
        if create_index == True:
            resp = col.create_index([(FIELD_NAME, 1)])
            print ("\ncreate_index() response:", resp)
        else:
            print ("\nindex name:", FIELD_NAME, "found")
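
The loop above compares the field name against each index name, so a field called string would also match an index named string field_1. A stricter check, sketched below, looks at the "key" entry that index_information() reports for each index, which is a list of (field, direction) pairs. The field_is_indexed() helper and the example_info dictionary are hypothetical and only imitate the shape of that return value.

```python
def field_is_indexed(index_info, field_name):
    """Return True if any index has field_name as one of its keys."""
    for spec in index_info.values():
        # each "key" entry is a (field, direction) pair
        for indexed_field, _direction in spec.get("key", []):
            if indexed_field == field_name:
                return True
    return False


# a dictionary shaped like the return value of col.index_information()
example_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "string field_1": {"v": 2, "key": [("string field", 1)]},
}

print(field_is_indexed(example_info, "string field"))  # True
print(field_is_indexed(example_info, "string"))        # False
```

In the script, the call would be field_is_indexed(col.index_information(), FIELD_NAME).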

Make the API request to query for the MongoDB documents

Now, let’s create a query and pass it to the find() method, which returns a PyMongo cursor object. To find out whether the query matched anything, we’ll call the collection’s count_documents() method with the same filter; the older Cursor.count() method was deprecated in PyMongo 3.7 and removed in PyMongo 4. If no documents were found, the following code will try the query again using lower, upper, and title case:

        # pass all of the args to the collection's find() method
        final_query = {FIELD_NAME: {"$regex": STR_QUERY}}
        cursor = col.find(final_query)
        print ("final_query:", final_query)

        # try to get lowercase doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.lower()}}
            cursor = col.find(final_query)

        # try to get upper() doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.upper()}}
            cursor = col.find(final_query)

        # try to get title() doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.title()}}
            cursor = col.find(final_query)

        # check if the final query matched any documents
        if col.count_documents(final_query) != 0:
            # iterate the cursor object returned by the find() method

Iterate and print the results of the query in the cursor object

            print ("cursor TYPE:", type(cursor))
            print ("document count:", col.count_documents(final_query))
            for doc in cursor:
                print ("\n\n", doc, dir(doc))
                print (doc.items(), "\n")
                print ("_id:", doc["_id"])
                print (FIELD_NAME, ":", doc[FIELD_NAME])
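
The lower, upper, and title case retries could also be replaced by a single case-insensitive query that uses the $regex operator’s $options field. The sketch below is one possible way to do that; build_case_insensitive_query() is a hypothetical helper, and the second half only imitates the match locally with Python’s re module on made-up sample data so the effect can be seen without a server. Keep in mind that case-insensitive regex queries generally can’t take full advantage of an index.

```python
import re


def build_case_insensitive_query(field_name, text):
    """Return a filter that matches text anywhere in field_name, ignoring case."""
    # the "i" option asks MongoDB to ignore case when matching
    return {field_name: {"$regex": re.escape(text), "$options": "i"}}


print(build_case_insensitive_query("string field", "object"))

# local illustration of the same match using Python's re module
docs = [
    {"string field": "Object Rocket"},
    {"string field": "Object Rocket articles"},
    {"string field": "Something else"},
]
pattern = re.compile(re.escape("object"), re.IGNORECASE)
matches = [d["string field"] for d in docs if pattern.search(d["string field"])]
print(matches)  # ['Object Rocket', 'Object Rocket articles']
```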

Instruct the Python interpreter to call the main() function

We’re almost done with our script, but there’s a bit of code we’ll need to include at the end. When a file is run directly, the Python interpreter sets its __name__ variable to "__main__", so the following check calls the main() function when the script is executed from a terminal window but not when the file is imported as a module:

# have interpreter call the main() func
if __name__ == "__main__":
    main()
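
A common variation on this pattern, sketched below under the assumption that you’re free to restructure the script, has main() accept the argument list as a parameter and return an exit status. That avoids popping items off sys.argv itself and makes the function easy to call from other code. The print() call stands in for the actual query logic, and the guard is shown as a comment so the snippet can be pasted into an interactive session without exiting it.

```python
import sys


def main(argv=None):
    """Return 0 on success or 1 when the argument count is wrong."""
    # copy the arguments instead of modifying sys.argv in place
    args = list(sys.argv[1:] if argv is None else argv)

    if len(args) != 4:
        print("Four arguments needed. You provided:", args)
        return 1

    db_name, col_name, field_name, str_query = args
    # placeholder for the query logic shown earlier in the article
    print("querying:", db_name, col_name, field_name, str_query)
    return 0


# In a script file, the guard would hand the status to sys.exit():
# if __name__ == "__main__":
#     sys.exit(main())
```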

Conclusion

As you can see, it’s quite simple to write a script that can query for MongoDB documents with Python. Now that the Python script is complete, the only thing left to do is execute it. In a terminal window, navigate to the script’s location and use the python3 command to run the script. When you call the script, be sure to pass the appropriate arguments to it in order to avoid errors.

Here’s an example of what the command might look like:

python3 query.py SomeDatabase "Some Collection" "string field" object

If any of your arguments have spaces in them, be sure to enclose them in quotation marks. Here’s an example where all of the arguments, except for the database name, have a space in them:

python3 query.py SomeDatabase "Some Collection" "Field Name" "Query String"
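
To see why the quotation marks matter, you can reproduce the shell’s word splitting with the standard library’s shlex module. This is only an illustration of how the command line above ends up in sys.argv (everything after python3), not part of the script itself.

```python
import shlex

command = 'python3 query.py SomeDatabase "Some Collection" "Field Name" "Query String"'

# shlex.split() follows POSIX shell quoting rules
tokens = shlex.split(command)

# everything after "python3" is what the script receives as sys.argv
print(tokens[1:])
# ['query.py', 'SomeDatabase', 'Some Collection', 'Field Name', 'Query String']

# without the quotes, "Some Collection" would arrive as two separate arguments
print(shlex.split("python3 query.py SomeDatabase Some Collection")[1:])
# ['query.py', 'SomeDatabase', 'Some', 'Collection']
```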

Terminal screenshot of a request to query MongoDB documents using a Python script

Just the Code

Although we’ve reviewed our example script one section at a time in this article, it can be helpful to view the code in its entirety. We’ve included the complete script below:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# use sys for system arguments passed to script
import sys

# import the MongoClient class from the library
from pymongo import MongoClient

# The script name is the first arg [0]; the second [1] and third [2]
# args are used for the MongoDB database and collection names
try:
    DB_NAME = sys.argv[1]
    COL_NAME = sys.argv[2]
except IndexError as err:
    print ("IndexError - Script needs more arguments for MongoDB db and col name:", err)
    quit()

# create an instance of the MongoDB client for Python
# this example explicitly passes host parameters
client = MongoClient('mongodb://localhost:27017')

# pass the database and collection names to the client instance
db = client[DB_NAME]
col = db[COL_NAME]

print ("MongoDB collection:", col, "\n")

def main():

    # declare variable for system arguments list
    sys_args = sys.argv

    # remove Python script name from args list
    sys_args.pop(0)

    # quit the script if there are not exactly 4 arguments
    if len(sys_args) != 4:
        print ("Four arguments needed. You provided:", sys_args)
        print ("You need to pass the database and collection name,")
        print ("and you need to pass the field name and string or int value.", sys_args)
        quit()

    else:
        # MongoDB collection's field name is the 3rd element in args list
        FIELD_NAME = sys_args[2]

        # The string query is the 4th element in args list
        STR_QUERY = sys_args[3]

        # iterate the col object's attributes
        for num, item in enumerate(dir(col)):
            # look for all of the methods with "index"
            if "index" in item:
                # print the collection object's methods and attributes
                print (item)

        # get all of the collection's indexes
        indexes = col.index_information()
        print ("\nindex_information() TYPE:", type(indexes))

        # default is to create an index
        create_index = True
        for key, val in indexes.items():

            # check if the field name for query is in an index
            if FIELD_NAME in key:
                create_index = False
                print ("found index:", key)

                # exit the loop if index was found
                break

        # create an index if the field name was not found
        if create_index == True:
            resp = col.create_index([(FIELD_NAME, 1)])
            print ("\ncreate_index() response:", resp)
        else:
            print ("\nindex name:", FIELD_NAME, "found")

        # pass all of the args to the collection's find() method
        final_query = {FIELD_NAME: {"$regex": STR_QUERY}}
        cursor = col.find(final_query)
        print ("final_query:", final_query)

        # Cursor.count() was removed in PyMongo 4, so the matches are
        # counted with the collection's count_documents() method instead

        # try to get lowercase doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.lower()}}
            cursor = col.find(final_query)

        # try to get upper() doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.upper()}}
            cursor = col.find(final_query)

        # try to get title() doc matches
        if col.count_documents(final_query) == 0:
            # pass all of the args to the collection's find() method
            final_query = {FIELD_NAME: {"$regex": STR_QUERY.title()}}
            cursor = col.find(final_query)

        # check if the final query matched any documents
        if col.count_documents(final_query) != 0:
            # iterate the cursor object returned by the find() method
            print ("cursor TYPE:", type(cursor))
            print ("document count:", col.count_documents(final_query))
            for doc in cursor:
                print ("\n\n", doc, dir(doc))
                print (doc.items(), "\n")
                print ("_id:", doc["_id"])
                print (FIELD_NAME, ":", doc[FIELD_NAME])

# have interpreter call the main() func
if __name__ == "__main__":
    main()
