How To Access And Parse MongoDB Documents In Python
Introduction
If you’re a Python developer planning to work with MongoDB, it’s important to know how to handle the results of a query. When you query a collection of documents with PyMongo, each document in the results is returned as a Python dictionary. It’s quite easy to access these MongoDB documents in Python and examine each one individually. In this article, we’ll learn how to iterate through the MongoDB documents returned from a query and parse each document’s fields and values.
Prerequisites
It’s important to review the list of system requirements needed for this tutorial before we jump into the Python code. There are a few key prerequisites to keep in mind:
- First, the MongoDB client needs to be installed on the server. You can use the --version flag in a terminal to see if it’s installed (on MongoDB 6.0 and later, the shell command is mongosh rather than mongo):
```bash
mongo --version
```
- Also, Python needs to be installed on the machine or server. Python 3 is recommended, as Python 2 reached its end of life in January 2020.
- Finally, the Python driver for MongoDB needs to be installed. Use the pip3 package manager (or pip if you’re using Python 2) to install the Python driver:
```bash
pip3 install pymongo
```
Connect to the MongoDB Server using the MongoClient class in Python
Now that we’ve reviewed the prerequisites, we’re ready to focus on the code. First, we’ll import the MongoClient class and create a new client instance of the driver:
```python
from pymongo import MongoClient

# with no arguments, MongoClient connects to localhost on the default port 27017
mongo_client = MongoClient()
```
We’ll be using the mongo_client object to make method calls to a MongoDB database and its collections.
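As a complement to the default connection, the following is a minimal sketch of connecting with an explicit URI and a short server selection timeout, then sending a ping to confirm the server is reachable. The helper names build_mongo_uri and connect, the host and port values, and the 2000 ms timeout are illustrative assumptions rather than part of the tutorial’s script.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Build a basic MongoDB connection URI (no authentication)."""
    return f"mongodb://{host}:{port}/"


def connect(uri, timeout_ms=2000):
    """Return a MongoClient if the server answers a ping, otherwise None."""
    # imported here so the URI helper can be used without PyMongo installed
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # MongoClient connects lazily, so force a round trip to the server
        client.admin.command("ping")
    except ConnectionFailure:
        return None
    return client
```

Calling connect(build_mongo_uri()) returns a client you could use in place of mongo_client, or None if no server responds within the timeout.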
Access a MongoDB database and collection using the Python client instance
PyMongo doesn’t raise a Python AttributeError (or any other exception) if the database or collection you specify doesn’t exist.
If either the database or the collection doesn’t exist, MongoDB creates it implicitly the first time you write data to it, for example when you insert a document. You can verify this in the MongoDB shell.
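Because accessing a missing collection raises no error, it can be useful to check whether a collection actually exists on the server. Here is a minimal sketch, assuming db is a PyMongo Database object; collection_exists is a hypothetical helper name, while list_collection_names() is PyMongo’s method for listing the collections that currently exist.

```python
def collection_exists(db, name):
    """Return True if a collection called `name` already exists in `db`."""
    # list_collection_names() only reports collections that have been created
    # on the server, so a collection that was merely referenced is not included
    return name in db.list_collection_names()
```

For example, collection_exists(db, "some_collection") should return False until the first document has been written to that collection.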
Access a MongoDB database using the PyMongo client instance
You can access the databases on the MongoDB server as attributes of the client object itself. In the following example, we access the database attribute some_database:
```python
db = mongo_client.some_database
```
Access the Collection attribute of the MongoDB database instance
Let’s create a new collection object from the MongoDB database instance:
```python
col = db.some_collection
```
This Collection object (col) will be used going forward to make API requests to the MongoDB server to retrieve the collection’s documents.
Use a Python iterator to access all of a MongoDB collection’s documents
Although printing every document in a loop isn’t practical for large collections, it’s a handy way to inspect documents during development. Here, we use a basic for loop to iterate the MongoDB documents:
```python
print("\nReturn every document:")
for doc in col.find():
    print(doc)
```
Each document returned by the API call is actually a Python dictionary object, with each one having its own "_id" key (an ObjectId value by default) as well as any custom fields that were created when the document was inserted.
The document’s key-value pairs can be accessed just like they would be in any Python dictionary. For example, to get the document’s _id you just have to access the dictionary’s "_id" key:
```python
ids = []  # create an empty list for IDs

# iterate PyMongo documents with a for loop
for doc in col.find():
    # append each document's ID to the list
    ids.append(doc["_id"])

# print out the IDs
print("IDs:", ids)
print("total docs:", len(ids))
```
[Screenshot: a list of all the IDs returned in a PyMongo API call to a MongoDB collection]
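Not every document is guaranteed to contain every field, so indexing a document with a key it lacks raises a KeyError. The sketch below uses the dictionary get() method to collect a field’s values with a fallback. The helper name collect_field and the sample documents are made up for illustration; the plain dictionaries stand in for documents returned by col.find().

```python
def collect_field(documents, field, default=None):
    """Return the value of `field` from every document, or `default` if missing."""
    return [doc.get(field, default) for doc in documents]


# stand-in data: plain dicts shaped like documents yielded by col.find()
sample_docs = [
    {"_id": 1, "field": "value"},
    {"_id": 2},  # this document has no "field" key
]

print(collect_field(sample_docs, "_id"))    # [1, 2]
print(collect_field(sample_docs, "field"))  # ['value', None]
```

With a live collection, the same helper could be called as collect_field(col.find(), "_id").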
Get all attributes of a MongoDB collection object in Python
You can use Python’s built-in dir() function to return a list of all the collection object’s attributes:
```python
print(dir(col), "\n\n")
```
You’ll see in the following screenshot that several of the methods begin with the word "find". Methods that start with "find" query the collection’s documents; note that the find_one_and_delete(), find_one_and_replace(), and find_one_and_update() variants also modify the document they match.
[Screenshot: all of the Python methods used to find documents in a MongoDB collection]
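Rather than scanning the full dir() output by eye, you can filter it down to the names you care about. This is a small sketch, not part of the tutorial’s script: methods_with_prefix is a hypothetical helper, and it is demonstrated on a built-in string so that it runs without a database connection.

```python
def methods_with_prefix(obj, prefix):
    """Return the names of the callable attributes of `obj` starting with `prefix`."""
    return [
        name
        for name in dir(obj)
        if name.startswith(prefix) and callable(getattr(obj, name))
    ]


# demonstration on a built-in string, which needs no database connection
print(methods_with_prefix("text", "find"))  # ['find']
```

Passing a collection object instead, as in methods_with_prefix(col, "find"), would list the find methods available in your installed version of PyMongo.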
Use Python’s list() function to return a list of all the MongoDB documents in a collection
A more convenient way to get all of a collection’s documents when you query for MongoDB documents in Python is to use the list() function. You can accomplish this by passing the collection’s find() API call into the list() call. This will return a list of MongoDB documents in the form of Python dictionaries. Keep in mind that list() loads every returned document into memory at once, so it’s best suited to small result sets.
Here’s an example that uses list() to return the documents:
```python
# get all the documents in a MongoDB collection with list()
documents = list(col.find())
```
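If you only need a handful of documents, you can avoid building the full list by reading just the first few from the cursor. The sketch below uses itertools.islice, which works on any Python iterator; first_n is a hypothetical helper name, and the iterator of dictionaries stands in for a real cursor.

```python
from itertools import islice


def first_n(cursor, n):
    """Return at most `n` documents from `cursor` without reading the rest."""
    return list(islice(cursor, n))


# stand-in for col.find(): an iterator that yields document dictionaries
fake_cursor = iter([{"_id": 1}, {"_id": 2}, {"_id": 3}])

print(first_n(fake_cursor, 2))  # [{'_id': 1}, {'_id': 2}]
```

With a real collection, PyMongo’s Cursor.limit() method achieves a similar result on the server side, for example list(col.find().limit(2)).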
Access the items for each MongoDB document’s dictionary
Once you have the list, you can then iterate the MongoDB documents and access the key-value items (or MongoDB document fields and values) for each dictionary. In this example, we get the document’s unique "_id" as we iterate the PyMongo documents:
```python
# iterate over the document dictionaries in the list
for doc in documents:
    # access each document's "_id" key
    print("\ndoc _id:", doc["_id"])
```
Find a MongoDB document in Python using the find_one() method
The find_one() method of a PyMongo collection returns a single document: the first one that matches the query filter, or the first one MongoDB returns when no filter is given.
Unlike the find() method that we discussed earlier, find_one() does not return a pymongo.cursor.Cursor object. Instead, it will return a single document as a Python dictionary that you can access. If the collection doesn’t contain any matching documents, it will return None.
Here’s an example of the find_one() API call to a MongoDB collection:
```python
# get a collection object
col = db["some_collection"]

# find one document
doc = col.find_one()
print(doc, "-- doc type:", type(doc))

# get a different collection with NO documents
other_col = db["DOES_NOT_EXIST"]

# find one document and print it (None, of type NoneType)
doc = other_col.find_one()
print(doc, "-- doc type:", type(doc))
```
Check if the object returned by the MongoDB API request is a Python “NoneType” object
If the collection being queried is empty, or if it hasn’t been created on the MongoDB server yet, be sure your code is equipped to handle the None value returned by the API call.
Make a find_one() API request to a MongoDB collection:
```python
# get a collection object
col = db["some_collection"]

# find one document in the collection
result = col.find_one()
```
One simple way to catch None values is to use an if statement. Evaluate the object returned by find_one(), and check whether it is None:
```python
# check if the find_one() call returns None
if result is not None:
    # use a try-except to catch KeyErrors
    try:
        id_found = result["_id"]
        print("The find_one() request returned ID:", id_found)
    except KeyError as err:
        print("KeyError for:", result, "--", err)
else:
    print("Result object for find_one() query returned 'None'")
```
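The same None check can be wrapped in a small reusable function. This is only a sketch: document_id is a hypothetical helper name, and the values passed to it stand in for whatever find_one() returns.

```python
def document_id(result):
    """Return the "_id" of a find_one() result, or None if there is no document."""
    if result is None:
        return None
    # get() avoids a KeyError if the "_id" field was excluded by a projection
    return result.get("_id")


print(document_id({"_id": 42, "field": "value"}))  # 42
print(document_id(None))                           # None
```

Using get() here replaces the try-except block, at the cost of not distinguishing a missing document from a document without an "_id" key.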
Check the length of the pymongo.cursor.Cursor object returned by the MongoDB API call
You can also use the list() function in your script to see how many documents were returned when you query for MongoDB documents with Python. A Python list of documents can be created from the pymongo.cursor.Cursor object if the method returns a Cursor object. In this example, we use the find() API call:
```python
# get a collection object
col = db["some_collection"]

# return documents for a find() API request
documents = list(col.find())
```
You can then iterate the MongoDB documents as in previous examples but also check the list’s length:
```python
# no documents were returned
if len(documents) == 0:
    print("The collection", col.name, "did not return any documents.")

# the collection returned some documents
else:
    print(col.name, "has documents:")

    # iterate the list of documents
    for doc in documents:
        # print the document IDs
        print("_id:", doc["_id"])
```
Using the Cursor object’s next() method to iterate over documents in PyMongo
The Cursor object returned by the find() API call has a built-in method called next() that returns the next document in the result set each time it’s called:
```python
# get a collection instance
col = db["some_collection"]

# make an API call to get documents
docs = col.find()

# call the next() method to return a document
docs.next()
```
The next() method should return a Python dictionary of one of the MongoDB documents with its "_id" as the first key:
```
{'_id': ObjectId('5cda8b3b665444800ad30129'), 'field': 'value'}
```
How to catch and avoid the StopIteration error raised by the PyMongo Cursor object
If your code raises a StopIteration exception, it means that you’ve reached the end of the Cursor object’s results.
To handle this, you need to use a try-except block to catch the error, or call the object’s rewind() method to “reset” it.
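Another way to avoid the exception altogether is Python’s built-in next() function, which accepts a default value to return when an iterator is exhausted. Since a PyMongo Cursor is a Python iterator, the same pattern should apply to it; in this sketch a plain iterator of dictionaries stands in for the cursor, and next_or_none is a hypothetical helper name.

```python
def next_or_none(cursor):
    """Return the next document from `cursor`, or None when it is exhausted."""
    return next(cursor, None)


# stand-in for col.find(): an iterator that yields a single document
fake_cursor = iter([{"_id": 1, "field": "value"}])

print(next_or_none(fake_cursor))  # {'_id': 1, 'field': 'value'}
print(next_or_none(fake_cursor))  # None, instead of raising StopIteration
```

This keeps the calling code free of try-except blocks when an empty result is an expected outcome rather than an error.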
Use the rewind() method to reset the pymongo.cursor.Cursor object
Let’s try using the rewind() method first. This method literally “rewinds” the cursor, returning it to an unevaluated state:
```python
docs.next()  # raises StopIteration once the cursor is exhausted

docs.rewind()  # reset the cursor to its unevaluated state

# the cursor starts again from the first document
docs.next()
docs.next()
docs.next()
```
Use Python’s try and except exception block to catch the StopIteration error raised by the Cursor method
```python
try:
    for d in range(1000):
        print("doc:", docs.next())
except StopIteration as err:
    print("StopIteration error:", err, "-- rewinding Cursor object.")
    docs.rewind()
```
In some cases, the Cursor object may not have any documents on it. Older versions of PyMongo offered a Cursor.count() method for checking this, but it was deprecated in PyMongo 3.7 and removed in PyMongo 4.0; use the collection’s count_documents() method instead.
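To find out whether a query would return anything without iterating a cursor, you can ask the collection to count matching documents. Here is a minimal sketch, assuming col is a PyMongo Collection object; has_documents is a hypothetical helper name, while count_documents() and its limit option are part of PyMongo’s Collection API.

```python
def has_documents(col, query=None):
    """Return True if at least one document in `col` matches `query`."""
    # an empty filter ({}) matches every document in the collection;
    # limit=1 lets the server stop counting after the first match
    return col.count_documents(query or {}, limit=1) > 0
```

For example, has_documents(col) checks the whole collection, while has_documents(col, {"field": "value"}) checks only documents matching that filter.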
Iterate over a PyMongo Cursor object returned by a MongoDB API call
When you make an API request to MongoDB and access a document, you should get a Python dictionary in the following format:
```
{'_id': ObjectId('5cda8b3b665444800ad30129'), 'field': 'value'}
```
Iterate over the MongoDB result object’s items
The object returned by the Cursor is a standard Python dictionary that you can iterate through using the items() method. (If you’re using Python 2, you can iterate through the key-value pairs with the some_dict.iteritems() method.)
Here’s a simple example that shows how to iterate over a document dictionary’s key-value pairs:
```python
# iterate the MongoDB result dict in Python 3
for key, value in docs.next().items():
    print("key:", key, "-- value:", value)
```
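MongoDB documents can contain embedded documents, which PyMongo returns as nested dictionaries. A flat loop over items() shows each of these as a single value, so a recursive walk can be helpful. The sketch below is one possible approach; flatten_document, the dotted key format, and the sample document are illustrative assumptions.

```python
def flatten_document(doc, parent_key=""):
    """Flatten nested dictionaries into one dict keyed by dotted paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into the embedded document
            flat.update(flatten_document(value, path))
        else:
            flat[path] = value
    return flat


# stand-in for a document returned by the cursor
sample_doc = {"_id": 1, "address": {"city": "Austin", "zip": "78701"}}

for key, value in flatten_document(sample_doc).items():
    print("key:", key, "-- value:", value)
```

Lists inside a document are left untouched by this sketch; handling them would need an extra branch.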
Conclusion
One of the most important aspects of working with MongoDB is being able to handle the results returned from a query. Fortunately, there are many ways to access and parse a collection of documents with PyMongo. With the step-by-step instructions provided in this article, you’ll be ready to work with the results returned from any kind of MongoDB query in Python.
Throughout this tutorial, we reviewed the example code one section at a time. Here’s the complete Python script for your use:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# import the MongoClient class
from pymongo import MongoClient

# build a new client instance from the MongoClient class
mongo_client = MongoClient()

# create MongoDB database and collection instances
db = mongo_client.some_database
col = db.some_collection

"""
RETURN ALL DOCUMENTS IN A MONGODB COLLECTION
"""
print("\nReturn every document:")
for doc in col.find():
    print(doc)

"""
GET ALL IDS IN A MONGODB COLLECTION
AND PUT THE IDS INTO A LIST
"""
ids = []  # create an empty list for IDs
for doc in col.find():
    # append each document's ID to the list
    ids.append(doc["_id"])

# print out the IDs
print("IDs:", ids)
print("total docs:", len(ids))

"""
GET ALL DOCUMENTS FROM A FIND() CALL
AND PUT THEM IN A LIST
"""
# get all the documents in a MongoDB collection with list()
documents = list(col.find())

# iterate over the document dictionaries in the list
for doc in documents:
    # access each document's "_id" key
    print("\ndoc _id:", doc["_id"])

"""
USE THE FIND_ONE() METHOD TO RETURN JUST
ONE DOCUMENT IN A MONGODB COLLECTION
"""
# find one document in the collection
result = col.find_one()

# check if the find_one() call returns None
if result is not None:
    # use a try-except to catch KeyErrors
    try:
        id_found = result["_id"]
        print("The find_one() request returned ID:", id_found)
    except KeyError as err:
        print("KeyError for:", result, "--", err)
else:
    print("Result object for find_one() query returned 'None'")

"""
USE THE NEXT() METHOD TO HAVE THE CURSOR
OBJECT RETURN JUST ONE DOCUMENT
"""
# make an API call to get documents
docs = col.find()

try:
    for d in range(1000):
        print("doc:", docs.next())
except StopIteration as err:
    print("StopIteration error:", err, "-- rewinding Cursor object.")
    docs.rewind()
```