How To Access And Parse MongoDB Documents In Python

Introduction

If you’re a Python developer planning to work with MongoDB, it’s important to know how to handle the results of a query. When you query a collection with PyMongo, each document in the results comes back as a Python dictionary, which makes it easy to access the documents and examine each one individually. In this article, we’ll learn how to iterate through the MongoDB documents returned by a query and parse each document’s fields and values.

Prerequisites

It’s important to review the list of system requirements needed for this tutorial before we jump into the Python code. There are a few key prerequisites to keep in mind:

  • First, MongoDB needs to be installed on the server. You can run the mongo shell with the --version flag in a terminal to see if it’s installed (newer MongoDB releases ship the mongosh shell instead):
mongo --version
  • Also, Python needs to be installed on the machine or server. Python 3 is recommended, as Python 2 reached its end of life in January 2020.

  • Finally, the Python driver for MongoDB needs to be installed. Use the pip3 package manager (or pip if you’re using Python 2) to install the Python driver:

pip3 install pymongo

Connect to the MongoDB Server using the MongoClient class in Python

Now that we’ve reviewed the prerequisites, we’re ready to focus on the code. First, we’ll import the MongoClient class and create a new client instance of the driver:

from pymongo import MongoClient
mongo_client = MongoClient()
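Called with no arguments, MongoClient() connects to a server on localhost at the default port 27017. To reach a different server, you can pass a connection string instead. The helper below is a minimal sketch (build_mongo_uri() is a hypothetical name, not part of PyMongo) that assembles one with the standard library, percent-escaping the username and password with urllib.parse.quote_plus, as the PyMongo documentation recommends for credentials containing special characters:

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongo_uri(host: str = "localhost", port: int = 27017,
                    username: Optional[str] = None,
                    password: Optional[str] = None) -> str:
    """Return a mongodb:// connection string (hypothetical helper)."""
    credentials = ""
    if username is not None:
        # percent-escape characters such as '@' or ':' that would break the URI
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}:{port}/"


print(build_mongo_uri())
print(build_mongo_uri("db.example.com", 27018, "app", "p@ss"))
```

The host name and credentials above are placeholders; against a real server you would pass the result to the client, for example MongoClient(build_mongo_uri("db.example.com", 27018, "app", "p@ss")).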

We’ll be using the mongo_client object to make method calls to a MongoDB database and its collections.

Access a MongoDB database and collection using the Python client instance

PyMongo doesn’t raise an AttributeError, or any other exception, if the database or collection you specify doesn’t exist.

Instead, MongoDB creates a database or collection implicitly the first time you write data to it, for example by inserting a document. Until then, queries against it simply return no documents. You can verify this in the MongoDB shell.

Access a MongoDB database using the PyMongo client instance

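You can access the databases on the MongoDB server as attributes of the client object itself. You can also use dictionary-style access, such as mongo_client["some_database"], which is required when a name isn’t a valid Python identifier (for example, one containing a hyphen). In the following example, we access the database attribute some_database: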

db = mongo_client.some_database

Access the Collection attribute of the MongoDB database instance

Let’s create a new collection object from the MongoDB database instance:

col = db.some_collection

This Collection object (col) will be used going forward to make API requests to the MongoDB server to retrieve the collection’s documents.

Use a Python iterator to access all of a MongoDB collection’s documents

Iterating over the Cursor returned by find() is a handy way to access documents one at a time. PyMongo fetches the results from the server in batches as you loop, so this approach doesn’t need to hold the whole result set in memory. Here, we use a basic for loop to iterate the MongoDB documents:

print("\nReturn every document:")
for doc in col.find():
    print(doc)

Each document returned by the API call is a Python dictionary containing an "_id" field (an ObjectId by default) as well as any custom fields that were set when the document was inserted.

The document’s key-value pairs can be accessed just as they would be in any Python dictionary. For example, to get the document’s _id, you just access the dictionary’s "_id" key:

ids = []  # create an empty list for IDs
# iterate pymongo documents with a for loop
for doc in col.find():
    # append each document's ID to the list
    ids.append(doc["_id"])

# print out the IDs
print("IDs:", ids)
print("total docs:", len(ids))
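Because find() yields plain dictionaries, the ID-collecting loop can be condensed into a list comprehension. The sketch below wraps it in a hypothetical collect_ids() function that accepts any iterable of documents, so it works on a PyMongo Cursor or, as here, on a plain list of dictionaries standing in for one. Passing a projection such as {"_id": 1} as the second argument to find() asks the server to return only the _id field, which reduces the data transferred:

```python
from typing import Any, Dict, Iterable, List


def collect_ids(documents: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return the "_id" value of every document, in iteration order."""
    return [doc["_id"] for doc in documents]


# stand-in for the dictionaries a Cursor would yield (real IDs are usually ObjectIds)
sample_docs = [{"_id": 1, "field": "a"}, {"_id": 2, "field": "b"}]
ids = collect_ids(sample_docs)
print("IDs:", ids)
print("total docs:", len(ids))

# against a live server, you might instead write:
# ids = collect_ids(col.find({}, {"_id": 1}))
```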

[Screenshot: Python IDLE making a find() API request to MongoDB and putting each _id into a Python list]

Get all attributes of a MongoDB collection object in Python

You can use Python’s built-in dir() function to return a list of all the collection object’s attributes:

print (dir(col), '\n\n')

You’ll see in the following screenshot that several of the methods begin with the word "find". All of these methods query the collection’s documents, although some of them, such as find_one_and_delete() and find_one_and_update(), also modify the document they match.

[Screenshot: Python IDLE returning all of a MongoDB collection object’s attributes, including the methods used to find documents]
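Rather than scanning the full dir() output by eye, you can filter it by prefix. This is a small sketch: methods_with_prefix() is a hypothetical helper, and FakeCollection only stands in for a real Collection so the snippet runs without a server:

```python
from typing import List


def methods_with_prefix(obj: object, prefix: str) -> List[str]:
    """Return the sorted attribute names of obj that start with prefix."""
    return sorted(name for name in dir(obj) if name.startswith(prefix))


class FakeCollection:
    """Hypothetical stand-in; a real pymongo Collection has many more methods."""

    def find(self): ...

    def find_one(self): ...

    def insert_one(self): ...


print(methods_with_prefix(FakeCollection(), "find"))

# with a live Collection object:
# print(methods_with_prefix(col, "find"))
```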

Use Python’s list() function to return a list of all the MongoDB documents in a collection

Another way to get all of a collection’s documents is to pass the collection’s find() API call into Python’s list() function. This returns a list of MongoDB documents in the form of Python dictionaries. A list is convenient when you need the number of results or want to loop over them more than once, but keep in mind that it loads every matching document into memory at once.

Here’s an example that uses list() to return the documents:

# get all the documents in a MongoDB collection with list()
documents = list(col.find())
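For a large collection, one list of every document can use a lot of memory. One alternative, sketched here under the assumption that you only need a fixed number of documents at a time, is to pull documents from the iterator in chunks with itertools.islice. The in_batches() name is hypothetical; PyMongo’s Cursor.batch_size() is a separate, real setting that controls how many documents the server sends per network round trip:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# a range stands in for col.find() so the example runs without a server
for batch in in_batches(range(5), 2):
    print(batch)

# with a live collection (process() is a hypothetical function of your own):
# for batch in in_batches(col.find(), 100):
#     process(batch)
```

On Python 3.12 and later, itertools.batched() offers similar behavior, yielding tuples instead of lists.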

Access the items for each MongoDB document’s dictionary

Once you have the list, you can iterate the MongoDB documents and access the key-value items (the MongoDB document’s fields and values) for each dictionary. In this example, we get each document’s unique "_id" as we iterate the PyMongo documents:

# iterate over the document dictionaries in the list
for doc in documents:
    # access each document's "_id" key
    print("\ndoc _id:", doc["_id"])
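MongoDB does not require every document in a collection to share the same fields, so it can help to see which keys actually occur before you parse them. This sketch (field_names() is a hypothetical helper) collects the set of top-level keys across a list of documents such as the one built with list(col.find()); the sample documents are made up for illustration:

```python
from typing import Any, Dict, Iterable, List


def field_names(documents: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the sorted union of top-level keys across all documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


# stand-in documents with different shapes
documents = [
    {"_id": 1, "name": "alpha"},
    {"_id": 2, "name": "beta", "tags": ["x"]},
]
print(field_names(documents))
```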

[Screenshot: iterating over MongoDB documents returned by the Collection’s find() method inside list()]

Find a MongoDB document in Python using the find_one() method

The find_one() method returns a single document from a MongoDB collection: the first document that matches the query filter you pass in, or the first document the server finds if you don’t pass a filter.

Unlike the find() method that we discussed earlier, find_one() does not return a pymongo.cursor.Cursor object. Instead, it returns a single document as a Python dictionary that you can access directly. If no document matches, for example because the collection is empty or doesn’t exist, it returns None.

Here’s an example of the find_one() API call to a MongoDB collection:

# get a collection object
col = db['some_collection']

# find one document
doc = col.find_one()
print(doc, "-- doc type:", type(doc))

# get a different collection with NO documents
other_col = db["DOES_NOT_EXIST"]

# find one document and print it
doc = other_col.find_one()
print(doc, "-- doc type:", type(doc))
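find_one() behaves much like taking the first item a query yields, or None when there is nothing to take. The built-in next() with a default value expresses the same contract for any iterator. In this sketch it is wrapped in a hypothetical first_or_none() helper and exercised on plain lists instead of a live Cursor:

```python
from typing import Any, Iterable, Optional


def first_or_none(iterable: Iterable[Any]) -> Optional[Any]:
    """Return the first item of iterable, or None if it is empty."""
    return next(iter(iterable), None)


print(first_or_none([{"_id": 1}, {"_id": 2}]), "-- from a non-empty list")
print(first_or_none([]), "-- from an empty list")

# roughly comparable to col.find_one({"field": "value"}) on a live server:
# doc = first_or_none(col.find({"field": "value"}).limit(1))
```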

[Screenshot: Python IDLE making the find_one() API call to a MongoDB collection]

Check whether find_one() returned a Python “NoneType” object

If the collection being queried is empty, or if it hasn’t been created on the MongoDB server yet, find_one() returns None, so be sure your code checks for None before treating the result as a dictionary.

Make a find_one() API request to a MongoDB collection:

# get a collection object
col = db['some_collection']

# find one document on the collection
result = col.find_one()

One simple way to handle None is to use an if statement. Evaluate the object returned by find_one(), and check whether it is None:

# check that the find_one() call didn't return None
if result is not None:

    # use a try-except to catch KeyErrors
    try:
        id_found = result["_id"]
        print("The find_one() request returned ID:", id_found)
    except KeyError as err:
        print("KeyError ERROR for:", result, "--", err)

else:
    print("Result object for find_one() query returned 'None'")
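When the field you need may be missing, dict.get() with a default is a compact alternative to catching KeyError. The hypothetical get_field() helper below also tolerates a None result, so it could be applied directly to whatever find_one() returns; the sample dictionaries stand in for real documents:

```python
from typing import Any, Dict, Optional


def get_field(doc: Optional[Dict[str, Any]], key: str, default: Any = None) -> Any:
    """Return doc[key], or default when doc is None or lacks the key."""
    if doc is None:
        return default
    return doc.get(key, default)


print(get_field({"_id": 1, "field": "value"}, "field"))
print(get_field({"_id": 1}, "field", "n/a"))
print(get_field(None, "_id", "no document found"))

# with a live collection:
# print(get_field(col.find_one(), "field", "n/a"))
```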

Check the length of the list built from the pymongo.cursor.Cursor object

You can also use the list() function in your script to see how many documents were returned when you query for MongoDB documents with Python. Because find() returns a pymongo.cursor.Cursor object, you can pass it to list() to build a Python list of the documents and then check the list’s length. In this example, we use the find() API call:

# get a collection object
col = db['some_collection']

# return documents for a find() API request
documents = list(col.find())

You can then iterate the MongoDB documents as in previous examples but also check the list’s length:

# no documents were returned
if len(documents) == 0:
    print("The collection", col.name, "did not return any documents.")

# the collection returned some documents
else:
    print(col.name, "has documents:")

    # iterate the list of documents
    for doc in documents:
        # print the document IDs
        print("_id:", doc["_id"])
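Building the full list just to test for emptiness loads every document into memory. If you only need to know whether a query returned anything, one approach, sketched here with a hypothetical peek() helper, is to pull the first item with next() and then chain it back in front of the remaining iterator so no document is lost. Plain lists stand in for a Cursor:

```python
from itertools import chain
from typing import Any, Iterable, Iterator, Tuple

_MISSING = object()  # sentinel that cannot collide with a real document


def peek(iterable: Iterable[Any]) -> Tuple[bool, Iterator[Any]]:
    """Return (is_empty, iterator), where iterator still yields every item."""
    iterator = iter(iterable)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        return True, iter(())
    # put the first item back in front of the rest
    return False, chain([first], iterator)


is_empty, docs = peek([{"_id": 1}, {"_id": 2}])
print("empty:", is_empty)
for doc in docs:
    print("_id:", doc["_id"])

# with a live collection:
# is_empty, docs = peek(col.find())
```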

Using the Cursor object’s next() method to iterate over documents in PyMongo

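The Cursor object returned by the find() API call has a built-in next() method that returns the next document each time you call it, so repeated calls traverse all of the cursor’s documents (the built-in next(docs) does the same thing, since a Cursor is a Python iterator):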

# get a collection instance
col = db['some_collection']

# make an API call to get documents
docs = col.find()

# call the next() method to return a document
docs.next()

The next() method should return a Python dictionary for one of the MongoDB documents, with its "_id" as the first key:

{'_id': ObjectId('5cda8b3b665444800ad30129'), 'field': 'value'}

How to catch and avoid the StopIteration error raised by the PyMongo Cursor object

If your code raises a StopIteration exception, it means that you have reached the end of the Cursor object’s results and there are no more documents to return.

To handle this, you can use a try-except block to catch the error, or call the cursor’s rewind() method to “reset” it.

Use the rewind() method to reset the pymongo.cursor.Cursor object

Let’s try using the rewind() method first. This method literally “rewinds” the cursor, returning it to an unevaluated state:

docs.next()
# raises StopIteration once every document has been returned

docs.rewind()
docs.next()
docs.next()
docs.next()
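To experiment with next(), StopIteration, and rewind() without a database, you could use a small stand-in class. ListCursor below is hypothetical and only imitates those behaviors over an in-memory list; a real pymongo.cursor.Cursor sends its query to the server again when you iterate it after rewind(), which this sketch does not model:

```python
from typing import Any, Dict, List


class ListCursor:
    """Hypothetical in-memory imitation of a Cursor's next()/rewind() behavior."""

    def __init__(self, documents: List[Dict[str, Any]]) -> None:
        self._documents = documents
        self._position = 0

    def __iter__(self) -> "ListCursor":
        return self

    def __next__(self) -> Dict[str, Any]:
        # raise StopIteration once every document has been returned
        if self._position >= len(self._documents):
            raise StopIteration
        doc = self._documents[self._position]
        self._position += 1
        return doc

    def next(self) -> Dict[str, Any]:
        # mirrors the explicit next() method used in the examples above
        return self.__next__()

    def rewind(self) -> "ListCursor":
        # go back to the start; return self so calls can be chained
        self._position = 0
        return self


docs = ListCursor([{"_id": 1}, {"_id": 2}])
print(docs.next(), docs.next())
try:
    docs.next()
except StopIteration:
    print("exhausted -- rewinding")
    docs.rewind()
print(docs.next())
```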

Use Python’s try and except exception block to catch the StopIteration error raised by the Cursor method

try:
    # keep calling next() until the cursor runs out of documents
    while True:
        print("doc:", docs.next())
except StopIteration as err:
    print("StopIteration error:", err, "-- rewinding Cursor object.")
    docs.rewind()

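In some cases, the Cursor object may not have any documents on it. You might be tempted to check this with the Cursor.count() method, but it was deprecated in PyMongo 3.7 and removed in PyMongo 4.0. Use the collection’s count_documents() method instead, for example col.count_documents({}).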

[Screenshot: Python IDLE showing a PyMongo Cursor object raising a StopIteration error]

Iterate over a PyMongo Cursor object returned by a MongoDB API call

When you make an API request to MongoDB and access a document, you should get a Python dictionary in the following format:

{'_id': ObjectId('5cda8b3b665444800ad30129'), 'field': 'value'}

Iterate over the MongoDB result object’s items

The object returned by the Cursor is a standard Python dictionary that you can iterate through using the items() method. (If you’re using Python 2, iterate through the key-value pairs with the some_dict.iteritems() method instead.)

Here’s a simple example that shows how to iterate over a document dictionary’s key-value pairs:

# iterate the MongoDB result dict in Python 3
for key, value in docs.next().items():
    print("key:", key, "-- value:", value)
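Field values can themselves be dictionaries (embedded documents) or lists (arrays), and items() only walks the top level. A recursive helper can flatten them into the dot-notation paths MongoDB uses in queries, such as "address.city" or "tags.0". This is a sketch; flatten_document() is a hypothetical name and the sample document is made up:

```python
from typing import Any, Dict


def flatten_document(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Map dot-notation paths to leaf values for nested dicts and lists.

    Empty sub-documents and empty arrays produce no entries.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = ((str(k), v) for k, v in value.items())
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        # a leaf value: record it under the path built so far
        return {prefix: value}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_document(child, path))
    return flat


doc = {"_id": 1, "address": {"city": "Austin"}, "tags": ["a", "b"]}
for key, value in flatten_document(doc).items():
    print("key:", key, "-- value:", value)
```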

Conclusion

One of the most important aspects of working with MongoDB is being able to handle the results returned from a query. Fortunately, there are many ways to access and parse a collection of documents with PyMongo. With the step-by-step instructions provided in this article, you’ll be ready to work with the results returned from any kind of MongoDB query in Python.

Throughout this tutorial, we reviewed the example code one section at a time. Here’s the complete Python script for your use:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class
from pymongo import MongoClient

# build a new client instance from the MongoClient class
mongo_client = MongoClient()

# create MongoDB database and collection instances
db = mongo_client.some_database
col = db.some_collection

"""
RETURN ALL DOCUMENTS IN A
MONGODB COLLECTION
"""

print("\nReturn every document:")
for doc in col.find():
    print(doc)


"""
GET ALL IDS IN A MONGODB COLLECTION
AND PUT THE IDS INTO A LIST
"""

ids = []  # create an empty list for IDs
for doc in col.find():
    # append each document's ID to the list
    ids.append(doc["_id"])

# print out the IDs
print("IDs:", ids)
print("total docs:", len(ids))


"""
GET ALL DOCUMENTS FROM A FIND()
CALL AND PUT THEM IN A LIST
"""

# get all the documents in a MongoDB collection with list()
documents = list(col.find())

# iterate over the document dictionaries in the list
for doc in documents:
    # access each document's "_id" key
    print("\ndoc _id:", doc["_id"])


"""
USE THE FIND_ONE() METHOD TO RETURN
JUST ONE DOCUMENT IN A MONGODB
COLLECTION
"""

# find one document in the collection
result = col.find_one()
# check that the find_one() call didn't return None
if result is not None:

    # use a try-except to catch KeyErrors
    try:
        id_found = result["_id"]
        print("The find_one() request returned ID:", id_found)
    except KeyError as err:
        print("KeyError ERROR for:", result, "--", err)

else:
    print("Result object for find_one() query returned 'None'")


"""
USE THE NEXT() METHOD TO HAVE
THE CURSOR OBJECT RETURN ONE
DOCUMENT AT A TIME
"""

# make an API call to get documents
docs = col.find()

try:
    # keep calling next() until the cursor runs out of documents
    while True:
        print("doc:", docs.next())
except StopIteration as err:
    print("StopIteration error:", err, "-- rewinding Cursor object.")
    docs.rewind()
