How To Sort MongoDB Documents In A Collection Using Python
Introduction
When a call is made to query documents, PyMongo returns a pymongo.cursor.Cursor object that gives access to the query’s results. This cursor object has a method called sort() that asks MongoDB to return the documents in alphabetical or numerical order of one or more fields, so the results arrive already ordered when the cursor is iterated. Collection methods that return a single dictionary, like find_one(), have no built-in sort() method on their return value. Users can test the following examples for sorting MongoDB documents with PyMongo using Python’s IDLE environment by opening a terminal or command prompt window and typing “idle3” for the Python 3 version of IDLE, or just “idle” for Python 2.
Prerequisites to Sorting MongoDB Documents
- A working knowledge of Python and its syntax.
- A properly installed Python interpreter.
Note: Python 2.7 is deprecated and scheduled to lose long-term support by 2020. As such, Python 3 is recommended for executing the examples in this tutorial.
- The PyMongo Python driver for MongoDB, installed with the pip package manager, as shown here:
```bash
pip3 install pymongo
```
- Use the netcat (nc) command, shown below, in a UNIX terminal to confirm that MongoDB is running on port 27017:
```bash
nc -v localhost 27017
```
The terminal should output something that resembles the following:
“Connection to localhost port 27017 [tcp/*] succeeded!”
Press the CTRL+C keys to exit.
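Where netcat is not available, the same check can be sketched in Python with the standard socket module. This is a minimal sketch: the helper name port_is_open and the two-second timeout are illustrative choices, not part of PyMongo or MongoDB. It only confirms that something accepts TCP connections on the port, not that the listener is MongoDB.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection() raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 27017 is the port used throughout this tutorial
    print("MongoDB port reachable:", port_is_open("localhost", 27017))
```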
How to create a collection instance from PyMongo’s MongoClient class
Import PyMongo’s MongoClient class at the top of the script. It is needed to connect to MongoDB and to obtain a collection object for a collection stored in the database, as shown here:
```python
# import the MongoClient class from PyMongo
from pymongo import MongoClient
```
Next, create a new client instance of MongoClient and access a database that has a collection with documents in it, as in this example:
```python
# create a client instance of the MongoClient class
mongo_client = MongoClient('mongodb://localhost:27017')

# create database instance
db = mongo_client["sort_example"]
```
Now access a collection’s find() method using the db object. Pass an empty dictionary ({}) to the method if you want to query all of the collection’s documents, as shown here:
```python
cursor = db["Sort-Collection"].find( {} )
```
How to find a MongoDB collection with documents on it
Call one of the collection class methods that return cursor objects from the pymongo.cursor module, like the find() or find_raw_batches() methods, and use those objects to filter and query documents.
Calling PyMongo’s find() method to return a pymongo.cursor.Cursor object
All cursor objects in PyMongo should work with the sort() method. Use Python’s hasattr() function for confirmation, as shown here:
```python
print("Cursor obj has 'sort()':", hasattr(cursor, "sort"))
```
If done correctly, Python should print True to the terminal or IDLE window.
A list of the MongoDB document examples used in this tutorial
The examples in this tutorial use a collection made up of just six documents, shown below, that contain "last name", "first name", and "gender" string fields and an "age" integer field. Note that the last document stores its "age" as the string '24' rather than as an integer, which matters in the later section on mismatched data types:
```python
{'_id': ObjectId('5cff441e72ccde332cf3a213'), 'last name': 'Truk', 'first name': 'Ezekiel', 'gender': 'Male', 'age': 45},
{'_id': ObjectId('5cff445a72ccde332cf3a214'), 'last name': 'Nan', 'first name': 'Clara', 'gender': 'Female', 'age': 23},
{'_id': ObjectId('5cff449e72ccde332cf3a215'), 'last name': 'Shulman', 'first name': 'Corinthian', 'gender': 'Male', 'age': 52},
{'_id': ObjectId('5cff44aa72ccde332cf3a216'), 'last name': 'Erickson', 'first name': 'Janet', 'gender': 'Female', 'age': 36},
{'_id': ObjectId('5cff44b972ccde332cf3a217'), 'last name': 'Atkins', 'first name': 'Carmen', 'gender': 'Female', 'age': 26},
{'_id': ObjectId('5cff44c572ccde332cf3a218'), 'last name': 'Paleozoic', 'first name': 'Josephus', 'gender': 'Male', 'age': '24'}
```
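To follow along, the sample documents can be loaded with insert_many(). The sketch below is one way to do it under stated assumptions: the names SAMPLE_DOCS and load_sample_docs are made up for this illustration, the "_id" values are omitted so that MongoDB generates new ones, and `collection` is assumed to be a PyMongo collection such as db["Sort-Collection"].

```python
# The six sample documents, without "_id" so MongoDB assigns fresh ObjectIds
SAMPLE_DOCS = [
    {"last name": "Truk", "first name": "Ezekiel", "gender": "Male", "age": 45},
    {"last name": "Nan", "first name": "Clara", "gender": "Female", "age": 23},
    {"last name": "Shulman", "first name": "Corinthian", "gender": "Male", "age": 52},
    {"last name": "Erickson", "first name": "Janet", "gender": "Female", "age": 36},
    {"last name": "Atkins", "first name": "Carmen", "gender": "Female", "age": 26},
    # "age" is a string here on purpose, matching the tutorial's data
    {"last name": "Paleozoic", "first name": "Josephus", "gender": "Male", "age": "24"},
]


def load_sample_docs(collection, clear_first=False):
    """Insert copies of SAMPLE_DOCS into `collection` and return how many were inserted.

    If `clear_first` is True, every existing document in the collection is
    deleted beforehand, so only use that on a throwaway collection.
    """
    if clear_first:
        collection.delete_many({})
    # insert_many() adds an "_id" key to each dict it receives, so pass copies
    result = collection.insert_many([dict(doc) for doc in SAMPLE_DOCS])
    return len(result.inserted_ids)
```

A call such as load_sample_docs(db["Sort-Collection"]) would then be expected to return 6.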
How to use the Cursor object’s sort() method to organize MongoDB documents
The sort() method doesn’t require a sort-order parameter. If just the field name is passed to sort(), it will organize the data in ascending order by default.
Assign the result of the find().sort() call to a variable. Because sort() returns the cursor itself, the documents can then be iterated over, as shown here:
```python
cursor = db["Sort-Collection"].find().sort("first name")
```
Iterate over the cursor object returned by the sort() call
The documents returned by MongoDB in the cursor object can be iterated over much like a Python list, using a for loop with or without the enumerate() function, as shown here:
```python
# verify that MongoDB returned a Cursor object
if hasattr(cursor, "sort"):
    # iterate over the Cursor obj for docs
    for num, doc in enumerate(cursor):
        print(num, "-->", doc, "\n")
```
The document returned in each iteration should be a Python dictionary whose keys are the document’s fields, including its "_id".
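As a rough cross-check of what the loop should print, the ascending order of the "first name" values can be reproduced in plain Python. This sketch assumes the collection holds exactly the six sample documents, and it relies on these names being plain ASCII with the same capitalization, for which Python’s string ordering agrees with MongoDB’s default comparison. It does not query MongoDB.

```python
# "first name" values of the six sample documents, in insertion order
first_names = ["Ezekiel", "Clara", "Corinthian", "Janet", "Carmen", "Josephus"]

# ascending order, as .sort("first name") would be expected to return them
expected_order = sorted(first_names)

for num, name in enumerate(expected_order):
    print(num, "-->", name)
# 0 --> Carmen
# 1 --> Clara
# 2 --> Corinthian
# 3 --> Ezekiel
# 4 --> Janet
# 5 --> Josephus
```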
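How to pass multiple parameters to the sort() method to sort by several fields and reverse the sort order
To sort by more than one field, pass a list of two-item tuples, each holding a field name and a sort direction. Add another tuple to the list for each additional sort field. The fields are applied in list order, so a later tuple only breaks ties left by the earlier ones. A single field and its direction can also be passed as two separate arguments, such as sort("age", -1).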
The following example sorts the documents by "age" in descending order (-1) and, where ages are equal, by "last name" in ascending order (1):
```python
cursor = db["Sort-Collection"].find( {} ).sort(
    [
        ("age", -1),
        ("last name", 1)
    ]
)
```
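With the sample documents every age is different, so the "last name" tie-breaker never comes into play. The sketch below uses three hypothetical documents (not taken from the tutorial’s collection) in which two people share an age, and mirrors the compound sort in plain Python to show the order sort() would be expected to return. It illustrates the ordering rule only and is not a PyMongo call.

```python
# Hypothetical documents: two of them share the age 30
people = [
    {"last name": "Young", "age": 30},
    {"last name": "Abbot", "age": 30},
    {"last name": "Moss", "age": 41},
]

# Plain-Python mirror of .sort([("age", -1), ("last name", 1)]):
# "age" descending first, ties broken by "last name" ascending
ordered = sorted(people, key=lambda d: (-d["age"], d["last name"]))

print([(d["age"], d["last name"]) for d in ordered])
# [(41, 'Moss'), (30, 'Abbot'), (30, 'Young')]
```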
How to append MongoDB documents in the cursor object to a regular Python list
The documents returned by the Cursor iterator are just regular Python dict objects of the document’s contents. These can be added to a list with the list’s append() method (the += operator with a one-item list works as well):
```python
# create an empty list to store docs
sorted_docs = []

# iterate over the Cursor object for MongoDB docs
for doc in cursor:
    # append the dict object to the list
    sorted_docs.append(doc)

print("total docs:", len(sorted_docs))
```
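Because a cursor is an iterable, the built-in list() constructor is a shorter way to collect the documents. The sketch below uses a plain Python iterator as a stand-in for the cursor so that it runs without a database; with PyMongo the equivalent call would be list(cursor). Like the stand-in, a PyMongo cursor yields nothing further once it has been fully iterated, unless it is re-created or reset with its rewind() method.

```python
# Stand-in for a PyMongo cursor: any one-pass iterator of dicts
stand_in_cursor = iter([
    {"first name": "Carmen", "age": 26},
    {"first name": "Clara", "age": 23},
])

sorted_docs = list(stand_in_cursor)  # consumes the iterator, keeping its order
leftover = list(stand_in_cursor)     # nothing remains after the first pass

print("total docs:", len(sorted_docs))   # total docs: 2
print("left in cursor:", len(leftover))  # left in cursor: 0
```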
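The sort API call will not order the data as expected if a field has the wrong data type in some documents
If just one document stores a field with the wrong data type, e.g. a string in a field that is otherwise an integer, then the sort() method will not produce the expected numeric order. MongoDB does not raise an error in this case. It compares values of different BSON types by type first, so string values are grouped after all of the numeric values in an ascending sort.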
The PyMongo sort() method failing to sort an integer field
In the sample collection, one document stores "age" as the string '24'. A sort on "age" therefore does not place that document between the ages 23 and 26; it is grouped apart from the documents with integer values.
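The effect can be approximated in plain Python. MongoDB compares values of different BSON types by type before value, with numbers ordered ahead of strings, and the sketch below imitates only that one rule for the six sample ages. The helper name bson_like_key is made up for this illustration, and the code does not query MongoDB.

```python
# "age" values of the six sample documents; one is a string
ages = [45, 23, 52, 36, 26, "24"]


def bson_like_key(value):
    """Rank numbers before strings, then compare values within the same type."""
    type_rank = 0 if isinstance(value, (int, float)) else 1
    return (type_rank, value)


ascending = sorted(ages, key=bson_like_key)
descending = sorted(ages, key=bson_like_key, reverse=True)

print(ascending)   # [23, 26, 36, 45, 52, '24']
print(descending)  # ['24', 52, 45, 36, 26, 23]
```

In both directions the string '24' ends up at one end of the results instead of next to 23 and 26.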
If the sort() method returns an unexpected order, first examine the data type of the sort field in each document to confirm that all of the documents store an integer.
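One way to locate the offending documents is MongoDB’s $type query operator. The function below is a sketch: the name find_string_valued is hypothetical, and `collection` is assumed to be a PyMongo collection such as db["Sort-Collection"].

```python
def find_string_valued(collection, field):
    """Return the documents whose `field` is stored as a BSON string."""
    # "$type": "string" matches only values stored with the string type
    query = {field: {"$type": "string"}}
    return list(collection.find(query))
```

Against the sample collection, find_string_valued(db["Sort-Collection"], "age") would be expected to return only the document whose "age" is '24'.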
How to use PyMongo’s update_many() method
PyMongo’s update_many() method, or the MongoDB Compass UI, can be used to update and fix the data so that every document stores the field with the same data type.
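A minimal sketch of such a fix with update_many() follows. It assumes the bad value is already known (for the sample data, the string '24'); the function name fix_string_age is hypothetical, and `collection` is assumed to be a PyMongo collection.

```python
def fix_string_age(collection, bad_value):
    """Replace every "age" equal to the string `bad_value` with its integer form.

    Returns the number of documents MongoDB reports as modified.
    """
    result = collection.update_many(
        {"age": bad_value},                 # filter: exact match on the string value
        {"$set": {"age": int(bad_value)}},  # update: store the integer instead
    )
    return result.modified_count
```

For the sample collection, fix_string_age(db["Sort-Collection"], "24") would be expected to modify one document, after which a sort on "age" orders all six documents numerically.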
How to filter an integer field using $gt and sort the data by that field
The following is an example of how to use the greater-than filter ($gt) and the sort() method together:
```python
# only get documents with "age" > 40, and sort them by "age"
cursor = db["Sort-Collection"].find(
    {"age": {"$gt": 40}}
).sort( [("age", -1)] ) # sort age by DESCENDING order
```
NOTE: The second item of each tuple passed to sort() is the sort order: 1 is for ascending and -1 is for descending order. PyMongo also provides these values as the constants pymongo.ASCENDING and pymongo.DESCENDING.
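To see which of the sample documents the "age" greater-than-40 query should return, the filter and sort can be mirrored in plain Python. This sketch assumes the six sample documents and reflects that a numeric $gt comparison matches only numeric values, so the string '24' is never a candidate. It does not query MongoDB.

```python
# (last name, age) pairs from the six sample documents
sample = [
    ("Truk", 45), ("Nan", 23), ("Shulman", 52),
    ("Erickson", 36), ("Atkins", 26), ("Paleozoic", "24"),
]

# mirror of {"age": {"$gt": 40}}: numeric ages above 40 only
matches = [(name, age) for name, age in sample if isinstance(age, int) and age > 40]

# mirror of .sort([("age", -1)]): oldest first
matches.sort(key=lambda pair: pair[1], reverse=True)

print(matches)  # [('Shulman', 52), ('Truk', 45)]
```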
How to sort MongoDB documents by one field, but return a different field
A second dictionary parameter, the projection, can also be passed to the find() method so that the API call returns only the listed fields, while sort() still orders the data by another field.
How to return only one field using sort() and the second parameter in find()
In the example below, the API call returns just the "first name" and "_id" fields of each document in the collection, in descending order of the "age" integer field:
```python
cursor = db["Sort-Collection"].find( {}, { "first name": 1 } ).sort( [("age", -1)] )
```
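The "_id" field is included in a projection unless it is explicitly excluded. The sketch below, with the hypothetical function name first_names_by_age, sets "_id" to 0 so that each returned document holds only "first name", then collects the names in sorted order. `collection` is assumed to be a PyMongo collection.

```python
def first_names_by_age(collection):
    """Return the "first name" values ordered from oldest to youngest."""
    cursor = collection.find(
        {},                           # match every document
        {"first name": 1, "_id": 0},  # projection: keep "first name", drop "_id"
    ).sort([("age", -1)])             # sort by "age" even though it is not returned
    return [doc["first name"] for doc in cursor]
```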
Conclusion
This tutorial explained how to use the PyMongo sort() method and how to pass multiple parameters for compound filtering and sorting of MongoDB documents. Bear in mind that sort() will not order the data as expected if even one document stores the sort field with the wrong data type. The most common cause is a document that holds a string where the others hold an integer. Also remember that the second item of each tuple passed to sort() is the sort order, with 1 for ascending and -1 for descending order.
Just the Code
```python
#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class from PyMongo
from pymongo import MongoClient

# create a client instance of the MongoClient class
mongo_client = MongoClient('mongodb://localhost:27017')

# create database instance
db = mongo_client["sort_example"]

# a simple find() API call to get documents
cursor = db["Sort-Collection"].find( {} )

# check if Cursor obj has the "sort" attr
print("Cursor obj has 'sort()':", hasattr(cursor, "sort"))

# verify that MongoDB returned a Cursor object
if hasattr(cursor, "sort"):
    # iterate over the Cursor obj for docs
    for num, doc in enumerate(cursor):
        print(num, "-->", doc, "\n")

# organize MongoDB documents using two sort parameters
# 1 = ASCENDING and -1 = DESCENDING
cursor = db["Sort-Collection"].find( {} ).sort(
    [
        ("age", -1),
        ("last name", 1) # each sort is a tuple
    ]
)

# create an empty list to store docs
sorted_docs = []

# iterate over the Cursor object for MongoDB docs
for doc in cursor:
    # append the dict object to the list
    sorted_docs.append(doc)

print("total docs:", len(sorted_docs))
```