How to Query MongoDB Documents with Regex in Python

Introduction

When you query a database, you’re not always looking for an exact string match. You might be querying a collection of store inventory for all items that have “Cookies” as some part of their name, or perhaps you’re searching for a person’s last name that begins with “Sch”, though you’re not sure exactly how the name is spelled. In these types of situations, regular expressions can be used to allow for wildcards and partial matches in queries. MongoDB’s "$regex" query operator, which PyMongo lets you use from Python, accepts Perl-compatible regular expressions (PCRE), so most common regex patterns work the way you’d expect. In this article, we’ll explain how to use a regex query for MongoDB documents in Python.

Prerequisites

Before we can look at some sample "$regex" queries, it’s important to go over the prerequisites for this task. There are a few system requirements to keep in mind:

  • You’ll need to install and run the MongoDB server on the same server or machine where your Python scripts are running.

  • If you’re not sure whether Python 3.4 (or above) is installed, open a terminal or command prompt window, type python3 (or idle3 to open the IDLE editor) and press Return. To exit the Python interpreter, press CTRL+D on macOS or Linux, or CTRL+Z followed by Return on Windows.

  • You’ll need to have the PIP3 package manager for Python 3 installed, because you’ll be installing the necessary modules using the pip3 command.

  • You’ll need to install the PyMongo driver for MongoDB using pip3:

pip3 install pymongo
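  • You don’t need to install a separate BSON library, because PyMongo ships with its own bson package. Avoid running pip3 install bson: that command installs an unrelated third-party package from PyPI that is incompatible with the one bundled with PyMongo.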
  • Finally, you’ll need to be somewhat familiar with Python’s syntax to follow along with the examples in this tutorial. Keep in mind that all examples in this article are designed to use Python 3 over Python 2.7.

Access a MongoDB collection with documents to query

Now that we’ve set up everything we need, we can focus on the Python code. Let’s begin by importing the PyMongo library into our Python script and creating a new client instance of the driver:

# import the MongoClient class of the PyMongo library
from pymongo import MongoClient

# create a client instance of the MongoClient class
mongo_client = MongoClient('mongodb://localhost:27017')

Next, we can use that new MongoDB client instance to access a database and one of its collections:

# create database and collection instances
db = mongo_client.some_database
col = db["some_collection"]

Let’s make sure the collection we accessed has documents in it by using the count_documents() method. Just be sure to pass an empty dictionary to the method ({}) to find all of the collection’s documents:

# get the collection's total documents
total_docs = col.count_documents({})
print (col.name, "contains", total_docs, "documents.")

Use the $regex operator with any PyMongo query method

You can use regex patterns with all the different PyMongo query methods, which include:

find() # find docs
find_one() # find just one doc
find_one_and_replace() # find one and replace content
count_documents() # return integer count of docs that query matches

Your choice of method to use with your "$regex" query will depend on the task at hand: whether you want to find a single document or multiple documents, whether you want to simply find documents or find them and replace content, and so on.
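
As a rough sketch of how one query dictionary can be reused across these methods, the helper below counts the matches with count_documents() and then fetches a single example with find_one(). The name count_and_sample and its return shape are our own illustrative choices, not part of PyMongo, and col is assumed to be a PyMongo collection like the one created earlier.

```python
def count_and_sample(col, query):
    """Return (number of matching docs, one matching doc or None).

    `col` is assumed to be a PyMongo collection; the helper name and
    return shape are illustrative choices, not part of the PyMongo API.
    """
    # count_documents() returns an integer count of matching documents
    total = col.count_documents(query)
    # find_one() returns a single matching document (a dict) or None
    sample = col.find_one(query) if total else None
    return total, sample


# Example call (requires a running MongoDB server and the `col` object
# from the previous section):
# total, sample = count_and_sample(col, {"field": {"$regex": "Object"}})
```

Because the helper only receives the query as an argument, the same dictionary could just as easily be passed to find() or find_one_and_replace().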

Structure of PyMongo’s "$regex" query

When you call a PyMongo query method with the "$regex" operator, you need to pass a query dictionary with another dictionary nested inside it. The key of the outer dictionary is the name of the document field you want to search, and its value is the nested “inner” dictionary (e.g. {"field" : {} }). The inner dictionary must contain a "$regex" key whose value is the pattern string (for example: {"field" : {"$regex" : "pattern"} }).

If you use the optional "$options" key, then it must be nested inside the inner dictionary along with "$regex".

This nested dictionary structure may sound complicated, but it’s far easier to understand when you see a real example. The code below shows how a typical nested query dictionary should look:

query = {
    "field_name": {
        "$regex": 'SEARCH FOR THIS',
        "$options": 'i' # case-insensitive
    }
}

The field name passed in must exactly match the field name of the documents in the collection, otherwise the query won’t return any matches.
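
If you build many of these queries, a small helper can keep the nesting consistent. The sketch below is one possible way to do it; regex_query and its parameters are hypothetical names introduced for illustration only.

```python
def regex_query(field, pattern, options=""):
    """Build a {"field": {"$regex": ..., "$options": ...}} query dict.

    The function name and signature are illustrative assumptions.
    """
    inner = {"$regex": pattern}
    if options:
        # "$options" lives next to "$regex" in the inner dictionary
        inner["$options"] = options
    # the document's field name is the key of the outer dictionary
    return {field: inner}


print(regex_query("field_name", "SEARCH FOR THIS", "i"))
```

The returned dictionary has the same shape as the hand-written query above, so it can be passed to any of the query methods.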

MongoDB document data used in this article

The image below shows the sample data that we’re using for the Regex queries throughout this article:

Screenshot of MongoDB Compass with collection documents for ObjectRocket article

The Regex examples in this article query data where the field name "field" contains variations on the value "ObjectRocket". All of these examples will use the count_documents() query method.

Create a case-sensitive $regex pattern in PyMongo

Let’s look at an example of how to create a case-sensitive MongoDB Regex query using PyMongo:

# use $regex to find docs that contain the case-sensitive string "obje"
query = { "field": { "$regex": 'obje.*' } }
docs = col.count_documents( query )

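The .* at the end of the "$regex" key’s value is a wildcard that matches any run of characters after obje. Because the pattern is not anchored with ^, it matches obje anywhere in the field value, so the trailing .* is optional here. In this particular example, the query matches 0 documents, because "$regex" is case-sensitive by default and all 4 of the documents have Object in them, and not obje.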

Print out the results of the (.*) case-sensitive "$regex" pattern query

# print the results
print ("query:", query)
print ("$regex using '.___*' -- total:", docs, "\n")
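
To see how anchoring and case affect a pattern, you can try it locally with Python's re module before sending a query. Treat this as an approximation only: MongoDB evaluates "$regex" with its own PCRE engine, although simple patterns like these should behave the same way in both. The two sample values are the ones named in this article, and matching_values is a hypothetical helper.

```python
import re

# two of the sample "field" values mentioned in this article
values = ["ObjectRocket 2", "Object Rocket 222"]


def matching_values(pattern, candidates):
    """Return the candidates that an unanchored search for `pattern` finds.

    re.search() scans the whole string, which mirrors how "$regex"
    matches anywhere in a field value unless the pattern is anchored.
    """
    return [v for v in candidates if re.search(pattern, v)]


print(matching_values("obje.*", values))   # lowercase "obje" is absent
print(matching_values("bject", values))    # unanchored: found mid-string
print(matching_values("^bject", values))   # anchored: must start the value
print(matching_values("^Object", values))  # anchored and correct case
```

Only the patterns that respect both the case and the position of the text find anything, which is why a "starts with" search should begin with ^.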

Create a $regex query in PyMongo for an exact match

For an exact string match, put the specified string between the ^ and $ characters, which anchor the pattern to the start and end of the field value:

# the query between the ^ and $ char is for finding exact matches
query = { "field": { "$regex": '^ObjectRocket 2$' } }
docs = col.count_documents( query )

This query returns 1 document because there is a single document in the collection with ObjectRocket 2 as its "field" value.

Print out the results of the (^$) exact-match "$regex" query

print ("query:", query)
print ("$regex using '^___$' -- total:", docs, "\n")
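
One caveat with the ^...$ approach is that characters such as . or ( have a special meaning inside a pattern. A possible way to handle this, sketched below, is to escape the literal text with Python's re.escape() before anchoring it. The helper name exact_match_query and the value "v1.0 (beta)" are hypothetical and not part of the sample data, and the escaping is assumed to be acceptable to MongoDB's PCRE engine.

```python
import re


def exact_match_query(field, literal):
    """Build a "$regex" query that matches `literal` exactly.

    re.escape() backslash-escapes regex metacharacters so they are
    treated as plain text. The helper name is an illustrative assumption.
    """
    return {field: {"$regex": "^" + re.escape(literal) + "$"}}


query = exact_match_query("field", "v1.0 (beta)")
print(query)

# local sanity check with Python's re module (an approximation of
# MongoDB's PCRE engine): only the literal text itself should match
pattern = query["field"]["$regex"]
print(bool(re.search(pattern, "v1.0 (beta)")))
print(bool(re.search(pattern, "v1x0 (beta)")))
```

Without the escaping step, the unescaped . would match any character, and the parentheses would be read as a capture group rather than literal text.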

Create a case-insensitive $regex query in PyMongo using $options

In the following example, we’ll create an "$options" key, in addition to our "$regex", and we’ll set the value of this key to "i". This will execute a case-insensitive "$regex" query in MongoDB:

# use $options:'i' to make the query case-insensitive
query = { "field": { "$regex": 'oBjEcT', "$options" :'i' } }
docs = col.count_documents( query )

This query returns all 4 of the documents, even though the mixed-case string 'oBjEcT' is passed to it.

Print out the results of the "$regex" query using the case-insensitive argument for "$options"

We can print out the results of our query to prove that it worked as expected:

print ("query:", query)
print ("$regex using $options 'i' -- total:", docs, "\n")
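
If you want to preview what a "$regex" and "$options" pair would match without touching the database, the sketch below translates the option letters into the closest Python re flags and tests some strings locally. The mapping and the preview_matches helper are assumptions made for experimentation; MongoDB's own PCRE engine remains the final authority on what a query matches.

```python
import re

# Assumed mapping from MongoDB "$options" letters to the closest
# Python re flags; used here only for local experimentation.
OPTION_FLAGS = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "m": re.MULTILINE,   # ^ and $ match at line breaks
    "x": re.VERBOSE,     # ignore unescaped whitespace in the pattern
    "s": re.DOTALL,      # . also matches newline characters
}


def preview_matches(spec, candidates):
    """Return the candidates matched by a {"$regex", "$options"} spec."""
    flags = 0
    for letter in spec.get("$options", ""):
        flags |= OPTION_FLAGS[letter]
    compiled = re.compile(spec["$regex"], flags)
    return [v for v in candidates if compiled.search(v)]


# two of the sample "field" values mentioned in this article
values = ["ObjectRocket 2", "Object Rocket 222"]
print(preview_matches({"$regex": "oBjEcT"}, values))
print(preview_matches({"$regex": "oBjEcT", "$options": "i"}, values))
```

The first call finds nothing because the mixed-case pattern is compared case-sensitively, while the second call, with "i", matches both sample values.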

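Create an exact-match query without using the $regex operator

The next example is not a regular expression at all: passing a plain string as the field’s value makes MongoDB perform an equality match, which behaves like the anchored (^$) exact-match query we looked at earlier: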

# exact-match (equality) query without the '$regex' operator
query = { "field": 'Object Rocket 222' }
docs = col.count_documents( query )

It returns the one document that has 'Object Rocket 222' as its value.

Print out the results of a MongoDB query that omits the "$regex" operator

Once again, we can print out the results of our query to verify that it worked correctly:

print ("query:", query)
print ("exact match without '$regex' -- total:", docs, "\n")
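
PyMongo also accepts a compiled Python pattern in place of the "$regex" dictionary: an object created with re.compile() is encoded as a BSON regular expression when the query is sent. The sketch below builds such a query; compiled_regex_query is a hypothetical helper, and since Python's re syntax and MongoDB's PCRE syntax differ in some advanced features, it is safest with simple patterns.

```python
import re


def compiled_regex_query(field, pattern, ignore_case=False):
    """Build a query whose value is a compiled Python regex.

    PyMongo encodes compiled patterns as BSON regular expressions, so
    no "$regex" key is needed. The helper name is an illustrative
    assumption.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return {field: re.compile(pattern, flags)}


query = compiled_regex_query("field", "^object", ignore_case=True)
print(query)

# with a live connection this could be passed straight to a query method:
# docs = col.count_documents(query)
```

Here re.IGNORECASE plays the role that "$options": 'i' plays in the dictionary form.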

Conclusion

When you want to incorporate wildcards or partial matches in your MongoDB queries, it’s important to have a solid understanding of regular expressions. The PyMongo driver makes it easy to add string patterns to your queries using the "$regex" operator. With the instructions and examples provided in this article, you’ll have no trouble creating a Regex query for MongoDB documents in Python.

Screenshot of the Python script making $regex queries for ObjectRocket data

Just the Code

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class of the PyMongo library
from pymongo import MongoClient

# create a client instance of the MongoClient class
mongo_client = MongoClient('mongodb://localhost:27017')

# create database and collection instances
db = mongo_client.some_database
col = db["some_collection"]

# get the collection's total documents
total_docs = col.count_documents({})
print (col.name, "contains", total_docs, "documents.")

"""
MAKE REGEX QUERIES TO FIND MONGODB DOCUMENTS WITH
SPECIFIC PATTERN MATCHES IN THE DOCUMENT BODY

find() # find docs
find_one() # find just one doc
find_one_and_replace() # find one and replace content
count_documents() # return integer count of docs that query matches
"""


# use $regex to find docs that contain the case-sensitive string "obje"
query = { "field": { "$regex": 'obje.*' } }
docs = col.count_documents( query )
print ("query:", query)
print ("$regex using '.___*' -- total:", docs, "\n")

# the query between the ^ and $ char is for finding exact matches
query = { "field": { "$regex": '^ObjectRocket 2$' } }
docs = col.count_documents( query )
print ("query:", query)
print ("$regex using '^___$' -- total:", docs, "\n")

# use $options:'i' to make the query case-insensitive
query = { "field": { "$regex": 'oBjEcT', "$options" :'i' } }
docs = col.count_documents( query )
print ("query:", query)
print ("$regex using $options 'i' -- total:", docs, "\n")

# exact-match (equality) query without the '$regex' operator
query = { "field": 'Object Rocket 222' }
docs = col.count_documents( query )
print ("query:", query)
print ("exact match without '$regex' -- total:", docs, "\n")
