How To Bulk Insert Multiple Documents To A MongoDB Collection Using Python

Introduction to insert_many() in PyMongo

If you’re using MongoDB to store data, there will be times when you want to insert multiple documents into a collection at once. With Python, you can do this with PyMongo’s collection.insert_many() method, the driver’s counterpart to the insertMany() method in the MongoDB shell; both insert more than one document in a single call. In this article, we’ll show how to use PyMongo’s insert_many() method to bulk insert MongoDB documents in Python.

Prerequisites

Before we look at any Python code, it’s important to review the prerequisites for the task. There are a few key system requirements to consider:

  • It’s helpful to have a working knowledge of Python and its syntax. The examples in this article use containers such as lists and dictionaries and iterate over them with for loops, so you’ll need to be comfortable with these concepts.

  • You’ll need to have Python installed. Python 3 is recommended, since Python 2 reached its end of life in January 2020 and current PyMongo releases no longer support it.

  • You’ll also need to install PyMongo, the official MongoDB driver for Python, using the pip package manager:

pip3 install pymongo

  • If you haven’t done so already, you’ll need to install the MongoDB server. You can check which version of the server is installed by running mongod --version in a terminal; you can also open the MongoDB shell by typing mongosh (or mongo on older releases) and pressing Return.

  • You will need to have a MongoDB collection you can use to test your PyMongo API calls. Since you’ll be bulk inserting data, it’s best to stick to a test collection where there are no concerns about possible data loss or corruption.

Import Modules and Connect to the MongoDB Server using the MongoClient class

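Once you’ve made sure that all the system requirements are in place, it’s time to dive into the Python code. The first step is to import the MongoClient class, create a new client instance of the driver, and get handles for a database and a collection:

from pymongo import MongoClient
mongo_client = MongoClient()  # connects to localhost:27017 by default
db = mongo_client["fruits"]
col = db["my_fruit"]

We’ll use the col object to make method calls to the MongoDB collection in the examples that follow. MongoDB creates the database and collection automatically on the first insert, so they don’t need to exist beforehand.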

At this point, you can also import any other Python libraries or modules you may need. For example, we’ll import Python’s json library, which can create JSON strings with json.dumps() and parse them with json.loads():

# these libraries are optional
import json
import datetime # for MongoDB timestamps
import uuid # UUIDs for documents
import random # to randomly generate doc data

All of the libraries shown above are part of Python’s standard library, so there’s no need to install any of them with pip.
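One caveat when mixing the json library with the documents built later in this article: json.dumps() can’t serialize datetime objects and raises a TypeError for them. The sketch below, using a made-up document, shows the error and one common workaround, passing default=str so unsupported values are converted with str():

```python
import datetime
import json

# a made-up document holding a timestamp, like the ones built later on
doc = {"fruit": "Apple", "time": datetime.datetime(2020, 1, 1, 12, 0)}

try:
    json.dumps(doc)
except TypeError as err:
    # datetime objects are not JSON serializable by default
    print("cannot serialize:", err)

# default=str converts any unsupported value (here, the datetime) with str()
json_string = json.dumps(doc, default=str)
print(json_string)  # {"fruit": "Apple", "time": "2020-01-01 12:00:00"}

# json.loads() parses the string back into a dict; "time" is now a plain string
parsed = json.loads(json_string)
```

Note that the round trip loses the datetime type, so this is only suitable for display or logging, not for re-inserting the document.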

Using Python dictionaries in PyMongo

In Python, a dictionary object acts as the equivalent of a JSON or BSON MongoDB document. This basic data structure can easily be created using curly brackets {}, as in the following example:

simple_dict = {"key": "value"}

A Python dictionary is made up of key-value pairs. To access a specific key, write the object name followed by square brackets [] containing the key’s name as a string. In this example, our dictionary is named "simple_dict", the key we’re accessing is named "key", and we store its value in "some_value":

some_value = simple_dict["key"]

# should print out --> "key: value"
print ("key:", some_value)
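Square-bracket access raises a KeyError when the key doesn’t exist. When a field may be missing, as often happens with documents assembled from outside input, the built-in dict.get() method returns a default value instead. A minimal sketch; the "color" key is only an illustrative name:

```python
simple_dict = {"key": "value"}

# square brackets raise KeyError for a missing key
try:
    color = simple_dict["color"]
except KeyError:
    color = None

# dict.get() returns the given default instead of raising
fallback = simple_dict.get("color", "unknown")

print("color:", color)        # color: None
print("fallback:", fallback)  # fallback: unknown
```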

Constructing a MongoDB document using a Python dictionary

Every key-value pair in a Python dictionary can translate to a MongoDB field-value pair when the dictionary is inserted as a document. Much like a JSON object, a Python dictionary uses commas (,) to separate its key-value pairs.

Dictionaries can be constructed on a single line, like in the following example:

doc_obj = {"field 1": "value 1", "field 2": "value 2"}

If the dictionary is going to be quite long, or if it contains lengthy strings, it’s helpful to use multiple lines when declaring the key-value pairs:

doc_obj = {
    "field 1": "value 1",
    "field 2": "value 2",
    "field 3": 12345
}
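Values don’t have to be strings or numbers: a value can also be a list or another dictionary, which PyMongo stores as a BSON array or an embedded document. A short sketch with made-up field names:

```python
# a document with an array field and an embedded document
fruit_doc = {
    "name": "Apple",
    "tags": ["red", "sweet"],      # stored as a BSON array
    "supplier": {                  # stored as an embedded document
        "name": "Orchard Co.",
        "country": "US",
    },
    "price_cents": 120,
}

# nested values are reached by chaining keys and list indexes
print(fruit_doc["supplier"]["country"])  # US
print(fruit_doc["tags"][0])              # red
```

When querying MongoDB later, nested fields like these are addressed with dot notation, for example col.find({"supplier.country": "US"}).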

Adding the MongoDB documents to a list to be inserted later

The PyMongo insert_one() method takes a single dict object, but the insert_many() method requires an iterable of dictionaries, usually a Python list ([]), for the call to work. Passing a single dictionary to the method raises a TypeError exception. Here’s an example of what not to do:

col.insert_many( {"field 1": 42} ) # this won't work

Screenshot: PyMongo raises a TypeError exception because a Python dict, rather than a list, was passed to the insert_many() method.
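PyMongo also raises a TypeError when insert_many() receives an empty list, which can happen when a script builds its list dynamically and ends up with nothing to insert. The sketch below guards against both cases; insert_docs() is a hypothetical helper, not part of PyMongo, and it assumes col is any object with an insert_many() method, such as a PyMongo Collection:

```python
def insert_docs(col, docs):
    """Hypothetical helper: insert one dict or an iterable of dicts.

    Returns whatever col.insert_many() returns, or None when there is
    nothing to insert.
    """
    # wrap a single document in a list instead of passing it as-is
    if isinstance(docs, dict):
        docs = [docs]
    docs = list(docs)

    # insert_many() rejects an empty list, so skip the API call entirely
    if not docs:
        return None

    return col.insert_many(docs)
```

With this helper, insert_docs(col, {"field 1": 42}) inserts the single document instead of raising, and insert_docs(col, []) returns None without contacting the server.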

Put the MongoDB document dictionary objects into a Python list

There are different ways to append, insert, or otherwise put items into a Python list. Our next examples will illustrate these options.

Use the += operator to add an item to the MongoDB document list

You can use the += operator to append an element to a list object. Be sure to enclose the MongoDB document object in square brackets, as seen in the example below:

mongo_docs = []
doc_body = {"field":"value"}
mongo_docs += [doc_body]
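The square brackets matter because += extends the list with every item of the iterable on its right, and iterating over a dictionary yields its keys. Leaving the brackets off therefore adds the key strings instead of the document, without any error:

```python
doc_body = {"field": "value"}

good = []
good += [doc_body]   # adds the whole dict as one element

bad = []
bad += doc_body      # iterates over the dict, so only its keys are added

print(good)  # [{'field': 'value'}]
print(bad)   # ['field']
```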

Use the Python list methods append() and insert() to add MongoDB documents

The list append() method can also be used to add an element to the end of a Python list:

doc_body = {"field":"value"}
mongo_docs.append(doc_body)
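append() always adds exactly one element. To add several documents in one call, the list extend() method accepts any iterable of dictionaries and adds each one as its own element:

```python
mongo_docs = [{"field": "value"}]

# add two documents at once; each dict becomes a separate list element
more_docs = [{"field": "value 2"}, {"field": "value 3"}]
mongo_docs.extend(more_docs)

print(len(mongo_docs))  # 3
```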

The list insert() method (not to be confused with the legacy MongoDB Collection.insert() method, which was removed in PyMongo 4.0) works a bit differently from the two previous options: it lets you specify the position in the list where the item should go. It takes two arguments: the first is an integer index for the new element’s position, and the second is the object itself:

doc_body = {"field":"value"}

# insert a dict object named 'doc_body' at index 3
mongo_docs.insert(3, doc_body)

If the index passed to insert() is greater than or equal to the number of elements in the list, the object is simply appended as the last element.
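A quick check of that behavior: with only one document in the list, insert(3, ...) places the new document at the end, while index 0 places it at the front.

```python
mongo_docs = [{"n": 1}]

# index 3 is past the end of a one-item list, so the dict is appended
mongo_docs.insert(3, {"n": 2})

# index 0 puts the dict at the front of the list
mongo_docs.insert(0, {"n": 0})

print(mongo_docs)  # [{'n': 0}, {'n': 1}, {'n': 2}]
```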

Declare the MongoDB documents and the list at the same time

In addition to all of the options we’ve just reviewed, another option is to declare the MongoDB document dictionaries and the list at the same time. To accomplish this, we just pass it all into the method call at once, as shown below:

col.insert_many([{"test": 22345}])

Whichever approach you use, the result is a Python list of dictionaries, which is exactly what the PyMongo insert_many() method expects.
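When every document follows the same pattern, a list comprehension can build the whole list inline in one expression. In this sketch the field names "fruit" and "position" are only examples:

```python
fruits = ["Apple", "Banana", "Mango"]

# one dictionary per fruit, built in a single expression
mongo_docs = [{"fruit": fruit, "position": num} for num, fruit in enumerate(fruits)]

print(mongo_docs[1])  # {'fruit': 'Banana', 'position': 1}

# the list could then be passed straight to a collection: col.insert_many(mongo_docs)
```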

Use a loop to build multiple MongoDB documents and add them to a list

Now that we’ve seen how to add a MongoDB document to a list, let’s look at how to iterate over an existing list of values to create another Python list composed of MongoDB documents.

Let’s assume we have a Python list of fruits that need to be inserted into a MongoDB collection as documents:

fruits = ["Apple", "Banana", "Mango"]

Iterate over the Python list to create dictionary objects for the MongoDB documents

First, we’ll need to create another empty list. This will serve both as the container for the documents, and as the parameter to be passed in the method call:

# create a new list for the insert_many() method call
mongo_docs = []

Next, we’ll use the list of fruits to build a list of MongoDB documents, creating a new dictionary object on each iteration. A plain for loop would work; here we use the enumerate() function, which also yields each item’s index (num) in case you want to store it:

# iterate over the list of fruits
for num, fruit in enumerate( fruits ):
    # create a new MongoDB document dict that records the fruit's name
    doc = {"fruit": fruit}

Pass objects, like datetime.now(), to the MongoDB dictionary document

You can add more key-value pairs to a Python dictionary to give the MongoDB documents more content. In the following example, we use random.randint() to pick either 0 or 1, which determines whether each fruit is labeled “good” or “bad”. Then we use datetime.now() to create a timestamp for the fruit, passing datetime.timezone.utc so the timestamp is in UTC (PyMongo assumes naive datetime objects are already in UTC). Finally, we create a UUID for each fruit:

    # randomly pick num between 0 and 1
    ran_num = random.randint(0, 1)
    doc['condition'] = ['good', 'bad'][ran_num]

    # create a timezone-aware UTC timestamp for the document
    doc['time'] = datetime.datetime.now(datetime.timezone.utc)

    # create a UUID for the fruit
    doc['uuid'] = str( uuid.uuid4() )

The final step is to put the dictionary object into the mongo_docs list that we created earlier, and then pass that list to the insert_many() method once we finish iterating over the fruits list:

    # add the MongoDB document to the list
    mongo_docs += [doc]

# make an API request to MongoDB to insert_many() fruits
col.insert_many( mongo_docs )

Screenshot: iterating over a list of items, calling the PyMongo bulk insert method insert_many(), and verifying the inserted documents in MongoDB Compass.
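The same loop can be wrapped in a function that returns the finished list, which makes it easy to inspect the documents before any database call. This is a sketch: build_fruit_docs() is a hypothetical name, it stores the fruit name under a "fruit" key, it swaps in random.choice() for the randint() indexing, and it uses a timezone-aware UTC timestamp. The optional rng parameter exists only so the random choices can be seeded for testing:

```python
import datetime
import random
import uuid


def build_fruit_docs(fruits, rng=random):
    """Hypothetical helper: return one MongoDB document dict per fruit name."""
    docs = []
    for fruit in fruits:
        docs.append({
            "fruit": fruit,
            # pick "good" or "bad" with equal probability
            "condition": rng.choice(["good", "bad"]),
            # timezone-aware timestamp in UTC
            "time": datetime.datetime.now(datetime.timezone.utc),
            # UUID stored as a string, as in the loop above
            "uuid": str(uuid.uuid4()),
        })
    return docs


mongo_docs = build_fruit_docs(["Apple", "Banana", "Mango"], rng=random.Random(42))
for doc in mongo_docs:
    print(doc["fruit"], doc["condition"], doc["uuid"])

# with a collection object, the list would then go to col.insert_many(mongo_docs)
```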

Create a MongoDB document list from a JSON string

If your script runs on the server side of a website, a more common way to build a list of MongoDB documents is by parsing the JSON body of an HTTP request. The Python json library makes this task simple, as we’ll see in the next example.

Inserting MongoDB documents in the form of an HTTP Message JSON string from a POST request

For this example, we’ll assume a client sent an HTTP request to the web application, and the request body arrived as a JSON string containing the documents to insert into MongoDB:

http_message_string = '''
{"db": "some_database", "col": "some_collection", "docs": [{"body": {"field 1": "value 1", "field 2": "value 2"}}, {"body": {"another field": "another val", "another field 2": "another val 2"}}]}
'''

At first, this string looks a bit difficult to parse. Fortunately, the json.loads() method call can convert that JSON string into a Python dictionary that can then be inserted into a MongoDB collection:

# try to create a dict from JSON string
http_json = json.loads(http_message_string)

If the call to json.loads() doesn’t raise an exception, the string was valid JSON. Because the top-level value in this string is a JSON object, the returned value is a Python dictionary, which can then be parsed and its documents passed to the PyMongo insert_many() method:

# use iteritems() in Python 2 instead
for key, val in http_json.items():

    # the value must be a list for insert_many() to work
    if key == "docs" and isinstance(val, list):
        col.insert_many( val )

Screenshot: Python IDLE parsing a JSON string, calling insert_many() with the documents it contains, and verifying the insertion in MongoDB Compass.
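json.loads() returns whatever type the top-level JSON value has: a JSON array becomes a list, a number becomes an int or float, and only a JSON object becomes a dict. Since input arriving over HTTP is untrusted, it’s worth checking its shape before calling insert_many(). A minimal sketch, assuming the payload format shown above; extract_docs() is a hypothetical helper name:

```python
import json


def extract_docs(message):
    """Hypothetical helper: return the "docs" list from a JSON message string.

    Raises ValueError if the message is not valid JSON or has the wrong shape.
    """
    # json.JSONDecodeError is a subclass of ValueError
    payload = json.loads(message)

    if not isinstance(payload, dict):
        raise ValueError("top-level JSON value must be an object")

    docs = payload.get("docs")
    # insert_many() needs a non-empty list of documents (dicts)
    if not isinstance(docs, list) or not docs:
        raise ValueError('"docs" must be a non-empty list')
    if not all(isinstance(doc, dict) for doc in docs):
        raise ValueError('every item in "docs" must be a JSON object')

    return docs


docs = extract_docs('{"docs": [{"body": {"field 1": "value 1"}}]}')
print(docs)  # [{'body': {'field 1': 'value 1'}}]
```

A web handler could call extract_docs() inside a try/except ValueError block and return an error response instead of attempting the insert.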

Have the PyMongo API call return a “results” object for the inserted documents

The insert_many() call returns a pymongo.results.InsertManyResult object. This helps you verify that the insertion was successful, and gives you more information on what changes were made to the MongoDB collection.

To get this information, assign the value returned by insert_many() to a variable:

# make an API call to insert multiple documents
# and have it return a 'results' object
result = col.insert_many( mongo_docs )

Then, access the attributes of the returned result object. One attribute is inserted_ids, a list of the _id values of the inserted documents in the order they were provided; PyMongo generates an ObjectId for any document that doesn’t already have an _id and adds it to that dictionary in place. Use Python’s len() function to see the total number of documents inserted:

# get the total numbers of docs inserted
total_docs = len(result.inserted_ids)

print ("total inserted:", total_docs)
print ("inserted IDs:", result.inserted_ids)

Conclusion

When you need to insert a batch of documents into a MongoDB collection, creating a Python script is a good way to get the job done. The PyMongo driver makes it easy to bulk insert MongoDB documents with Python. With the step-by-step instructions provided in this article, you’ll have no trouble performing a MongoDB bulk insert in a Python script.

Throughout this article, we examined the code one section at a time. The following Python script contains all of the examples in this article and can be used to insert_many() MongoDB documents into a collection:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class
from pymongo import MongoClient

# these libraries are optional
import json
import datetime # for MongoDB timestamps
import uuid # UUIDs for documents
import random # to randomly generate doc data


"""
CREATE A NEW CLIENT INSTANCE AND DECLARE THE
DATABASE, COLLECTION INSTANCES
"""

# create a MongoDB client instance
mongo_client = MongoClient('localhost', 27017)
db = mongo_client.fruits
col = db.my_fruit


"""
ITERATE OVER A LIST OF STRINGS AND CREATE
MONGODB DOCUMENTS AND PUT THEM IN A LIST FOR
THE insert_many() API CALL
"""

# fruit list for new MongoDB documents
fruits = ["Apple", "Banana", "Mango"]

# create a new list for the insert_many() method call
mongo_docs = []

# iterate over the list of fruits
for num, fruit in enumerate( fruits ):
    # create a new MongoDB document dict that records the fruit's name
    doc = {"fruit": fruit}
   
    # randomly pick num between 0 and 1
    ran_num = random.randint(0, 1)
    doc['condition'] = ['good', 'bad'][ran_num]

    # create a timezone-aware UTC timestamp for the document
    doc['time'] = datetime.datetime.now(datetime.timezone.utc)

    # create a UUID for the fruit
    doc['uuid'] = str( uuid.uuid4() )

    # add the MongoDB document to the list
    mongo_docs += [doc]

# make an API request to MongoDB to insert_many() fruits
result = col.insert_many( mongo_docs )

# get the total numbers of docs inserted
total_docs = len(result.inserted_ids)

print ("total inserted:", total_docs)
print ("inserted IDs:", result.inserted_ids, "\n\n")


"""
CREATE A DICT OBJECT FROM A JSON STRING
AND INSERT THE DOCUMENTS CONTAINED IN THE
JSON STRING
"""


# simulate an HTTP POST request in the form of a JSON string
http_message_string = '''
{"db": "some_database", "col": "some_collection", "docs": [{"body": {"field 1": "value 1", "field 2": "value 2"}}, {"body": {"another field": "another val", "another field 2": "another val 2"}}]}
'''


# try to create a dict object from JSON string
try:
    http_json = json.loads(http_message_string)
except Exception as err:
    print ("json.loads ERROR:", err)
    http_json = {}

print ("http_json:", http_json, "\n")

# check if JSON string has MongoDB db and col names
if ("db" in http_json) and ("col" in http_json):

    # get the MongoDB db and col names from JSON obj
    db = mongo_client[http_json["db"]]
    col = db[http_json["col"]]    
   
    # use iteritems() in Python 2 instead
    for key, val in http_json.items():
       
        # the value must be a list for insert_many() to work
        if key == "docs" and isinstance(val, list):
           
            # make MongoDB API call using insert_many()
            result = col.insert_many( val )

            # get the total numbers of docs inserted
            total_docs = len(result.inserted_ids)

            print ("total inserted:", total_docs)
            print ("inserted IDs:", result.inserted_ids)            

else:
    print ("JSON string was not valid, or didn't include the database and collection names")
