Export MongoDB Documents As CSV, HTML, and JSON files In Python Using Pandas


Introduction

One benefit of pairing MongoDB with Python’s Pandas library is that you can export MongoDB documents in several formats, including CSV, JSON, and HTML.

When you manage MongoDB documents with PyMongo, exporting them from Python is a task you’ll likely perform on a regular basis. This tutorial explains how to export MongoDB documents as CSV, HTML, and JSON files in Python using Pandas, so that you can organize and analyze your data in the file formats you choose.

If you already know the basics of exporting MongoDB documents with Python and want to skip the tutorial, go to Just the Code.

Pandas objects writer methods

Methods
to_csv()
to_excel()
to_feather()
to_gbq()
to_hdf()
to_html()
to_msgpack() (removed in Pandas 1.0)
to_parquet()
to_pickle()
to_sql()
to_stata()
to_json()
to_clipboard()

Prerequisites

  • MongoDB – Install and run the server. To check that it is running, open a terminal window and type mongo (or mongosh on newer MongoDB releases) to access the Mongo Shell.

  • Python 3 – Install Python 3 together with the PIP package manager. You can use the command below to upgrade PIP itself to the latest version.

>NOTE: The examples in this tutorial are based on Python 3, not earlier versions.

pip3 install --upgrade pip
  • Create a collection of documents to try out the examples in this lesson. You’ll want to experiment and make some API calls to return a few documents.

Install Pandas Python packages

  • From a terminal window, use pip3 to install the Python packages that this tutorial requires, along with their dependencies. You can also run pip3 from a command prompt window instead of a terminal window.

Install the MongoDB PyMongo driver

  • Gain access to your database and MongoDB collection by installing PyMongo, the Python driver for MongoDB.
pip3 install pymongo

Get the Pandas library using pip3

  • The examples in this tutorial use the Pandas library. Use pip3 to install it.
pip3 install pandas

Import additional libraries including Pandas and the PyMongo driver

  • Import the Python library for json because you might need to export files in that format.

  • Import any other libraries for the exporting formats you want to use.

Import the Pandas library

  • You can import the library under its common alias pd (import pandas as pd). This tutorial uses the complete pandas name for the sake of better visual comprehension.
import pandas

Import the MongoClient class

  • Since you’ll be creating instances of MongoDB databases and collections, import the MongoClient class.
from pymongo import MongoClient

Make a new MongoClient instance

  • Construct a new client instance of MongoClient.

  • Pass it the host and port of the MongoDB server that holds your document collection.

# build a new client instance of MongoClient
mongo_client = MongoClient('localhost', 27017)
  • Create database and collection objects. Verify that the collection includes documents.
# create new database and collection objects
db = mongo_client.some_database
col = db.some_collection

Retrieve documents with an API call

  • To return every document in your specified MongoDB collection, use the find() method to make the API call.

>NOTE: Skip putting any parameters in the call if you want all the documents. The results will be in a pymongo.cursor.Cursor object.

# make an API call to the MongoDB server using a Collection object
cursor = col.find()

How to get the number of documents

  • There are two main ways to get the number of MongoDB documents: (1) the count_documents() method, which returns the total number of documents in a collection that match a filter, or (2) Python’s built-in len() function, which returns the number of documents in a list built from the results of an API call.

Pass an empty dictionary ({}) with the count_documents() method to get a count of all the collection’s documents

# print the total number of documents in a MongoDB collection
print ("total docs in collection:", col.count_documents( {} ))

Get a count of the documents returned with the len() Python function

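# print the number of documents returned by a find() call
# (a separate find() call is used here because list() exhausts the cursor
# it reads, and the cursor object from the earlier step is still needed below)
print ("total docs returned by find():", len( list(col.find()) ))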

Get the MongoDB documents with the Python list() function

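  • After you make the find() API call and receive the PyMongo Cursor object, pass it to the list() function to access all of the documents. Do this only once per cursor: after list() has read it, the cursor is exhausted and returns no more documents.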
mongo_docs = list(cursor)

Limit the export of MongoDB documents in the beginning

  • Because large collections can take a while to iterate, start out by exporting just a few documents, up to 50 or so. That way, as you test the samples in this tutorial, you won’t have to wait long to see the results.
# get only the first 50 docs in list
mongo_docs = mongo_docs[:50]
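
Slicing the list limits what gets exported, but the whole collection is still read from the server first. As a minimal sketch of an alternative, PyMongo’s Cursor.limit() method asks the server for only the first few documents. The helper name fetch_first_docs is hypothetical and is not part of this tutorial’s script.

```python
def fetch_first_docs(collection, max_docs=50):
    """Return at most max_docs documents from a PyMongo collection as a list."""
    # limit() is applied to the cursor before any documents are read,
    # so no more than max_docs documents are returned
    cursor = collection.find().limit(max_docs)
    return list(cursor)

# usage, assuming `col` is the collection object created in the earlier steps:
# mongo_docs = fetch_first_docs(col, 50)
```

The function only needs an object with a find() method, so it works with any PyMongo Collection without opening its own connection.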

Make a find() API call to receive a MongoDB collection list and then do iteration

Screenshot of Python IDLE with a find() call to MongoDB to get a collection's documents in a Cursor object

Use the pandas.Series() method to pass the dict object

  • The next step is to convert the MongoDB documents to pandas.core.series.Series objects.

>NOTE: Series objects are one-dimensional with indexing support. This complements MongoDB document indexing requirements.
series_obj = pandas.Series({"a key":"a value"})
print ("series_obj:", type(series_obj))

The object returned by the pandas.Series() method is somewhat like NumPy’s ndarray.

Screenshot of IDLE creating a Pandas Series object from a Python dictionary
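
To make the conversion more concrete, here is a small sketch that builds a Series from a dictionary shaped like a MongoDB document. The sample_doc values are made up for illustration and do not come from a real collection.

```python
import pandas

# a hypothetical document, shaped like a dict returned by PyMongo
sample_doc = {"_id": "5cf8c2e2", "name": "Ada", "age": 36}

# the dict keys become the Series index, and the dict values become the data
series_obj = pandas.Series(sample_doc)

print(series_obj.index.tolist())  # the keys of the document
print(series_obj["name"])         # look up a value by its key
```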

Get a Pandas Series object index and alter it

  • You can alter the index of a Series object by assigning it a new list of labels. Make sure the number of elements in the list matches the number of elements in the Series object.

  • The example below shows a Series object and an index that each have one element.

series_obj = pandas.Series( {"one":"index"} )
series_obj.index = [ "one" ]
print ("index:", series_obj.index)

Screenshot of a Pandas Series index in Python's IDLE
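
The length requirement can be checked with a short sketch. It assumes a two-element Series with made-up values, assigns an index of the right length, and then shows that Pandas rejects an index of the wrong length with a ValueError.

```python
import pandas

series_obj = pandas.Series({"a": 1, "b": 2})

# a new index with the same number of elements is accepted
series_obj.index = ["first", "second"]
print(series_obj["second"])

# a new index with a different number of elements is rejected
try:
    series_obj.index = ["only one label"]
    mismatch_error = None
except ValueError as error:
    mismatch_error = error
    print("index not changed:", error)
```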

Store documents in a DataFrame object

  • The Pandas DataFrame object is an excellent two-dimensional storage container. It can hold NumPy arrays, Series objects, other DataFrame objects, and dictionaries.

  • Create an empty DataFrame object, passing an empty columns list as well.

# create an empty DataFrame obj for storing Series objects
docs = pandas.DataFrame(columns=[])

>NOTE: Older Pandas releases include an append() method in the DataFrame class for adding new rows. It was deprecated in Pandas 1.4 and removed in Pandas 2.0, so use the pandas.concat() function to add rows on current releases.

Iterate with the enumerate() function and create new Pandas Series objects

  • Python’s enumerate() function is a convenient way to iterate over the MongoDB documents while keeping track of each document’s position in the list.
# iterate over the list of MongoDB dict documents
for num, doc in enumerate( mongo_docs ):

Convert to str() from ObjectId()

  • Make a Python string of the documents’ IDs. That’s all you need for now. Save the ID though so you’ll have access to it later on.
    # (inside the for loop) convert ObjectId() to str
    doc["_id"] = str(doc["_id"])

    # get document _id from dict
    doc_id = doc["_id"]

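After constructing a Series object, add it to the DataFrame

  • From the MongoDB document, construct a Series object and give it the same name as the doc_id string.

  • Next, use the pandas.concat() function to add it to the DataFrame as a new row.

    # create a Series obj from the MongoDB dict
    series_obj = pandas.Series( doc, name=doc_id )

    # add the MongoDB Series obj to the DataFrame obj as a new row
    # (to_frame().T turns the Series into a one-row DataFrame; the older
    # DataFrame.append() method is no longer available in Pandas 2.0)
    docs = pandas.concat( [docs, series_obj.to_frame().T] )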

When the iteration is finished, the DataFrame holds one row for each MongoDB document, each one converted from a Series object.
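
As an alternative to building the DataFrame row by row, the following sketch passes a whole list of dictionaries to the DataFrame constructor in one call. It is not the approach this tutorial’s script takes, and the sample_docs values are made up for illustration.

```python
import pandas

# hypothetical documents, with the _id values already converted to strings
sample_docs = [
    {"_id": "a1", "name": "Ada", "age": 36},
    {"_id": "b2", "name": "Grace", "age": 45},
    {"_id": "c3", "name": "Linus"},  # documents may lack some fields
]

# each dict becomes one row; missing fields are filled with NaN
docs_frame = pandas.DataFrame(sample_docs)

# use the _id strings as row labels, as name=doc_id does for each Series;
# drop=False keeps _id as a regular column too
docs_frame = docs_frame.set_index("_id", drop=False)
docs_frame.index.name = None

print(docs_frame)
```

Building the DataFrame once avoids copying it on every pass through the loop, which tends to matter more as the number of documents grows.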

Use the built-in Pandas methods to export to different file formats

  • The built-in methods of Pandas Series and DataFrame objects streamline exporting to different file formats. They include to_html(), to_json(), and to_csv().

Decide how you want to call the method

  • To save the documents’ data to a file, pass the file path as an argument.

  • If you don’t pass a path argument, the method returns the documents’ data as a formatted string.

Return MongoDB document data as CSV, JSON, or HTML strings

  • The examples below show how to return the MongoDB documents as JSON and CSV strings.

# have Pandas return a JSON string of the documents
json_export = docs.to_json() # return JSON data
print ("\nJSON data:", json_export)

# export MongoDB documents to CSV
csv_export = docs.to_csv(sep=",") # CSV delimited by commas
print ("\nCSV data:", csv_export)
  • When you export to HTML, you have an extra option. You can pass an io.StringIO object to the Pandas to_html() method as a buffer. The method then writes the documents to the buffer as an HTML table, like in the example below.

# create IO HTML string
import io
html_str = io.StringIO()

# export as HTML
docs.to_html(
    buf=html_str,
    classes='table table-striped'
)
# print out the HTML table
print (html_str.getvalue())
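
The string exports accept optional arguments that change the layout of the output. The sketch below uses a small made-up DataFrame in place of the exported documents. By default, to_json() groups the values by column, while orient="records" returns one JSON object per document, and index=False leaves the row labels out of the CSV.

```python
import json
import pandas

# a small hypothetical DataFrame standing in for the exported documents
docs_frame = pandas.DataFrame(
    [{"_id": "a1", "name": "Ada"}, {"_id": "b2", "name": "Grace"}]
)

# orient="records" returns a JSON array with one object per document
json_export = docs_frame.to_json(orient="records")
records = json.loads(json_export)

# index=False leaves the row labels out of the CSV output
csv_export = docs_frame.to_csv(index=False)

print(records)
print(csv_export)
```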

Export the MongoDB document data as a CSV, JSON, or HTML file

  • When you export the MongoDB documents, Pandas converts the data automatically to the format of the method you call.

# export the MongoDB documents as a JSON file
docs.to_json("object_rocket.json")

# export MongoDB documents to a CSV file
docs.to_csv("object_rocket.csv", sep=",") # CSV delimited by commas

# save the MongoDB documents as an HTML table
docs.to_html("object_rocket.html")

>NOTE: Remember to include the full file path to place the file in your chosen directory. Otherwise, the file is saved in the current working directory, which is usually the directory you run the script from.
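
Here is a minimal sketch of exporting to a chosen directory. A temporary directory stands in for your own export directory, and the one-row DataFrame is made up for illustration. The file names match the ones used in the examples above.

```python
import tempfile
from pathlib import Path

import pandas

# a small hypothetical DataFrame standing in for the exported documents
docs_frame = pandas.DataFrame([{"_id": "a1", "name": "Ada"}])

# a temporary directory stands in for your chosen export directory
export_dir = Path(tempfile.mkdtemp()) / "exports"
export_dir.mkdir(parents=True, exist_ok=True)

# build each file path from the directory and pass it to the writer method
json_path = export_dir / "object_rocket.json"
csv_path = export_dir / "object_rocket.csv"
html_path = export_dir / "object_rocket.html"

docs_frame.to_json(json_path)
docs_frame.to_csv(csv_path, sep=",")
docs_frame.to_html(html_path)

print(sorted(path.name for path in export_dir.iterdir()))
```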

Conclusion

This tutorial explained how to manage MongoDB documents with PyMongo. Specifically, it went over how to convert MongoDB documents by exporting them in a variety of commonly used formats. At some point in your daily MongoDB document management responsibilities, you’ll need to export MongoDB documents as CSV, JSON, or HTML, either as a string or as a file.

Among other things, you discovered how the two-dimensional Pandas DataFrame storage structure holds your Series objects. Mastering the techniques of viewing MongoDB documents in the formats you require saves you time when you need to analyze, organize, and manipulate your document data. You’re now ready to confidently export MongoDB documents with Python.

View the data of the MongoDB documents by opening the exported files in the same directory as the Python script

Screenshot of a MongoDB collection's documents exported as JSON and CSV files using Python and Pandas

Just the Code

Here’s the entire script for exporting MongoDB documents as CSV, JSON, and HTML files with Python.

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class
from pymongo import MongoClient

# import the Pandas library
import pandas

# these libraries are optional
import json
import time

# used to build an in-memory HTML string
import io

# build a new client instance of MongoClient
mongo_client = MongoClient('localhost', 27017)

# create new database and collection instance
db = mongo_client.some_database
col = db.some_collection

# start time of script
start_time = time.time()

# make an API call to the MongoDB server
cursor = col.find()

# extract the list of documents from cursor obj
mongo_docs = list(cursor)

# restrict the number of docs to export
mongo_docs = mongo_docs[:50] # slice the list
print ("total docs:", len(mongo_docs))

# create an empty DataFrame for storing documents
docs = pandas.DataFrame(columns=[])

# iterate over the list of MongoDB dict documents
for num, doc in enumerate(mongo_docs):

    # convert ObjectId() to str
    doc["_id"] = str(doc["_id"])

    # get document _id from dict
    doc_id = doc["_id"]

    # create a Series obj from the MongoDB dict
    series_obj = pandas.Series( doc, name=doc_id )

    # add the MongoDB Series obj to the DataFrame obj as a new row
    # (DataFrame.append() was removed in Pandas 2.0, so use concat())
    docs = pandas.concat( [docs, series_obj.to_frame().T] )

    # only print every 10th document
    if num % 10 == 0:
        print (type(doc))
        print (type(doc["_id"]))
        print (num, "--", doc, "\n")

"""
EXPORT THE MONGODB DOCUMENTS
TO DIFFERENT FILE FORMATS
"""

print ("\nexporting Pandas objects to different file types.")
print ("DataFrame len:", len(docs))

# export the MongoDB documents as a JSON file
docs.to_json("object_rocket.json")

# have Pandas return a JSON string of the documents
json_export = docs.to_json() # return JSON data
print ("\nJSON data:", json_export)

# export MongoDB documents to a CSV file
docs.to_csv("object_rocket.csv", sep=",") # CSV delimited by commas

# export MongoDB documents to CSV
csv_export = docs.to_csv(sep=",") # CSV delimited by commas
print ("\nCSV data:", csv_export)

# create IO HTML string
html_str = io.StringIO()

# export as HTML
docs.to_html(
    buf=html_str,
    classes='table table-striped'
)

# print out the HTML table
print (html_str.getvalue())

# save the MongoDB documents as an HTML table
docs.to_html("object_rocket.html")

print ("\n\ntime elapsed:", time.time()-start_time)
