Export MongoDB Documents As CSV, HTML, and JSON files In Python Using Pandas
Introduction
One benefit of pairing MongoDB with Python's Pandas library is that you can export MongoDB documents in several formats, including CSV, JSON, and HTML.
When you manage MongoDB documents with PyMongo, exporting them from Python is a task you'll likely perform on a regular basis. This tutorial explains how to export MongoDB documents as CSV, HTML, and JSON files in Python using Pandas, which gives you a straightforward way to organize and analyze data in the file formats you choose.
If you already know the basics of exporting MongoDB documents with Python and want to skip the tutorial, go to Just the Code.
Pandas objects writer methods
Methods |
---|
to_csv() |
to_excel() |
to_feather() |
to_gbq() |
to_hdf() |
to_html() |
to_msgpack() (removed in Pandas 1.0) |
to_parquet() |
to_pickle() |
to_sql() |
to_stata() |
to_json() |
to_clipboard() |
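The writer methods that are available depend on the Pandas version you have installed. As a quick check, here is a minimal sketch that lists them by introspection. The `writer_methods` name is only illustrative, and the result also includes in-memory converters such as `to_dict()` and `to_numpy()`, not just file writers.

```python
import pandas

# collect every callable DataFrame attribute whose name starts with "to_"
writer_methods = sorted(
    name for name in dir(pandas.DataFrame)
    if name.startswith("to_") and callable(getattr(pandas.DataFrame, name))
)

print(writer_methods)
```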
Prerequisites
- MongoDB – Install and run the server. To check that it is running, open the Mongo Shell from a terminal window by typing `mongo` (`mongosh` on MongoDB 6.0 and later).
- Python 3 – Install Python 3, then upgrade the pip package manager to its latest version.
>NOTE: The examples in this tutorial are based on Python 3, not earlier versions.

    pip3 install --upgrade pip

- Create a collection of documents to try out the examples in this lesson. You'll want to experiment and make some API calls to return a few documents.
Install Pandas Python packages
- From a terminal or command prompt window, use `pip3` to install the Python packages this tutorial requires: the PyMongo driver and Pandas, along with their dependencies.
Install the MongoDB PyMongo driver
- Install the PyMongo library, the Python driver that gives you access to your MongoDB databases and collections.

    pip3 install pymongo
Get the Pandas library using pip3
- The examples in this tutorial use the Pandas library. Use `pip3` to install it.

    pip3 install pandas
Import additional libraries including Pandas and the PyMongo driver
- Import Python's built-in `json` library if you need to work with data in that format.
- Import any other libraries required by the export formats you want to use.
Import the Pandas library
- Many projects import Pandas under the alias `pd`. This tutorial uses the complete `pandas` name for the sake of readability.

    import pandas
Import the `MongoClient` class
- Since you'll connect to a MongoDB server to reach its databases and collections, import the `MongoClient` class.

    from pymongo import MongoClient
Make a new `MongoClient` instance
- Construct a new client instance of `MongoClient` that connects to the MongoDB server where you stored the document collection.

    # build a new client instance of MongoClient
    mongo_client = MongoClient('localhost', 27017)

- Create database and collection objects, and verify that the collection contains documents.

    # create new database and collection objects
    db = mongo_client.some_database
    col = db.some_collection
Retrieve documents with an API call
- To return every document in your MongoDB collection, call the `find()` method.
>NOTE: Omit the filter parameters if you want all the documents. The results come back as a `pymongo.cursor.Cursor` object.

    # make an API call to the MongoDB server using a Collection object
    cursor = col.find()
How to get the number of documents returned by the `find()` method
- There are two main ways to get the number of MongoDB documents: (1) the `count_documents()` method, which returns the total number of documents in a collection, and (2) Python's built-in `len()` function, which returns the number of documents an API call retrieved once they are in a list.
Pass an empty dictionary (`{}`) to the `count_documents()` method to get a count of all the collection's documents

    # print the total number of documents in a MongoDB collection
    print("total docs in collection:", col.count_documents({}))

Get a count of the documents returned with the Python `len()` function
>NOTE: A cursor can be iterated only once, so this example counts the results of a fresh `find()` call and leaves `cursor` untouched for the next step.

    # print the number of documents that find() returns
    print("total docs returned by find():", len(list(col.find())))
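A PyMongo cursor behaves like a one-pass iterator: once `list()` has consumed it, a second pass yields nothing. The sketch below illustrates the idea without a running server by using a plain Python iterator as a stand-in for the cursor. The sample documents and the `fake_cursor` name are made up for illustration.

```python
# stand-in for the documents a find() call would return
sample_docs = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}]

# a plain iterator plays the role of the cursor here (an assumption for
# illustration; a real pymongo Cursor is a richer object)
fake_cursor = iter(sample_docs)

first_pass = list(fake_cursor)   # consumes the iterator
second_pass = list(fake_cursor)  # nothing is left to read

print("first pass:", len(first_pass))
print("second pass:", len(second_pass))
```

Storing the list once and calling `len()` on that list avoids both the empty second pass and an extra query.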
Get the MongoDB documents with the Python `list()` function
- After the `find()` API call returns a PyMongo `Cursor` object, pass it to the `list()` function to load all of the documents into a Python list.

    mongo_docs = list(cursor)
Limit the export of MongoDB documents in the beginning
- Because large collections take a while to iterate, start out by exporting just a few documents, perhaps up to 50. That way, as you test the samples in this tutorial, you won't have to wait long to see the results.

    # get only the first 50 docs in list
    mongo_docs = mongo_docs[:50]
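Slicing the list still loads every document first. One alternative is to stop reading early. PyMongo cursors offer a `limit()` method for this (for example `col.find().limit(50)`). The sketch below shows the same idea with `itertools.islice` on a stand-in generator, so it runs without a server. The `fake_find()` function is hypothetical and only imitates a lazy cursor.

```python
from itertools import islice

def fake_find(total):
    """Hypothetical stand-in for col.find(): yields documents lazily."""
    for i in range(total):
        yield {"_id": i, "value": i * 10}

# read at most 50 documents; the rest are never generated
mongo_docs = list(islice(fake_find(1000), 50))

print("docs loaded:", len(mongo_docs))
```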
Iterate over the list of MongoDB documents returned by the `find()` API call
Use the `pandas.Series()` constructor to convert a `dict` object
- The next step is to convert each MongoDB document into a `pandas.core.series.Series` object.
>NOTE: `Series` objects are one-dimensional and support index labels, which complements the key-value structure of MongoDB documents.

    series_obj = pandas.Series({"a key": "a value"})
    print("series_obj:", type(series_obj))
The object that the `pandas.Series()` constructor returns is somewhat like a one-dimensional NumPy `ndarray`, with an index label attached to each value.
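To make the comparison concrete, here is a small sketch that builds a `Series` from a dictionary shaped like a MongoDB document and then exposes its values as an `ndarray`. The sample data is made up.

```python
import numpy
import pandas

# a dict shaped like a small MongoDB document (made-up sample data)
doc = {"name": "Ada", "age": 36}

series_obj = pandas.Series(doc)

# the dict keys become the index labels
print("index:", list(series_obj.index))

# to_numpy() exposes the values as a NumPy ndarray
values = series_obj.to_numpy()
print("values type:", type(values))
```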
Get a Pandas `Series` object's `index` and alter it
- You can alter a `Series` object's index by assigning it a new list of labels, provided the list has the same number of elements as the `Series` object.
- In the example below, both the `Series` object and its new index have one element.

    series_obj = pandas.Series({"one": "index"})
    series_obj.index = ["one"]
    print("index:", series_obj.index)
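The length requirement is easy to see in a short sketch. Here a two-element `Series` accepts a two-label index and rejects a three-label one. The labels are arbitrary examples.

```python
import pandas

series_obj = pandas.Series({"first": "a", "second": "b"})

# a new index with the same number of labels is accepted
series_obj.index = ["one", "two"]
labels_after_rename = list(series_obj.index)
print("index:", labels_after_rename)

# a new index with a different number of labels is rejected
try:
    series_obj.index = ["one", "two", "three"]
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("error:", err)
```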
Store documents in a `DataFrame` object
- The Pandas `DataFrame` object is a two-dimensional storage container. It can be built from NumPy arrays, other `Series` and `DataFrame` objects, and dictionaries.
- Make an empty `DataFrame` object by passing an empty `columns` list.

    # create an empty DataFrame obj for storing Series objects
    docs = pandas.DataFrame(columns=[])

>NOTE: Older Pandas releases provided a `DataFrame.append()` method for adding rows. It was deprecated in Pandas 1.4 and removed in Pandas 2.0, where the `pandas.concat()` function takes its place.
Iterate with the `enumerate()` function and create new Pandas `Series` objects
- Python's `enumerate()` function is a convenient way to iterate over the MongoDB documents while keeping track of each document's position in the list.

    # iterate over the list of MongoDB dict documents
    for num, doc in enumerate(mongo_docs):
Convert the `ObjectId()` to a `str()`
- Convert each document's `_id` to a Python string, and save it in a variable so that you can use it later on.

        # convert ObjectId() to str
        doc["_id"] = str(doc["_id"])

        # get document _id from dict
        doc_id = doc["_id"]
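One reason the string conversion matters is that an `ObjectId` is not a plain JSON type. The sketch below demonstrates the problem with Python's `json` module and a made-up `FakeObjectId` class standing in for `bson.ObjectId`, so it runs without PyMongo installed. Treat it as an illustration of the idea rather than the driver's actual behavior.

```python
import json

class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId (illustration only)."""
    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value

doc = {"_id": FakeObjectId("5cf0029caff5056591b0ce7d"), "name": "Ada"}

# json cannot serialize an arbitrary object...
try:
    json.dumps(doc)
    serialized_raw = True
except TypeError:
    serialized_raw = False

# ...but it can once the _id is a plain string
doc["_id"] = str(doc["_id"])
json_text = json.dumps(doc)

print("raw serializable:", serialized_raw)
print("after str():", json_text)
```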
Construct a `Series` object and add it to the `DataFrame`
- Construct a `Series` object from the MongoDB document and give it the `doc_id` string as its name.
- Next, add it to the `DataFrame` as a new row. This example uses `pandas.concat()` because the `DataFrame.append()` method is no longer available in Pandas 2.0 and later.

        # create a Series obj from the MongoDB dict
        series_obj = pandas.Series(doc, name=doc_id)

        # add the Series to the DataFrame as a row labeled with doc_id
        docs = pandas.concat([docs, series_obj.to_frame().T])
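Adding rows one at a time copies the `DataFrame` on every pass through the loop. As an alternative sketch, Pandas can build the whole table from the list of dictionaries in a single call. The sample documents below are made up, and their `_id` values are assumed to have been converted to strings already.

```python
import pandas

# stand-ins for MongoDB documents with string _id values
mongo_docs = [
    {"_id": "a1", "name": "Ada", "age": 36},
    {"_id": "b2", "name": "Grace", "age": 45},
    {"_id": "c3", "name": "Linus"},  # missing keys become NaN
]

# build the DataFrame in one call; keep _id as a column and as the row label
docs = pandas.DataFrame(mongo_docs)
docs.index = docs["_id"].tolist()

print(docs)
```

The result has the same shape as the loop-based version: one row per document, labeled with the document's `_id` string.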
When the iteration finishes, every document has been converted to a Pandas `Series` object and stored as a row in the `DataFrame`.
Use Pandas' built-in methods to export to different file formats
- Pandas `Series` and `DataFrame` objects have built-in methods that streamline exporting to different file formats, including `to_html()`, `to_json()`, and `to_csv()`.
Decide how you want to call the export method
- To save the documents' data to a file, pass a file path as the first argument.
- If you don't pass a path, the method returns the documents' data as a formatted string.
MongoDB document data returned as `csv`, `json`, or `html` strings
- The examples below use the `to_json()` and `to_csv()` methods to return MongoDB documents as JSON and CSV strings.

    # have Pandas return a JSON string of the documents
    json_export = docs.to_json() # return JSON data
    print("\nJSON data:", json_export)

    # have Pandas return a CSV string of the documents
    csv_export = docs.to_csv(sep=",") # CSV delimited by commas
    print("\nCSV data:", csv_export)
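The layout of the JSON string depends on the `orient` parameter of `to_json()`. The sketch below compares the default layout with `orient="records"`, which produces one JSON object per row and so resembles the original documents more closely. The two sample rows are made up.

```python
import json
import pandas

docs = pandas.DataFrame(
    [{"_id": "a1", "name": "Ada"}, {"_id": "b2", "name": "Grace"}],
    index=["a1", "b2"],
)

# default layout: {column -> {row label -> value}}
by_column = json.loads(docs.to_json())

# orient="records": one JSON object per row
by_record = json.loads(docs.to_json(orient="records"))

print(by_column)
print(by_record)
```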
- When you export MongoDB documents as HTML, you have an extra option: pass an `io.StringIO` buffer to the Pandas `to_html()` method through its `buf` parameter. The method writes the documents into the buffer as an HTML table, as in the example below.

    # create an in-memory text buffer for the HTML
    import io
    html_str = io.StringIO()

    # export as HTML
    docs.to_html(
        buf=html_str,
        classes='table table-striped'
    )

    # print out the HTML table
    print(html_str.getvalue())
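The buffer is optional. As with the other writer methods, calling `to_html()` without a `buf` or path argument returns the table as a string. A minimal sketch with a made-up one-row `DataFrame`:

```python
import pandas

docs = pandas.DataFrame([{"name": "Ada", "age": 36}], index=["a1"])

# with no buf argument, to_html() returns the table as a str
html_table = docs.to_html(classes="table table-striped")

print(html_table)
```

The `classes` value is added to the table's `class` attribute, which is handy if the page that displays the table uses a CSS framework with matching class names.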
Export the MongoDB documents' data as a CSV, JSON, or HTML file
- When you pass a file name, each method writes the data in its own format: `to_json()` writes JSON, `to_csv()` writes CSV, and `to_html()` writes an HTML table.

    # export the MongoDB documents as a JSON file
    docs.to_json("object_rocket.json")

    # export MongoDB documents to a CSV file
    docs.to_csv("object_rocket.csv", sep=",") # CSV delimited by commas

    # save the MongoDB documents as an HTML table
    docs.to_html("object_rocket.html")

>NOTE: To place a file in a particular directory, include that directory in the file path. Otherwise, the file is saved in the current working directory, which is typically the directory you run the script from.
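To write into a specific directory, one option is to build the path with `pathlib` and create the directory first. The sketch below uses a temporary directory as a stand-in for whatever export folder you would choose. The `exports` folder name and `docs.csv` file name are arbitrary.

```python
import tempfile
from pathlib import Path

import pandas

docs = pandas.DataFrame([{"name": "Ada", "age": 36}], index=["a1"])

# a temporary directory stands in for the export folder you would choose
export_dir = Path(tempfile.mkdtemp()) / "exports"
export_dir.mkdir(parents=True, exist_ok=True)

csv_path = export_dir / "docs.csv"
docs.to_csv(csv_path, sep=",")

# read the file back to confirm what was written
round_trip = pandas.read_csv(csv_path, index_col=0)
print(round_trip)
```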
Conclusion
This tutorial explained how to work with MongoDB documents using PyMongo and, specifically, how to export them in a variety of commonly used formats. At some point in your daily MongoDB document management, you'll need to export documents as CSV, JSON, or an HTML table, either as a string or as a file.
You also saw how the two-dimensional Pandas `DataFrame` holds your `Series` objects. Being able to view MongoDB documents in the formats you require saves time when you need to analyze, organize, and manipulate your document data. You're now ready to export MongoDB documents with Python.
View the MongoDB documents' data by opening the exported files in the same directory as the Python script.
Just the Code
Here's the entire script that exports MongoDB documents as CSV, JSON, and HTML files with Python.

    #!/usr/bin/env python3
    #-*- coding: utf-8 -*-

    # import the MongoClient class
    from pymongo import MongoClient

    # import the Pandas library
    import pandas

    # these libraries are optional
    import json
    import time

    # build a new client instance of MongoClient
    mongo_client = MongoClient('localhost', 27017)

    # create new database and collection instance
    db = mongo_client.some_database
    col = db.some_collection

    # start time of script
    start_time = time.time()

    # make an API call to the MongoDB server
    cursor = col.find()

    # extract the list of documents from cursor obj
    mongo_docs = list(cursor)

    # restrict the number of docs to export
    mongo_docs = mongo_docs[:50] # slice the list
    print("total docs:", len(mongo_docs))

    # create an empty DataFrame for storing documents
    docs = pandas.DataFrame(columns=[])

    # iterate over the list of MongoDB dict documents
    for num, doc in enumerate(mongo_docs):

        # convert ObjectId() to str
        doc["_id"] = str(doc["_id"])

        # get document _id from dict
        doc_id = doc["_id"]

        # create a Series obj from the MongoDB dict
        series_obj = pandas.Series(doc, name=doc_id)

        # add the Series to the DataFrame as a row labeled with doc_id
        # (DataFrame.append() was removed in Pandas 2.0)
        docs = pandas.concat([docs, series_obj.to_frame().T])

        # only print every 10th document
        if num % 10 == 0:
            print(type(doc))
            print(type(doc["_id"]))
            print(num, "--", doc, "\n")

    # EXPORT THE MONGODB DOCUMENTS TO DIFFERENT FILE FORMATS
    print("\nexporting Pandas objects to different file types.")
    print("DataFrame len:", len(docs))

    # export the MongoDB documents as a JSON file
    docs.to_json("object_rocket.json")

    # have Pandas return a JSON string of the documents
    json_export = docs.to_json() # return JSON data
    print("\nJSON data:", json_export)

    # export MongoDB documents to a CSV file
    docs.to_csv("object_rocket.csv", sep=",") # CSV delimited by commas

    # have Pandas return a CSV string of the documents
    csv_export = docs.to_csv(sep=",") # CSV delimited by commas
    print("\nCSV data:", csv_export)

    # create an in-memory text buffer for the HTML
    import io
    html_str = io.StringIO()

    # export as HTML
    docs.to_html(
        buf=html_str,
        classes='table table-striped'
    )

    # print out the HTML table
    print(html_str.getvalue())

    # save the MongoDB documents as an HTML table
    docs.to_html("object_rocket.html")

    print("\n\ntime elapsed:", time.time() - start_time)