How to Shard in MongoDB
If you’re working on a production application or are headed in that direction, you’ll need to start thinking about scalability. What happens if you create a social media site that goes viral, and instead of saving 100MB per day you’re now storing 100GB per day? One tool in your MongoDB tool belt is sharding your database. You might be unfamiliar with the term, so the next section explains what it is before we walk through how to do it.
What is Sharding?
Sharding is essentially breaking up your data into smaller pieces. Instead of one node containing 1TB, you could split the data across 10 nodes with 100GB each. Each of these nodes is a shard. Don’t confuse a shard with a replica: a replica holds a copy of the same data, whereas in the scenario above, where we split one 1TB node into 10 shards of 100GB, each shard holds different data. Do you need to keep track of which shard each document lives on? No, we leave that hard work to MongoDB.
An Example of Sharding with MongoDB
Now let’s jump into an example of sharding with MongoDB.
- You should have MongoDB installed
- It’s not required, but we recommend some previous command line experience with MongoDB, primarily with starting an instance.
- You should have access to the MongoDB command line ( Execute
Create directories for Shard Data, Logs, and Config
Our first step will be to create separate directories to store the data for each shard, plus a directory for their log files. We’ll do this from our
mkdir -p data/shard1/db data/shard2/db
Now let’s execute the tree command so you can see our current directory structure:
$ tree .
.
└── data
    ├── shard1
    │   └── db
    └── shard2
        └── db

5 directories, 0 files
Now let’s create the log directory (named logs here, to match the --logpath arguments used below):
mkdir logs
When sharding, there is also a config server that stores the cluster’s metadata, so we need to create a directory for its data:
mkdir -p data/config/db
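The directory steps above can be collected into one small script. This is a sketch rather than part of the original walkthrough: the logs directory name is an assumption taken from the --logpath logs/... arguments used later, and BASE_DIR is a hypothetical variable for choosing where the layout is created.

```shell
#!/bin/sh
# Create the directory layout for a two-shard demo cluster.
# BASE_DIR is a hypothetical variable; it defaults to the current directory.
BASE_DIR="${BASE_DIR:-.}"

# mkdir -p creates missing parents and does not fail if a directory
# already exists, so the script is safe to run more than once.
mkdir -p \
  "$BASE_DIR/data/shard1/db" \
  "$BASE_DIR/data/shard2/db" \
  "$BASE_DIR/data/config/db" \
  "$BASE_DIR/logs"   # assumed name, matching --logpath logs/... later on

# List the directories that now exist.
find "$BASE_DIR/data" "$BASE_DIR/logs" -type d | sort
```

Running it from the working directory used in this article reproduces the same data/ and logs/ layout as the individual mkdir commands.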
Start the shard nodes
Now let’s start each of the two shards:
mongod --shardsvr --dbpath data/shard1/db --logpath logs/shard1.log --port 27000 --fork
mongod --shardsvr --dbpath data/shard2/db --logpath logs/shard2.log --port 27001 --fork
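Since MongoDB 3.6, shards must be deployed as replica sets, so on those versions add the --replSet flag. Here repSetName is a placeholder; in a real cluster each shard’s replica set needs its own unique name, and each set must be initiated with rs.initiate() before it can be added as a shard:
mongod --shardsvr --replSet repSetName --dbpath data/shard1/db --logpath logs/shard1.log --port 27000 --fork
mongod --shardsvr --replSet repSetName --dbpath data/shard2/db --logpath logs/shard2.log --port 27001 --fork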
We use the --shardsvr flag to specify that we are starting a shard server, --dbpath to specify the directory where we intend to store the data for each shard, and --logpath to specify the log file for each shard. We specify --fork so that each server runs in the background and we can start multiple servers from one shell. We specify the port the server should run on with the --port flag.
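Because the two commands differ only in the shard number and port, they can be generated by a small helper. The sketch below is illustrative: start_shard and the RUN switch are hypothetical names, and by default the script only prints the commands so you can inspect them before starting anything.

```shell
#!/bin/sh
# Build the mongod command line for shard number N (1, 2, ...).
# By default the command is only printed; set RUN=1 to execute it.
start_shard() {
  n=$1
  port=$((26999 + n))   # shard1 -> 27000, shard2 -> 27001
  set -- mongod --shardsvr \
    --dbpath "data/shard$n/db" \
    --logpath "logs/shard$n.log" \
    --port "$port" --fork
  if [ "${RUN:-0}" = "1" ]; then
    "$@"          # actually start the server
  else
    echo "$*"     # dry run: show what would be executed
  fi
}

for n in 1 2; do
  start_shard "$n"
done
```

With RUN unset, the loop prints the same two mongod commands shown above; running it with RUN=1 starts both shards.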
Start the Config Server
Now let’s start the config server:
mongod --configsvr --dbpath data/config/db --logpath logs/config.log --port 25000 --fork
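We need to use the --configsvr flag to start a config server for a sharded cluster. Since MongoDB 3.4, the config server must also run as a replica set, so on those versions add --replSet repSetName to the command above and initiate the set with rs.initiate() before starting mongos.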
Start the mongos process
Now we start the mongos process, which essentially acts as a router that directs requests to the correct shard for the data. It talks to the config server to do this. This process is what allows us to make queries without knowing which shard the data resides on.
mongos --configdb repSetName/localhost:25000 --logpath logs/mongos.log --fork
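Here we use the --configdb flag to let the process know where to find the config server, which we set up in the previous step on localhost port 25000. The repSetName/ prefix is the name of the config server’s replica set. Again we use --logpath to specify the log file for this process. Because we don’t pass --port, mongos listens on the default port 27017.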
Add Shards to mongos
Next we need to go into the Mongo shell using the mongo command, and at the prompt we add our two shards with these commands:
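A minimal sketch of this step, assuming the two standalone shards started earlier on ports 27000 and 27001 and mongos on its default port 27017. sh.addShard() is the mongo shell helper for registering a shard; add_shard_js and the RUN switch are hypothetical names used only in this example. At the interactive prompt you would type just the sh.addShard(...) expressions; the script passes the same expressions to mongo --eval. If the shards were started with --replSet, the argument takes the form repSetName/localhost:27000 instead.

```shell
#!/bin/sh
# Print the mongo shell expression that registers one shard with the cluster.
add_shard_js() {
  printf 'sh.addShard("%s")\n' "$1"
}

# Hosts and ports of the two shard servers started earlier.
for shard in localhost:27000 localhost:27001; do
  js=$(add_shard_js "$shard")
  if [ "${RUN:-0}" = "1" ]; then
    # Connect to mongos (default port 27017) and run the expression.
    mongo --port 27017 --eval "$js"
  else
    echo "$js"    # dry run: show the expression only
  fi
done
```

By default the script only prints the two expressions; set RUN=1 once the shards, config server, and mongos are all running.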
You should receive an "ok" : 1 reply letting you know that you have set up the shards correctly.
In this article we showed you how to create shards in MongoDB. We covered the servers you need to set up and how to start them. We also looked at the mongos process and how it simplifies our life, letting us query data without knowing which shard the data actually lives on. We hope this was a useful introduction to sharding for you. If you need help managing your database, please don’t hesitate to reach out to us at ObjectRocket.