Monday, 7 February 2011

MongoDB - Does My Data Look Big In This?

You have an existing relational database containing x amount of data and you decide to migrate that data, for whatever reason, into MongoDB. You may have an pre-conceived belief that as your relational database is x GB in size, that after loading that data into MongoDB your NoSQL database size will be around about x GB too - after all, you're loading exactly the same data from one to the other right?

Wrong

For simplicity's sake, assume you are doing a straight load of data, creating 1 document in MongoDB for each row in your relational database and each column value is represented as a key-value pair in the document. What you need to take into account is that:
  1. each document you insert into MongoDB will be assigned an ObjectId (_id) if you don't supply one yourself. This is a 12-byte value consisting of 4 parts:
    • timestamp (4 bytes)
    • machine identifier (3 bytes)
    • process id (2 bytes)
    • increment (3 bytes)
  2. in MongoDB, each document will contain not only the values, but the key names (~"column names" in relational db-speak) too. So if you have keys: "FirstName" and "LastName", then those alone will add 17 bytes to each document.
  3. MongoDB automatically adds some padding to the documents to allow for documents to grow in size to try and reduce the need for documents to be moved around if they do grow

Considerations

It's worth considering using shortened key names in your documents as this space saving can quickly mount up. Also, if a column doesn't have a value in your relational database then you should also consider not including that data item in that particular document.

SQL Server vs. MongoDB

As an example, I loaded approximately 105 million rows of data from a table in SQL Server into a collection in MongoDB. The resulting data sizes were:
  • SQL Server : 14060MB
  • MongoDB : 19629MB (total storage size inc. allocated but unused space: 22570MB)
That's ~39% more space being used for the same data in MongoDB (based on data size stats, with ObjectIDs being assigned by MongoDB).

After switching to use 2 character coded key names instead of the full length names, this dropped to 15439MB (18266MB total storage size) - about 10% more space than SQL Server. Had I had columns without values in SQL Server, then ensuring those weren't included as items in MongoDB documents would have saved more space.

2 comments:

  1. Thanks Adrian. This is more or less exactly the comparison I was looking for.

    ReplyDelete
  2. I guess that indexes can come into play here and make a significant difference?

    ReplyDelete