Big Data Journey to MongoDB


MongoDB has recently become one of the key technologies for storing data because of its flexibility and scalability, but many of us don't know the journey that led to it after the Big Data concept came into the picture. Knowing that journey helps ensure MongoDB's advantages are used in the right way. We at Witspry have tried to explain it as simply as possible.


What is Big Data?

Big Data is data with these three characteristics:

1) Volume
2) Velocity, and
3) Variety


Big Data can be divided into three categories of data types:

1) Structured data, e.g. relational data
2) Semi-structured data, e.g. XML, which is more or less flexible, and
3) Unstructured data, e.g. Word, PDF, text, audio, video etc.

Big Data technology classes

Big Data technologies fall mainly into two classes - Operational & Analytical.




One Big Data implementation - Hadoop Framework

Hadoop is a framework that implements the Big Data concept.




One top key technology in Big Data - NoSQL

Because of its flexible data structure, NoSQL is one of the top data storage and access technologies in the Big Data implementation suite.




One implementation of the NoSQL document data model - MongoDB

What is MongoDB?

MongoDB is an open-source document database that provides high performance, high availability and automatic scaling.

Document Database

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays and arrays of documents.
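
As a rough illustration of the structure described above, a document can be pictured as a nested Python dictionary. This is only a sketch: the "user" record and all of its field names are made up for this example, and no MongoDB server or driver is involved.

```python
import json

# A hypothetical "user" document; every field name here is illustrative.
user = {
    "name": "Sue",                     # field: value pair
    "age": 26,
    "groups": ["news", "sports"],      # value that is an array
    "address": {                       # value that is an embedded document
        "city": "Pune",
        "zip": "411001",
    },
    "orders": [                        # value that is an array of documents
        {"item": "book", "qty": 2},
        {"item": "pen", "qty": 10},
    ],
}

# Because documents resemble JSON objects, the record converts to JSON
# text and back without losing its shape.
as_json = json.dumps(user)
round_tripped = json.loads(as_json)
```

Nested values are reached the same way as in any nested object, for example `user["orders"][1]["qty"]`.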


Advantages of using documents:

1) Documents (i.e. objects) correspond to native data types in many programming languages.
2) Embedded documents and arrays reduce the need for expensive joins.
3) Dynamic schema supports fluent polymorphism.
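
The advantages listed above can be sketched with plain Python data, without any database. The collections, field names and the `area` helper below are all assumptions made up for this illustration.

```python
import math

# Relational style: authors and posts live in separate "tables",
# so reading a post together with its author needs a join on author_id.
authors = [{"author_id": 1, "name": "Sue"}]
posts = [{"post_id": 10, "author_id": 1, "title": "Hello"}]
joined = [
    {"title": p["title"], "author": a["name"]}
    for p in posts
    for a in authors
    if p["author_id"] == a["author_id"]
]

# Document style: the author is embedded in the post,
# so a single read returns everything and no join is needed.
post_doc = {"title": "Hello", "author": {"name": "Sue"}, "tags": ["intro"]}
embedded = {"title": post_doc["title"], "author": post_doc["author"]["name"]}

# Dynamic schema: documents in one collection may carry different fields.
shapes = [
    {"type": "circle", "radius": 2.0},
    {"type": "rectangle", "width": 3.0, "height": 4.0},
]

def area(doc):
    """Compute an area from whichever fields this kind of document has."""
    if doc["type"] == "circle":
        return math.pi * doc["radius"] ** 2
    if doc["type"] == "rectangle":
        return doc["width"] * doc["height"]
    raise ValueError("unknown shape type: " + doc["type"])

areas = [area(s) for s in shapes]
```

Both the join and the embedded read produce the same result here; the difference is that the embedded form gets it from one document instead of matching rows across two collections.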

Key features:

High Performance

MongoDB provides high-performance data persistence. In particular,
  • Support for embedded data models reduces I/O activity on the database system.
  • Indexes support faster queries and can include keys from embedded documents and arrays.
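
To see why an index on an embedded key or an array speeds up a query, here is a toy index built in plain Python. It is a simplified model and not how MongoDB stores its indexes internally; the helper names `get_path` and `build_index` and the sample data are hypothetical.

```python
from collections import defaultdict

def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value

def build_index(docs, dotted_path):
    """Map each indexed value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        value = get_path(doc, dotted_path)
        # An array field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].append(position)
    return index

users = [
    {"name": "Sue", "address": {"city": "Pune"}, "groups": ["news", "sports"]},
    {"name": "Raj", "address": {"city": "Delhi"}, "groups": ["sports"]},
    {"name": "Ana", "address": {"city": "Pune"}, "groups": []},
]

city_index = build_index(users, "address.city")   # key inside an embedded document
group_index = build_index(users, "groups")        # key that is an array

# A lookup goes straight to the matching documents instead of scanning all of them.
pune_users = [users[i]["name"] for i in city_index["Pune"]]
sports_users = [users[i]["name"] for i in group_index["sports"]]
```

With the PyMongo driver, the real counterpart would be along the lines of `collection.create_index("address.city")`, where `collection` is assumed to be an existing collection object.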

High Availability

To provide high availability, MongoDB's replication facility, called a replica set, provides:
  • Automatic failover
  • Data redundancy
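
The two properties above can be pictured with a small toy model. This is a deliberately simplified sketch, not MongoDB's actual replication or election mechanism; the `ToyReplicaSet` class and the member names are invented for illustration.

```python
class ToyReplicaSet:
    """Toy model: several members each keep a copy of the data."""

    def __init__(self, member_names):
        self.data = {name: [] for name in member_names}  # one copy per member
        self.healthy = set(member_names)
        self.primary = member_names[0]

    def write(self, record):
        # Data redundancy: every healthy member stores the record.
        for name in self.healthy:
            self.data[name].append(record)

    def fail(self, name):
        self.healthy.discard(name)
        if name == self.primary:
            # Automatic failover: promote a surviving member.
            # (Assumption: pick the alphabetically first one; a real
            # replica set chooses its new primary differently.)
            self.primary = min(self.healthy) if self.healthy else None

rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"_id": 1})
rs.fail("node-a")        # the primary goes down
rs.write({"_id": 2})     # writes continue against the new primary
```

After the failure, `rs.primary` is a surviving member, and that member already holds the record written before the failure.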