Friday 18 December 2015

Introduction to NoSQL Database - Part 1


In the database world, most of us have mainly worked with one type of database, the relational database, such as MySQL, Oracle and MSSQL. In a relational DBMS, data is structured in a two-dimensional format that conforms to a predefined schema. The database is organized into tables, each an arrangement of columns and rows, linked through foreign keys and constraints. Relational databases have been used in almost every kind of application, from transactional systems to data warehouses and analytics.

Interactive applications have changed drastically over the last 5-10 years. The number of concurrent users and the amount of unstructured and semi-structured data collected and processed have grown enormously. Social media activity, sensor readings, locations and user preferences are a few examples of the ever-expanding data being captured by many applications. Facebook, Twitter and Google are examples of applications that produce enormous amounts of data every minute.

Any change in the data structure can be a cause for concern if the database is rigid and cannot accommodate modifications easily and efficiently. Companies are looking for databases that are extremely flexible and can handle even unstructured data, which is mostly used for analytical purposes.

A relational database management system typically uses a rigid, schema-based approach, which is not well suited to processing unstructured data.
It became difficult to deal with these issues using the relational database methodology. Some key reasons are listed below:
  • An RDBMS is essentially designed to run on a single machine/node.
  • An RDBMS uses a rigid/fixed schema-based approach to modeling data.
  • Scaling an RDBMS usually means adding enterprise-level hardware, which is not cost effective.

The motivation to resolve these issues led organizations to consider alternatives to the relational database. Google, Amazon and Facebook were among the first companies to run into the serious limitations of relational database technology in supporting these new application requirements.

Open source NoSQL database projects were started in order to resolve these issues. NoSQL is a class of databases that do not comply with all the rules relational databases follow and instead provide different mechanisms for storing and retrieving data.

The easiest way to think of NoSQL is as a database that can handle the unstructured, varying and unpredictable data that today’s applications, such as Facebook and Google, produce.

NoSQL is commonly read as "Not Only SQL", to emphasize that these databases may also support SQL. The rise of NoSQL databases has been accelerated by the need to analyze large amounts of data (structured and unstructured) to derive actionable intelligence, as they enable rapid, ad-hoc analysis of high-volume data.

Relational databases are designed to support ad-hoc queries, which is why we normalize our data. In the NoSQL landscape, we first decide what our access patterns will be: how we are going to use the data and which questions need to be answered. Only then do we design the system.

With NoSQL databases we don’t have to define a schema before writing the data; instead, the schema is applied when we read the data (often called schema-on-read).
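
To make the idea of applying a schema at read time a little more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular NoSQL product works: the records are JSON strings, the `LOCATION` field and the `read_employee` helper are hypothetical, and a simple list stands in for the database.

```python
import json

# Records as an application might write them. No schema is declared up front,
# and the two records do not have the same fields.
# (The LOCATION field is a made-up example of an extra attribute.)
raw_records = [
    '{"EMP_ID": "101", "EMP_NAME": "John Brown", "SKILLS": ["JAVA", "ORACLE"]}',
    '{"EMP_ID": "102", "EMP_NAME": "Richard Castle", "LOCATION": "London"}',
]


def read_employee(raw):
    """Apply the shape we expect at read time, with defaults for missing fields."""
    doc = json.loads(raw)
    return {
        "id": doc["EMP_ID"],
        "name": doc.get("EMP_NAME", ""),
        "skills": doc.get("SKILLS", []),  # a missing SKILLS field becomes an empty list
    }


employees = [read_employee(r) for r in raw_records]
for e in employees:
    print(e["id"], e["name"], e["skills"])
```

Both records are accepted when written, even though their fields differ. The reader decides which fields matter and how to treat the ones that are missing.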

Let’s understand this with a simple example of modeling a database for employees and their skills:

If we are using a relational database, we normalize the data to reduce redundancy and create three tables, Employee, Employee_Skills and Skills, as shown in the image below:
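
As a rough sketch of what such a normalized design could look like, the snippet below builds the three tables in an in-memory SQLite database using Python's standard `sqlite3` module. The column names (`emp_id`, `emp_name`, `skill_id`, `skill_name`) and the sample rows are assumptions made for illustration; the actual design may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Three tables: employees, skills, and a link table joining the two.
conn.executescript("""
    CREATE TABLE Employee (
        emp_id   TEXT PRIMARY KEY,
        emp_name TEXT NOT NULL
    );
    CREATE TABLE Skills (
        skill_id   INTEGER PRIMARY KEY,
        skill_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Employee_Skills (
        emp_id   TEXT    NOT NULL REFERENCES Employee(emp_id),
        skill_id INTEGER NOT NULL REFERENCES Skills(skill_id),
        PRIMARY KEY (emp_id, skill_id)
    );
""")

# Sample data: one employee with two skills.
conn.execute("INSERT INTO Employee VALUES ('101', 'John Brown')")
conn.executemany("INSERT INTO Skills VALUES (?, ?)", [(1, "JAVA"), (2, "ORACLE")])
conn.executemany("INSERT INTO Employee_Skills VALUES (?, ?)", [("101", 1), ("101", 2)])

# Reading an employee's skills back requires joining all three tables.
rows = conn.execute("""
    SELECT e.emp_name, s.skill_name
    FROM Employee e
    JOIN Employee_Skills es ON es.emp_id = e.emp_id
    JOIN Skills s ON s.skill_id = es.skill_id
    WHERE e.emp_id = ?
    ORDER BY s.skill_name
""", ("101",)).fetchall()
print(rows)
```

Note that even the simple question "which skills does this employee have?" needs a join across all three tables.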


Modeling the same data in a document-oriented NoSQL database lets us create a schema in which each employee's skills are embedded as an array directly within the Employee document.

[
  {
    "EMP_ID": "101",
    "EMP_NAME": "John Brown",
    "SKILLS": [ "JAVA","ORACLE" ]
  },
  {
    "EMP_ID": "102",
    "EMP_NAME": "Richard Castle",
    "SKILLS": [ "TALEND" ]
  },
  {
    "EMP_ID": "103",
    "EMP_NAME": "John Brown",
    "SKILLS": [ "JAVA","TALEND","NoSQL" ]
  }
]

In this simple example, the relational model consists of only three tables. In reality, most applications need tens, hundreds or even thousands of tables. The normalized approach does not reflect the way architects think about data, nor the way developers write applications. The NoSQL data model lets data be represented in a much more natural and intuitive way.
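
To illustrate how an application might work with the embedded model, here is a small sketch in plain Python. The list of dictionaries mirrors the employee documents shown earlier and stands in for a document store; `employees_with_skill` is a hypothetical helper, not the API of any real NoSQL database.

```python
# The employee documents from the example, as Python dictionaries.
employees = [
    {"EMP_ID": "101", "EMP_NAME": "John Brown", "SKILLS": ["JAVA", "ORACLE"]},
    {"EMP_ID": "102", "EMP_NAME": "Richard Castle", "SKILLS": ["TALEND"]},
    {"EMP_ID": "103", "EMP_NAME": "John Brown", "SKILLS": ["JAVA", "TALEND", "NoSQL"]},
]


def employees_with_skill(docs, skill):
    """Return the IDs of all employees whose embedded SKILLS array contains `skill`."""
    return [d["EMP_ID"] for d in docs if skill in d.get("SKILLS", [])]


java_ids = employees_with_skill(employees, "JAVA")
talend_ids = employees_with_skill(employees, "TALEND")
print(java_ids)
print(talend_ids)
```

Because the skills live inside each employee document, the lookup reads a single collection and needs no join.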

In simple words, NoSQL is a type of database that does not store data in table format and is not bound by a rigid, predefined schema for storing and retrieving data.

In the next article I will cover the various types of NoSQL databases with examples.

Please let me know in the comments if you have any questions.
