Wednesday, 7 December 2016

1. Introduction to Hadoop | Hadoop Tutorial

We live in the information age. With the advent of the digital revolution, a large number of corporations have emerged to serve our needs, make our lives easier and take us to the next level. To do this, these corporations need a base to build on. This “base” is the data they have generated over time.

Data is generated continuously; every second, piles of it are created. Consider an example: Facebook has 1.79 billion monthly active users. These users upload images and videos and post text, comments, likes, etc. every second, producing all types of data: structured, semi-structured and unstructured. If each user clicked just one like a day, that would be 1.79 billion likes to handle per day. The real figures are surely far larger than that. Likewise, have you ever wondered about the thousands of apps on the Google Play Store, each with millions of downloads and each user generating data? What happens to that data?

This data is processed and analyzed by the respective companies to gain a competitive advantage and to devise ways of serving customers better, learning from present scenarios. For example, people post reviews of specific products on Amazon; Amazon then processes these reviews to understand customers’ problems and provide better service in the future.

For the prevailing database management systems, handling this amount of data, generated at this pace, is a challenge: it requires large amounts of memory (RAM), scaling beyond a certain capacity often involves downtime, and there is an upper limit to how far a system can scale. It is also not cost-friendly. Further, an RDBMS is not designed to handle unstructured data.

So the question arises: how do we process these data sets? The answer is Hadoop.

Hadoop is an open source framework that allows distributed processing of large data sets across clusters of commodity hardware.

Open Source: In contrast to traditional RDBMS products, which required purchasing a license, Hadoop is available at no cost and is maintained by the Apache Software Foundation.

Distributed processing: The Hadoop framework splits the data into chunks and processes them in parallel, which makes it time-efficient when handling big data sets. For example, suppose you have to write 10,000 pages. Would you rather hire the world’s fastest writer or 100 writers working at the same time? The latter is definitely much faster and cheaper.
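The split-and-process-in-parallel idea can be sketched in a few lines of Python. This is only an illustration on a single machine, not Hadoop code: threads stand in for cluster nodes, and the function names (`split_into_chunks`, `count_words`, `parallel_word_count`) and the word-count task are assumptions chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_words(chunk):
    """Count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, chunk_size=2, workers=4):
    """Count words chunk by chunk in parallel, then merge the partial counts."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each worker thread plays the role of one machine in the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the per-chunk results into one final answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    pages = ["big data big", "data is big", "hadoop handles big data"]
    print(parallel_word_count(pages, chunk_size=1, workers=3))
```

Each chunk is counted without knowing about the others, which is what lets the work be handed to many "writers" at once; only the final merge needs to see all the partial results.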

Large Data Sets: Hadoop is used to process big data sets with ease. All processing happens on data stored in HDFS (Hadoop Distributed File System). Because the data is divided across different machines and processed in parallel, massive data sets can be processed.
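To make "divided across different machines" concrete, here is a rough Python sketch of cutting a file into fixed-size blocks and spreading them over nodes. It is a simplification under stated assumptions: the 128 MB block size is assumed (it is the default in recent Hadoop releases and is configurable), the round-robin placement and node names are hypothetical, and real HDFS also keeps replicated copies of each block, which this sketch ignores.

```python
BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def assign_blocks(num_blocks, nodes):
    """Spread block indices over nodes round-robin (hypothetical policy)."""
    placement = {node: [] for node in nodes}
    for block in range(num_blocks):
        placement[nodes[block % len(nodes)]].append(block)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    print(blocks)
    print(assign_blocks(len(blocks), ["node-a", "node-b"]))
```

Because every node holds only some of the blocks, each one can work on its own share at the same time as the others.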

Commodity Hardware: Hadoop uses cheap, simple hardware rather than expensive high-end enterprise computers. For example, China Mobile, a telecom company based in China, was generating 5-8 TB of records daily. Hadoop enabled them to use 10 times as much data as their older system at ⅕ of the cost.

Next Article : 2. Advantages of Hadoop

