Interview Questions
Hadoop Architecture
The Hadoop framework comprises four major modules:
- Hadoop Common
- Hadoop YARN
- HDFS (Hadoop Distributed File System)
- Hadoop MapReduce
Every module contains the essential elements and utilities that complete Hadoop's functionality. Let us look briefly at each module.
Hadoop Common
This is the base module containing the essential Java libraries and native resources that the rest of Hadoop needs to run. It ranges from the scripts that start the Hadoop system to the libraries that provide file-system and OS-level abstractions. Hadoop Common is also called Hadoop Core; it is named ‘Common’ because it holds the utilities and libraries common to the other modules.
Hadoop YARN
This framework module is dedicated to cluster resource management and job scheduling. With YARN, resource management and job scheduling are split into separate functions, which makes processing more efficient. Hadoop YARN provides a global ResourceManager (RM) for resource management and a per-application ApplicationMaster (AM) for job scheduling and monitoring.
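The division of labor above can be pictured with a toy sketch (plain Python, not the YARN API; the class and method names here are illustrative assumptions): a ResourceManager grants container requests from applications against the cluster's total capacity.

```python
# Toy illustration (NOT the real YARN API): a ResourceManager granting
# container requests against a fixed pool of cluster memory.

class ResourceManager:
    def __init__(self, total_memory_mb):
        self.free_memory_mb = total_memory_mb  # cluster capacity still available

    def allocate(self, app_id, requested_mb):
        """Grant a container if capacity allows; otherwise the request waits."""
        if requested_mb <= self.free_memory_mb:
            self.free_memory_mb -= requested_mb
            return {"app": app_id, "memory_mb": requested_mb}
        return None  # request stays queued until resources free up

rm = ResourceManager(total_memory_mb=4096)
print(rm.allocate("app-1", 2048))  # granted: {'app': 'app-1', 'memory_mb': 2048}
print(rm.allocate("app-2", 4096))  # None: only 2048 MB left
```

In real YARN the per-application ApplicationMaster negotiates such containers with the ResourceManager; this sketch only shows the capacity-accounting idea.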
HDFS (Hadoop Distributed File System)
Hadoop is known for running on low-cost commodity hardware. To provide high-throughput, fault-tolerant access to data on such hardware, Hadoop uses its own file system, HDFS (Hadoop Distributed File System). HDFS gives applications high-throughput access to their data and is capable of handling applications with very large datasets.
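The fault tolerance comes from how HDFS stores data: files are split into fixed-size blocks (128 MB by default) and each block is replicated on several DataNodes (three by default). The sketch below is plain Python illustrating that idea, not HDFS code; the function names and the round-robin placement are simplifying assumptions (real HDFS placement is rack-aware).

```python
# Conceptual sketch (not HDFS code): splitting a file into blocks and
# replicating each block across DataNodes.

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB
REPLICATION = 3                 # HDFS default replication factor

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of the given size occupies."""
    blocks = []
    remaining = file_size_bytes
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin here;
    real HDFS uses rack-aware placement)."""
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)}

blocks = split_into_blocks(300 * 1024 * 1024)          # a 300 MB file
print(len(blocks))                                     # 3 blocks: 128 + 128 + 44 MB
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(placement[0])                                    # ['dn1', 'dn2', 'dn3']
```

Because every block lives on three nodes, the loss of any single commodity machine costs no data, which is what lets HDFS tolerate cheap, failure-prone hardware.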
Hadoop MapReduce
This framework is built on the Hadoop YARN module. It uses YARN's job scheduling and resource management capabilities to process large datasets in parallel.
Advantages of using Hadoop Framework
- The framework is open source and compatible with almost any platform.
- It is easy to write and test distributed systems using the Hadoop framework.
- The Hadoop framework allows CPU cores to be utilized efficiently.
- The Hadoop framework has libraries to detect and handle faults and failures at the application layer.
- Servers can be added to or removed from a cluster dynamically.
- Hadoop keeps running without interruption while servers are added or removed.