Hadoop Common Lessons and Sessions
Hadoop Common is a set of common libraries and utilities that support the other Hadoop modules. It is also called Hadoop Core and is an integral part of the Apache Hadoop framework.
Talk to our Hadoop experts about course fees and flexible weekday & weekend classes
Date/Time: Monday, December 19th, 2016, 6:00pm
Cost:
December 21, 2016 - January 31, 2017 - $150
February 2, 2017 - March 28, 2017 - $160
April 1, 2017 - May 29, 2017 - $150
June 2, 2017 - July 3, 2017 - $160
Hadoop Common Course Topics
CLI Mini Cluster
Native Libraries
Proxy User
Rack Awareness
Secure Mode
Service Level Authorization
HTTP Authentication
Hadoop KMS
Tracing
Lesson 1: CLI Mini Cluster
The CLI MiniCluster can start and stop the following Hadoop components in a single process:
YARN
MapReduce
HDFS
Session 1: Function of CLI Mini Cluster
The CLI MiniCluster starts and stops a single-node Hadoop cluster with a single command. It is useful for testing non-Java programs that rely on Hadoop without deploying a full cluster. The working procedure and architecture are also covered.
Session 2: Hadoop Tarball
A tarball release of Hadoop is provided for convenience. The complete structure and use of the Hadoop tarball are discussed so that beginners and professionals from other disciplines can understand the concept.
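For illustration, a minimal sketch of building a Hadoop tarball from a source checkout (the exact Maven flags can vary by Hadoop version):
  $ mvn clean install -DskipTests
  $ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip
The resulting tarball can then be extracted and used to run the CLI MiniCluster.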
Session 3: Running the Mini Cluster
The CLI MiniCluster can be started with a single Hadoop command from the root directory of the extracted tarball. Real-time running and execution are demonstrated so that professionals can understand the concepts better.
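A minimal sketch of the command (RM_PORT and JHS_PORT are placeholders for the ResourceManager and JobHistoryServer ports you choose):
  $ bin/mapred minicluster -rmport RM_PORT -jhsport JHS_PORT
This is intended to bring up HDFS, YARN, and the MapReduce JobHistoryServer in a single process for testing.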
Lesson 2: Native Libraries
The native library and its utilities are important for running Hadoop efficiently. This part of the course covers the library functions and their utilities. Detailed concepts include
Overview of Native Libraries
Usage
Components
Supported Platforms
Download
Build
Runtime
Check
Native Shared Libraries
Session 1: Overview of Native Libraries
A complete overview of the description, execution, and working of the native library and its utilities is provided.
Session 2: Usage
Usage constraints of the native library functions and scripts are explained. The step-by-step method is outlined below:
Review the components
Review the supported platforms
Download a Hadoop release, which includes a pre-built version of the native Hadoop library (the library name remains the same), or build your own
Install the compression codecs
Check the runtime log files
Session 3: Components
The native Hadoop library includes components such as
Compression Codecs
Native IO Utilities
CRC32 Checksum implementation
Session 4: Supported Platforms
The native Hadoop library is supported only on *nix platforms. It does not work with Cygwin or Mac OS X. Information about working platforms and related technical details is provided in this part of the lesson.
Session 5: Download
Information about the Hadoop release and the hardware requirements is provided. For example, a pre-built 32-bit i386-Linux native Hadoop library is available as part of the Hadoop distribution. Refer to the Hadoop Common releases.
Session 6: Build
The native Hadoop library is written in ANSI C and built using the GNU autotools chain. Developer information and release versions are also covered.
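As a sketch, the native library is typically built from source with the native Maven profile (flags may differ between Hadoop versions):
  $ mvn package -Pdist,native -DskipTests -Dtar
Once built, the library is available under lib/native in the resulting distribution.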
Session 7: Runtime
The runtime process, log checks, and the Hadoop bin scripts are explained in this part of the lesson.
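For example, the runtime logs show whether the native library was picked up. Roughly, a successful load and a fallback look like this (exact wording can vary by version):
  INFO util.NativeCodeLoader: Loaded the native-hadoop library
  WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable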
Session 8: Check
Information about the NativeLibraryChecker is explained in detail in this part of the lesson, including how to launch it.
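A minimal sketch of running the checker (the reported libraries and paths depend entirely on your installation):
  $ hadoop checknative -a
  Native library checking:
  hadoop: true /usr/local/hadoop/lib/native/libhadoop.so.1.0.0
  zlib:   true /lib/x86_64-linux-gnu/libz.so.1
  snappy: false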
Session 9: Native Shared Libraries
Native shared libraries can be distributed and loaded using the DistributedCache, which distributes the library files and creates symlinks to them. A detailed study of the library files and the working principles involved is provided.
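For example, a library is first copied to HDFS (path and file name below are placeholders); the job then adds it to the DistributedCache with a symlink so that tasks can load it at runtime:
  $ bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1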
Lesson 3: Proxy User
Introduction to Proxy User
Use Case
Session 1: Introduction
The proxy user feature lets a superuser submit a job or access HDFS on behalf of another user.
Session 2: Use Case
Use cases differ from one scenario to another. A superuser with a given username can submit a job on behalf of a different user. Various scenarios and applications involving proxy users working on different jobs are explained.
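A minimal configuration sketch in core-site.xml, assuming a superuser named super (host and group values are placeholders):
  <property>
    <name>hadoop.proxyuser.super.hosts</name>
    <value>host1,host2</value>
  </property>
  <property>
    <name>hadoop.proxyuser.super.groups</name>
    <value>group1,group2</value>
  </property>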
Lesson 4: Rack Awareness
Hadoop components are rack-aware. The Hadoop Distributed File System uses rack awareness to place block replicas for fault tolerance. In this section you will learn about mapping, topology, and other concepts with real-time examples.
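As a sketch, rack mapping is commonly configured in core-site.xml by pointing Hadoop at a topology script (the script path is a placeholder; the script maps IP addresses or hostnames to rack names):
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>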
Lesson 5: Secure Mode
Introduction
Authentication
Data Confidentiality
Configuration
Session 1: Introduction
This part of the lesson explains how authentication is configured when Hadoop runs in secure mode.
Session 2: Authentication
Authentication of different accounts and how the Hadoop daemons handle them are explained in this section (a configuration sketch follows the list below). Topics include
End User Accounts
User Accounts for Hadoop Daemons
Kerberos principals for Hadoop daemons
Mapping from Kerberos principals to OS user accounts
Proxy User Settings
Secure DataNode
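A minimal sketch of turning on Kerberos authentication in core-site.xml (a full secure-mode setup also needs keytabs and per-daemon principals):
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>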
Session 3: Data Confidentiality
Encryption of data in transit is explained. Encryption of block data transfer and of HTTP traffic forms the basis of this part of the lesson.
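A hedged configuration sketch (whether you enable each property depends on your deployment):
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
The first property belongs in hdfs-site.xml and encrypts block data transfer; the second belongs in core-site.xml and encrypts RPC traffic.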
Session 4: Configuration
Configuring and granting permissions for users and the file system is explained (a sample daemon configuration follows the list below). Key notes to remember in this part include
Permissions for HDFS & Local File System
Common Configurations
NameNode
Secondary NameNode
DataNode
WebHDFS
ResourceManager
NodeManager
Configuration for WebAppProxy
Linux Container Executor
MapReduce JobHistory Server
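For illustration, a minimal NameNode sketch in hdfs-site.xml (the keytab path and realm are placeholders):
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/security/keytab/nn.service.keytab</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>nn/_HOST@EXAMPLE.COM</value>
  </property>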
Lesson 6: Service Level Authorization
Session 1: Purpose
It is essential to have a working knowledge of how to configure and manage Service Level Authorization.
Prerequisites
Make sure Hadoop is installed, configured, and running.
For reference – Single Node Setup for first-time users
Cluster Setup for large, distributed clusters
Session 2: Overview
Service Level Authorization is the initial authorization mechanism. It ensures that clients connecting to a particular Hadoop service have the required, pre-configured permissions.
For example, a MapReduce cluster can use this mechanism to allow only a configured list of users/groups to submit jobs.
Session 3: Configuration
The complete process of configuring Service Level Authorization through the hadoop-policy.xml configuration file is explained.
Under this topic, you will be educated about
Enabling of Service Level Authorization
Hadoop Services
Hadoop Configuration Properties
Access Control Lists
Refreshing Service Level Authorization Configuration
Examples
For better understanding, sample clusters and the real-time process are explained.
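A hedged sketch of an ACL entry in hadoop-policy.xml and the refresh commands (user and group names are placeholders; an ACL value lists users, then a space, then groups):
  <property>
    <name>security.client.protocol.acl</name>
    <value>alice,bob datascience</value>
  </property>
  $ hdfs dfsadmin -refreshServiceAcl
  $ yarn rmadmin -refreshServiceAcl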
Lesson 7: HTTP Authentication
Introduction
Configuration
CORS
Session 1: Introduction
The process of configuring authentication for the Hadoop HTTP web-consoles is explained in this lesson.
Custom authentication mechanisms are also explained.
Session 2: Configuration
All the configuration properties involved, along with the class-name extension point for custom authentication, are explained. Their usage and the corresponding HTTP web-consoles are also discussed.
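A minimal sketch in core-site.xml (valid types are simple, kerberos, or a custom handler class name; the values shown are illustrative):
  <property>
    <name>hadoop.http.authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.http.authentication.simple.anonymous.allowed</name>
    <value>false</value>
  </property>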
Session 3: CORS
Certain parameters are set to enable Cross-Origin Resource Sharing (CORS) for the web-consoles. A detailed explanation and the working principles are provided.
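A hedged sketch in core-site.xml (the allowed origins should be tightened for a real deployment):
  <property>
    <name>hadoop.http.cross-origin.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-origins</name>
    <value>*</value>
  </property>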
Lesson 8: Hadoop KMS
Hadoop KMS
Hadoop KMS Client Configuration
Session 1: Hadoop KMS
Hadoop KMS is Hadoop's Key Management Server, based on Hadoop's KeyProvider API.
Hadoop KMS provides client and server components that communicate over HTTP using a REST API.
The KMS is a Java web application that runs on a pre-configured Tomcat bundled with the Hadoop distribution.
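As a sketch of operating the Tomcat-based KMS (the script location under the KMS distribution may vary by release):
  $ sbin/kms.sh start
  $ sbin/kms.sh stop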
Session 2: Hadoop KMS Client Configuration
The Hadoop KMS client KeyProvider uses the kms:// scheme in its provider URI.
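For example, a hedged client sketch in hdfs-site.xml, assuming the KMS runs over plain HTTP on kms-host:16000 (host and port are placeholders; newer releases may use a different property key):
  <property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@kms-host:16000/kms</value>
  </property>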
This session also involves
KMS Configuration
KMS Cache
KMS Aggregated Audit Logs
Start/Stop the KMS
Embedded Tomcat Configuration
Loading Native Libraries
KMS Security Configuration
KMS over HTTPS
KMS Access Control
Key Access Control
HTTP Authentication Signature
KMS HTTP REST API
Lesson 9: Tracing
The tracing system works by collecting information about requests in structs called spans (a usage sketch follows the list below). This lesson also covers
Dapper-like Tracing in Hadoop
HTrace Samplers
SpanReceivers
Zipkin SpanReceiver Setup
Dynamic Update of Tracing Configuration
Starting Tracing Spans with the HTrace API
Sample Code for Tracing
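As a hedged sketch, the span receivers loaded on a running NameNode can be listed with the hadoop trace command (the host:port below is a placeholder for the NameNode's RPC address):
  $ hadoop trace -list -host namenode.example.com:9000
Receivers can then be added or removed at runtime with the -add and -remove options of the same command.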