Interview Questions
Introduction to Flume
- Apache Flume is a popular Big Data tool used by leading organizations around the globe. Its simple and standardized functionality makes it a robust, flexible, and extensible tool for ingesting data from web servers into the Hadoop environment. Apache Flume is a distributed, reliable, and powerful tool for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and it is robust and fault-tolerant, with tunable reliability mechanisms for failover and recovery.
YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an Enterprise Hadoop cluster.
Enterprises use Flume's powerful streaming capabilities to land data from high-throughput streams in the Hadoop Distributed File System (HDFS). Typical sources of these streams are application logs, sensor and machine data, geo-location data, and social media. These different types of data can be landed in Hadoop for future analysis using interactive queries in Apache Hive, or they can feed business dashboards backed by ongoing data in Apache HBase.
In one specific example, Flume is used to log manufacturing operations. When one run of product comes off the line, it generates a log file about that run. Even if this occurs hundreds or thousands of times per day, the high-volume log data can stream through Flume for same-day analysis with Apache Storm, or months or years of production runs can be stored in HDFS and analyzed by a quality assurance engineer using Apache Hive.
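As a sketch of how such a pipeline is wired together, a Flume agent is defined in a properties file as a source, a channel, and a sink. The configuration below is illustrative, not a definitive setup: the agent and component names (`agent`, `r1`, `c1`, `k1`) and the log and HDFS paths are assumptions.

```properties
# Name the components of this agent
agent.sources = r1
agent.channels = c1
agent.sinks = k1

# Source: tail an application log (path is an assumption)
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /var/log/app/app.log
agent.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000

# Sink: land the events in HDFS, bucketed by day
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.useLocalTimeStamp = true
agent.sinks.k1.channel = c1
```

An agent using this file would be started with `flume-ng agent --conf conf --conf-file example.conf --name agent`.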
How to install Apache Flume into the Hadoop system?
- Before installation, download the latest version of Apache Flume from the Apache Software Foundation at https://flume.apache.org/. Once you have opened the website, click the download link on the left-hand side of the homepage; it takes you to the download page with free download links.
- Create a directory named Flume in the same directory where Hadoop, HBase, and other software were installed (if you have already installed any), as shown below.
$ mkdir Flume
- Extract the downloaded tar files as shown below.
$ cd Downloads/
$ tar zxvf apache-flume-1.6.0-bin.tar.gz
$ tar zxvf apache-flume-1.6.0-src.tar.gz
- Move the contents of the extracted apache-flume-1.6.0-bin directory to the Flume directory created earlier, as shown below. (Assume we have created the Flume directory under the home directory of the local user named Hadoop.)
$ mv apache-flume-1.6.0-bin/* /home/Hadoop/Flume/
Configuring Flume
To configure Flume, we have to modify three files, namely flume-env.sh, flume-conf.properties, and .bashrc.
Setting the Path / Classpath
In the .bashrc file, set the home folder, the path, and the classpath for Flume as shown below.
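A minimal sketch of the entries to append to ~/.bashrc, assuming Flume was installed under /home/Hadoop/Flume as in the earlier steps (adjust the path to your own layout):

```shell
# Flume home, path, and classpath (install path is an assumption)
export FLUME_HOME=/home/Hadoop/Flume
export PATH=$PATH:$FLUME_HOME/bin
export CLASSPATH=$CLASSPATH:$FLUME_HOME/lib/*
```

Run `source ~/.bashrc` (or open a new shell) for the settings to take effect.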
conf Folder
If you open the conf folder of Apache Flume, you will find the following four files −
- flume-conf.properties.template,
- flume-env.sh.template,
- flume-env.ps1.template, and
- log4j.properties.
Now rename
- flume-conf.properties.template file as flume-conf.properties and
- flume-env.sh.template as flume-env.sh
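The renames above can be done with `mv`; the conf path below carries over the assumed install location from the earlier steps:

```shell
# conf directory from the install steps above (path is an assumption)
CONF=/home/Hadoop/Flume/conf
mv "$CONF/flume-conf.properties.template" "$CONF/flume-conf.properties"
mv "$CONF/flume-env.sh.template" "$CONF/flume-env.sh"
```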
flume-env.sh
Open the flume-env.sh file and set JAVA_HOME to the folder where Java is installed on your system.
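For example, the line to set in flume-env.sh might look like the following; the JDK path shown is an assumption, so point it at your own Java installation:

```shell
# In conf/flume-env.sh — the JDK location below is an assumption
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```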
Verifying the Installation
Verify the installation of Apache Flume by browsing through the bin folder and typing the following command.
$ ./flume-ng
If you have successfully installed Flume, you will get a help prompt of Flume as shown below.