Interview Questions
Introduction to Sqoop
Apache Sqoop is an open source tool from the Apache Software Foundation designed to transfer data between Hadoop and relational database servers. It is mainly used to import data from relational databases such as Oracle and MySQL into Hadoop HDFS (Hadoop Distributed File System), and to export data from HDFS back to relational databases. Traditional application management systems, that is, applications interacting with relational databases through an RDBMS, are one of the sources that generate Big Data. Such Big Data, generated by an RDBMS, is stored in relational database servers in the relational database structure.
Significance of the Apache Sqoop in Hadoop Ecosystem
Since Big Data is stored and analyzed by Hadoop ecosystem tools such as MapReduce, Hive, HBase, Cassandra, and Pig, these tools need a way to interact with the relational database servers that hold much of the source data, both for importing it and for exporting results back. Sqoop occupies that place in the Hadoop ecosystem, providing feasible interaction between relational database servers and Hadoop's HDFS.
Sqoop: “SQL to Hadoop and Hadoop to SQL”
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.
How Does Sqoop Work?
Sqoop Import
The import tool imports individual tables from RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files.
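For illustration, a basic import command has the following shape; the JDBC connection string, database name (userdb), table name (emp), and target directory are placeholder values for this sketch and should be replaced with your own:
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--target-dir /user/hadoop/emp \
-m 1
Here -m 1 tells Sqoop to run the import with a single map task, which is convenient for small tables or tables without a primary key.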
Sqoop Export
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table. They are read and parsed into a set of records according to the user-specified delimiter.
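As a minimal sketch, an export command looks like the following; the database (db), table name (employee), and export directory are assumed placeholder values, and the target table must already exist in the database before the export runs:
$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /user/hadoop/employee_data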
Downloading Sqoop
We can download the latest version of Sqoop from the Apache Software Foundation website, where it is available as an open source release.
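For example, the release used in the next section can typically be fetched from the Apache archive; the exact URL depends on the version and mirror you choose, so treat this as an illustrative command rather than a fixed address:
$ wget https://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz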
Installing Sqoop
The following commands are used to extract the Sqoop tarball and move it to the /usr/lib/sqoop directory.
$ tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
$ su
password:
# mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
# exit
Configuring bashrc
You have to set up the Sqoop environment by appending the following lines to the ~/.bashrc file:
#Sqoop
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
The following command is used to execute the ~/.bashrc file:
$ source ~/.bashrc
Configuring Sqoop
To configure Sqoop with Hadoop, you need to edit the sqoop-env.sh file, which is placed in the $SQOOP_HOME/conf directory. First, change to the Sqoop config directory and create sqoop-env.sh from the template using the following commands:
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the following lines:
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
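After saving sqoop-env.sh, you can verify the setup; assuming $SQOOP_HOME/bin is on your PATH as configured above, the following command prints the installed Sqoop version:
$ sqoop version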