Interview Questions
Introduction to Sqoop
Apache Sqoop is an open source tool from the Apache Software Foundation designed to transfer data between Hadoop and relational database servers. It is mainly used to import data from relational databases such as Oracle and MySQL into Hadoop HDFS (Hadoop Distributed File System), and to export data from HDFS back to relational databases. Traditional application management systems, that is, applications interacting with relational databases through an RDBMS, are one of the sources that generate Big Data. Such Big Data, generated by an RDBMS, is stored in relational database servers in the relational database structure.
Significance of the Apache Sqoop in Hadoop Ecosystem
Since Big Data is stored and analyzed by Hadoop ecosystem tools such as MapReduce, Hive, HBase, Cassandra, and Pig, these tools need a way to interact with the relational database servers that hold much of the source data, both for importing it and for exporting results back. Sqoop occupies that place in the Hadoop ecosystem, providing feasible interaction between relational database servers and Hadoop's HDFS.
Sqoop: “SQL to Hadoop and Hadoop to SQL”
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.
How Does Sqoop Work?
Sqoop Import
The import tool imports individual tables from RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files.
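For illustration, a basic import command has the following shape; the JDBC connection string, database name (userdb), table name (emp), and target directory are placeholder values for this sketch and should be replaced with your own:
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--target-dir /user/hadoop/emp \
-m 1
Here -m 1 tells Sqoop to run the import with a single map task, which is convenient for small tables or tables without a primary key.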
Sqoop Export
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table. They are read and parsed into a set of records according to the user-specified delimiter.
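As a minimal sketch, an export command looks like the following; the database (db), table name (employee), and export directory are assumed placeholder values, and the target table must already exist in the database before the export runs:
$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /user/hadoop/employee_data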
Downloading Sqoop
We can download the latest version of Sqoop from the Apache Software Foundation website, where it is available as an open source release.
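For example, the release used in the next section can typically be fetched from the Apache archive; the exact URL depends on the version and mirror you choose, so treat this as an illustrative command rather than a fixed address:
$ wget https://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz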
Installing Sqoop
The following commands are used to extract the Sqoop tarball and move it to the /usr/lib/sqoop directory.
$ tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
$ su
password:
# mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
# exit
Configuring bashrc
You have to set up the Sqoop environment by appending the following lines to the ~/.bashrc file:
#Sqoop
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
The following command is used to execute the ~/.bashrc file:
$ source ~/.bashrc
Configuring Sqoop
To configure Sqoop with Hadoop, you need to edit the sqoop-env.sh file, which is placed in the $SQOOP_HOME/conf directory. First, change to the Sqoop config directory and create sqoop-env.sh from the template using the following commands:
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the following lines:
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
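After saving sqoop-env.sh, you can verify the setup; assuming $SQOOP_HOME/bin is on your PATH as configured above, the following command prints the installed Sqoop version:
$ sqoop version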