
Introduction to Sqoop

    • Apache Sqoop is an open source tool from the Apache Software Foundation designed to transfer data between Hadoop and relational database servers. It is mainly used to import data from relational databases such as Oracle and MySQL into Hadoop HDFS (Hadoop Distributed File System), and it is also used to export data from HDFS back to relational databases. The traditional application management system, that is, the interaction of applications with a relational database through an RDBMS, is one of the sources that generate Big Data. Such Big Data, generated by the RDBMS, is stored in relational database servers in the relational database structure.

      Significance of Apache Sqoop in the Hadoop Ecosystem

      Since Big Data involves storing and analyzing huge volumes of data with tools such as MapReduce, Hive, HBase, Cassandra, and Pig, there is a need for a tool that connects these tools to the data. When the tools of the Hadoop ecosystem came into the picture, they required a way to interact with the relational database servers for importing and exporting the Big Data residing in them. Here, Sqoop occupies a place in the Hadoop ecosystem by providing feasible interaction between relational database servers and Hadoop's HDFS.

      Sqoop: “SQL to Hadoop and Hadoop to SQL”

      Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.

      How Does Sqoop Work?

      Sqoop Import

      The import tool imports individual tables from RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files.
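
      For example, a minimal import run could look like the following sketch. The connection string, credentials, table name, and target directory are illustrative placeholders rather than values taken from this tutorial:

      $ sqoop import \
          --connect jdbc:mysql://localhost/testdb \
          --username dbuser -P \
          --table employees \
          --target-dir /user/hadoop/employees \
          -m 1

      This reads the employees table over JDBC and writes its rows as text files under /user/hadoop/employees in HDFS, using a single map task (-m 1).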

      Sqoop Export                       

      The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table. Those files are read and parsed into a set of records according to the user-specified delimiter.
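
      A corresponding export might look like the sketch below, again with placeholder connection details; the target table is assumed to already exist in the database:

      $ sqoop export \
          --connect jdbc:mysql://localhost/testdb \
          --username dbuser -P \
          --table employees_export \
          --export-dir /user/hadoop/employees \
          --input-fields-terminated-by ','

      Here --export-dir points to the HDFS files to read, and --input-fields-terminated-by tells Sqoop how the fields in those files are delimited.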

      Downloading Sqoop

      We can download the latest version of Sqoop from the Apache Software Foundation website.
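
      For instance, the release extracted in the next section can be fetched with wget. The mirror path below is an assumption based on the usual Apache archive layout, so check the Sqoop download page for the current link:

      $ wget http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz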

      Installing Sqoop

      The following commands are used to extract the Sqoop tarball and move it to the /usr/lib/sqoop directory.

      $ tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz

      $ su

      password:

       

      # mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop

      # exit

      Configuring bashrc

      You have to set up the Sqoop environment by appending the following lines to the ~/.bashrc file:

      #Sqoop

      export SQOOP_HOME=/usr/lib/sqoop
      export PATH=$PATH:$SQOOP_HOME/bin

      The following command is used to apply the changes made to the ~/.bashrc file:

      $ source ~/.bashrc

      Configuring Sqoop

      To configure Sqoop with Hadoop, you need to edit the sqoop-env.sh file, which is placed in the $SQOOP_HOME/conf directory. First of all, change to the Sqoop config directory and create sqoop-env.sh from the template using the following commands:

      $ cd $SQOOP_HOME/conf

      $ mv sqoop-env-template.sh sqoop-env.sh

      Open sqoop-env.sh and edit the following lines:

      export HADOOP_COMMON_HOME=/usr/local/hadoop

      export HADOOP_MAPRED_HOME=/usr/local/hadoop
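
      After saving sqoop-env.sh, you can do a quick sanity check. Assuming Sqoop's bin directory is on the PATH as configured above, the following command prints the installed Sqoop version:

      $ sqoop version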
