Hadoop PIG Installation

    • Apache Pig is a valuable addition to the Hadoop ecosystem: its high-level scripting language lets data specialists write complex data transformations without deep expertise in the Java programming language. The language, called Pig Latin, is simple and SQL-like, and appeals to developers already familiar with scripting languages and SQL. Apache Pig is an abstraction over MapReduce: a platform for analyzing large data sets by expressing the analysis as data flows. Pig is generally used with Hadoop, and the data manipulation operations in Hadoop can be performed using Pig.

      Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig. Through its User Defined Function (UDF) facility, Pig can invoke code written in languages such as JRuby, Jython, and Java. You can also embed Pig scripts in other languages. As a result, Pig can serve as a component of larger, more complex applications that tackle real business problems.

      Pig works with data from many sources, both structured and unstructured, and stores the results in the Hadoop Distributed File System (HDFS).

      Pig scripts are translated into a series of MapReduce jobs that are run on the Apache Hadoop cluster.
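
      To make that translation concrete, here is a minimal sketch in plain Python of how a small Pig-style data flow (filter, group, count) can be broken into a map step, a shuffle, and a reduce step. It is only an illustration of the idea, not how Pig's compiler actually plans jobs. The sample records, the function names, and the Pig Latin shown in the comments are assumptions made up for this example.

```python
from collections import defaultdict

# Roughly the data flow being imitated, written in Pig Latin
# (file name and field names are hypothetical):
#
#   logs    = LOAD 'logs.txt' AS (user:chararray, status:int);
#   errors  = FILTER logs BY status >= 500;
#   grouped = GROUP errors BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(errors);

# Hypothetical input records: (user, http_status)
RECORDS = [
    ("alice", 200),
    ("bob", 500),
    ("alice", 503),
    ("bob", 502),
    ("carol", 404),
]


def map_phase(records):
    """Apply the FILTER and emit a (key, 1) pair for every kept record."""
    for user, status in records:
        if status >= 500:
            yield user, 1


def shuffle(pairs):
    """Collect values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute the COUNT for each group by summing the emitted ones."""
    return {key: sum(values) for key, values in groups.items()}


def count_errors_by_user(records):
    """Run the three steps in order, like a single MapReduce job."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(count_errors_by_user(RECORDS))
```

      In this sketch the filter runs on the map side and the count on the reduce side. A longer script with several groupings or joins would typically need more than one such job, which is why a script becomes a series of jobs.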

      Features and Specifications of Apache Pig

      Apache Pig comes with the following features −

      • Rich set of operators − It provides many operators to perform operations like join, sort, filter, etc.
      • Ease of programming − Pig Latin is similar to SQL, so writing a Pig script is easy if you are good at SQL.
      • Optimization opportunities − Apache Pig optimizes the execution of its tasks automatically, so programmers need to focus only on the semantics of the language.
      • Extensibility − Using the existing operators, users can develop their own functions to read, process, and write data.
      • UDFs − Pig lets you create user-defined functions in other programming languages, such as Java, and invoke or embed them in Pig scripts.
      • Handles all kinds of data − Apache Pig analyzes all kinds of data, both structured and unstructured, and stores the results in HDFS.
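
      As an illustration of the UDF feature, the sketch below shows what a small user-defined function written in Python might look like. The function name `email_domain` and its schema string are hypothetical. The `pig_util` import is an assumption about the helper module available when Pig runs Python UDFs, so the code falls back to a stand-in decorator when that module is missing. This also lets the function be tested outside Pig.

```python
try:
    # Assumed to be importable when Pig runs this file as a Python UDF.
    from pig_util import outputSchema
except ImportError:
    # Stand-in decorator so the file also works outside Pig.
    def outputSchema(schema):
        def decorator(func):
            # Record the declared schema on the function for inspection.
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("domain:chararray")
def email_domain(address):
    """Return the lower-cased domain of an e-mail address, or None."""
    if address is None or "@" not in address:
        return None
    # Split on the last '@' and normalize the part after it.
    domain = address.rsplit("@", 1)[1].strip().lower()
    return domain or None


if __name__ == "__main__":
    print(email_domain("Ann@Example.COM"))
```

      Inside a Pig script such a file would be registered first and the function then called in a FOREACH ... GENERATE statement. The exact REGISTER syntax depends on the Pig version and on which Python engine is used, so check the Pig documentation for your installation.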