
Hadoop Hive Tutorials

  • What is Hive?
  • Hadoop and Hive

What is Hive?

    Apache Hive, also known as Hadoop Hive, is a data warehouse infrastructure tool for processing structured data in Hadoop. It runs on top of Apache Hadoop to summarize Big Data and make querying and analysis easy. Hive was initially developed at Facebook. The Apache Software Foundation (ASF) later took it over, developed it further, and released it as open source under the name Apache Hive. Many companies use it; for example, Amazon uses it in Amazon Elastic MapReduce.

      What are the common myths or misunderstandings about Apache Hive?

      • Apache Hive is a relational database or RDBMS.
      • Apache Hive is designed for Online Transaction Processing (OLTP).
      • Apache Hive is a language for real-time queries and row-level updates.

      Features of Hive

      • It stores the schema in a database (the metastore) and the processed data in HDFS.
      • It is designed for Online Analytical Processing (OLAP).
      • It provides an SQL-like query language called HiveQL or HQL.
      • It is familiar, fast, scalable, and extensible.
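      The first feature can be made concrete with a minimal Python sketch of schema-on-read. It assumes a toy setup: a dictionary stands in for the metastore that holds table schemas, and a delimited string stands in for a data file in HDFS. The table name page_views, its columns, and the read_table helper are hypothetical. The only Hive-specific detail borrowed is that Hive's default text format separates fields with the Ctrl-A character (\x01).

```python
# Hypothetical stand-in for the Hive metastore: table name -> ordered (column, type) pairs.
METASTORE = {
    "page_views": [("user_id", int), ("url", str), ("seconds", int)],
}

# Hive's default text format separates fields with Ctrl-A (\x01).
FIELD_DELIM = "\x01"

# Stand-in for a data file in HDFS: plain delimited text with no schema attached.
RAW_FILE = "\n".join(
    FIELD_DELIM.join(fields)
    for fields in [("1", "/home", "12"), ("2", "/cart", "45"), ("1", "/cart", "3")]
)


def read_table(table, raw_text):
    """Apply the stored schema to raw text only when it is read (schema-on-read)."""
    columns = METASTORE[table]
    rows = []
    for line in raw_text.splitlines():
        fields = line.split(FIELD_DELIM)
        # Pair each field with its column name and convert it to the declared type.
        rows.append({name: cast(value) for (name, cast), value in zip(columns, fields)})
    return rows


rows = read_table("page_views", RAW_FILE)
print(rows[0])  # {'user_id': 1, 'url': '/home', 'seconds': 12}
```

      In Hive itself, a SerDe (serializer/deserializer) performs this parsing. A SerDe also handles cases the sketch ignores, such as missing or malformed fields. The point here is only that the schema lives apart from the data and is applied when the data is read.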

      Although Apache Pig and its Pig Latin dataflow language can be powerful and simple to use, the downside is that Pig is something new to learn and master. Developers at Facebook built a runtime support layer for Hadoop® that lets anyone already fluent in SQL (which is commonplace among relational database developers) use the Hadoop platform right away.

      Their creation, called Hive™, lets SQL developers write Hive Query Language (HQL) statements that are similar to standard SQL statements. HQL understands fewer commands than standard SQL, but it is still quite useful. The Hive service breaks HQL statements down into MapReduce jobs and executes them across a Hadoop cluster.
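      A simplified Python simulation of the three MapReduce phases for a GROUP BY count shows how one HQL statement decomposes into such a job. This is a conceptual sketch, not Hive's actual query plan. The real compiler produces an operator tree that may span several jobs and may aggregate partially on the map side. The page_views rows and the phase function names are hypothetical.

```python
from collections import defaultdict

# Toy rows standing in for a Hive table (hypothetical data).
page_views = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/cart"},
    {"user_id": 1, "url": "/cart"},
]


def map_phase(rows):
    """Map: emit a (key, 1) pair for each row, keyed on the GROUP BY column."""
    for row in rows:
        yield row["url"], 1


def shuffle_phase(pairs):
    """Shuffle: collect all values for the same key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values; COUNT(*) becomes a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


# Conceptually what Hive runs for: SELECT url, COUNT(*) FROM page_views GROUP BY url;
result = reduce_phase(shuffle_phase(map_phase(page_views)))
print(sorted(result.items()))  # [('/cart', 2), ('/home', 1)]
```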

      Hive looks very much like traditional database code with SQL access. However, because Hive is built on Hadoop and MapReduce, there are important differences. First, Hadoop is designed for long sequential scans, so Hive queries can be expected to have very high latency (often many minutes). Hive is therefore not appropriate for applications that need the very fast response times you would expect from a database such as DB2. Second, Hive is read-oriented and therefore not appropriate for transaction processing, which typically involves a high percentage of write operations.

      If you're interested in SQL on Hadoop, IBM also offers Big SQL in addition to Hive, which it positions as making access to Hive datasets faster and more secure. As with any database management system (DBMS), you can run Hive queries in several ways: from a command line interface (known as the Hive shell), or from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application that uses the Hive JDBC/ODBC drivers.
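      For the JDBC route, a client connects to HiveServer2 using a URL of the form jdbc:hive2://<host>:<port>/<database>, where 10000 is HiveServer2's default port. The Python sketch below only assembles such a URL. Opening a connection still requires the Hive JDBC driver in a Java application, for example by passing the URL to java.sql.DriverManager.getConnection. The hive_jdbc_url helper and the host names are hypothetical placeholders.

```python
def hive_jdbc_url(host, database="default", port=10000):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<database>.

    10000 is HiveServer2's default port; the host names used below are
    hypothetical placeholders.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


print(hive_jdbc_url("hive-server.example.com"))
# jdbc:hive2://hive-server.example.com:10000/default
```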
