Why Apache Sqoop
In the data acquisition layer, we have chosen Apache Sqoop as the main technology. There are multiple options that can be used in this layer. Also, in place of one technology, there are other options that can be swapped. These options will be discussed in detail to some extent in the last section of this chapter.
Apache Sqoop is one of the main technologies being used to transfer data to and from structured data stores such as RDBMS, traditional data warehouses, and NoSQL data stores to Hadoop. Apache Hadoop finds it very hard to talk to these traditional stores and Sqoop helps to do that integration very easily.
Sqoop helps in the bulk transfer of data from these stores in a very good manner and, because of this reason, Sqoop was chosen as a technology in this layer.
Sqoop also helps to integrate easily with Hadoop based systems such as Apache Oozie, Apache HBase, and Apache Hive.
HBase is an open source, non-relational distributed database modeled after Google's BigTable and is written in Java.
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.