Data Lake for Enterprises

Data acquisition of batch data - technology mapping

To cover our use case and to build data lakes, we use two different technologies in this layer, namely Apache Sqoop and Apache Flume. This chapter covers Sqoop in depth, and Chapter 7, Messaging Layer with Apache Kafka, dives deep into Flume.

The following figure maps these technologies onto the conceptual architecture that we follow throughout this book. We will continue to explain each technology and its relevance to the overall architecture before bringing all of them together in the final part of this book (Part 3).

Figure 02: Technology mapping for the acquisition layer

In line with our use case, we will connect to the data stores of some business applications that are built on a traditional RDBMS. We will use PostgreSQL as the RDBMS holding customer data. We will connect to an intranet (B2B) application and an Internet (B2C) application, each of which holds a different set of customer profile information. Our data lake will consolidate the profile information from these disparate business applications, and from this consolidated data we will derive the Single Customer View (SCV).
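To make the mapping more concrete, here is a minimal Python sketch that assembles a Sqoop 1 `sqoop import` command line for pulling a customer table from each PostgreSQL-backed application into HDFS. The helper names (`SqoopSource`, `build_sqoop_import`), host names, database, table, column, user, password-file paths and the `/datalake/raw` landing directory are all hypothetical placeholders, not values from this book; only the Sqoop options themselves (`--connect`, `--username`, `--password-file`, `--table`, `--split-by`, `--num-mappers`, `--target-dir`) are standard Sqoop import arguments. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class SqoopSource:
    """Connection details for one PostgreSQL-backed business application (hypothetical values)."""
    name: str           # short label for the application, used in the HDFS target path
    host: str
    port: int
    database: str
    table: str
    username: str
    password_file: str  # path to a file holding the DB password (resolved on HDFS by default)
    split_by: str       # column Sqoop uses to partition the import across mappers


def build_sqoop_import(src: SqoopSource, landing_root: str, mappers: int = 4) -> List[str]:
    """Return the argument list for a Sqoop 1 `import` of one table into HDFS."""
    jdbc_url = f"jdbc:postgresql://{src.host}:{src.port}/{src.database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", src.username,
        "--password-file", src.password_file,
        "--table", src.table,
        "--split-by", src.split_by,
        "--num-mappers", str(mappers),
        # One directory per source application keeps the B2B and B2C raw data apart.
        "--target-dir", f"{landing_root}/{src.name}/{src.table}",
    ]


# Hypothetical sources standing in for the intranet (B2B) and Internet (B2C) applications.
SOURCES = [
    SqoopSource("b2b", "b2b-db.example.internal", 5432, "b2b_app", "customer",
                "etl_user", "/user/etl/.b2b.password", "customer_id"),
    SqoopSource("b2c", "b2c-db.example.internal", 5432, "b2c_app", "customer",
                "etl_user", "/user/etl/.b2c.password", "customer_id"),
]

if __name__ == "__main__":
    for source in SOURCES:
        # shlex.join (Python 3.8+) quotes each argument so the line can be pasted into a shell.
        print(shlex.join(build_sqoop_import(source, "/datalake/raw")))
```

Running one import per source application, each into its own target directory, keeps the raw copy of every system intact so that consolidation can happen later inside the data lake. To execute a command rather than print it, the same list could be passed to `subprocess.run(cmd, check=True)`; the PostgreSQL JDBC driver JAR must also be available on Sqoop's classpath.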

Business to Business (B2B) applications are applications used by various departments within an organization, as well as between organizations.

Business to Consumer (B2C) applications are applications used by organizations to interact with their consumers.
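Once profile data from the B2B and B2C applications lands in the data lake, it has to be consolidated before an SCV can be derived. The matching and survivorship rules are not specified at this point, so the following is only a minimal sketch under stated assumptions: customers are matched on a normalized email address, and for each attribute the first non-empty value seen wins (so the order of the input records encodes source priority). The `ProfileRecord` type, its fields and the `single_customer_view` function are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ProfileRecord:
    """One customer profile row from a source application (hypothetical fields)."""
    source: str                    # e.g. "b2b" or "b2c"
    email: str                     # ASSUMPTION: email is the key shared across applications
    name: Optional[str] = None
    phone: Optional[str] = None
    company: Optional[str] = None


def single_customer_view(records: Iterable[ProfileRecord]) -> Dict[str, dict]:
    """Merge profile records into one entry per customer, keyed by normalized email.

    Earlier records win for a given attribute; later records only fill
    attributes that are still missing.
    """
    view: Dict[str, dict] = {}
    for rec in records:
        key = rec.email.strip().lower()
        entry = view.setdefault(key, {"email": key, "sources": []})
        if rec.source not in entry["sources"]:
            entry["sources"].append(rec.source)  # remember which systems know this customer
        for attr in ("name", "phone", "company"):
            value = getattr(rec, attr)
            if value is not None and entry.get(attr) is None:
                entry[attr] = value
    return view


if __name__ == "__main__":
    rows = [
        ProfileRecord("b2b", "Jane.Doe@Example.com", name="Jane Doe", company="Acme"),
        ProfileRecord("b2c", "jane.doe@example.com ", name="J. Doe", phone="555-0100"),
    ]
    for customer in single_customer_view(rows).values():
        print(customer)
```

In practice, record matching across systems is usually fuzzier than an exact email comparison, so the matching key and the precedence order are the parts most likely to change for a real SCV.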