Data acquisition of batch data - technology mapping
To cover our use case and to build the data lake, we use two technologies in this layer: Apache Sqoop and Apache Flume. This chapter dives deep into Sqoop, and Chapter 7, Messaging Layer with Apache Kafka, dives deep into Flume.
The following figure maps these technologies onto the conceptual architecture that we follow throughout this book. We will explain each technology and its relevance to the overall architecture before bringing all the technologies together in the final part of this book (Part 3).
Figure 02: Technology mapping for acquisition layer
In line with our use case, we will connect to the data stores of some business applications, which are based on a traditional RDBMS. We will use PostgreSQL as the RDBMS holding customer data. We will connect to an intranet (B2B) application and an Internet (B2C) application, each of which holds a different set of customer profile information. Our data lake will consolidate the profile information from these disparate business applications, and from this consolidated data we will derive the SCV.
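To make the acquisition step more concrete, the following is a minimal sketch of how a Sqoop import of a single PostgreSQL table could be assembled. It is not the configuration used later in this book: the helper name build_sqoop_import, the JDBC URL, the database name, the user, the table name, and all paths are hypothetical placeholders. Only the Sqoop options themselves (--connect, --username, --password-file, --table, --target-dir, and --num-mappers) are standard sqoop import arguments.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, target_dir, num_mappers=1):
    """Return the argument list for a `sqoop import` of one RDBMS table into HDFS.

    This helper is hypothetical; it only assembles the command and does not run it.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source RDBMS
        "--username", username,             # database user
        "--password-file", password_file,   # file holding the password
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # HDFS directory for the imported data
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are assumptions made for illustration only.
args = build_sqoop_import(
    jdbc_url="jdbc:postgresql://localhost:5432/sourcedb",
    username="sqoop_user",
    password_file="/user/sqoop/.pgpass",
    table="customer",
    target_dir="/datalake/raw/customer",
)

# Render the arguments as a single shell-safe command line.
command_line = shlex.join(args)
print(command_line)
```

Building the command as a list of arguments rather than one string keeps each option and its value paired explicitly, and the same list could be handed to subprocess.run on a machine where Sqoop is installed.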
Business to Consumer (B2C) applications are applications used by organizations to interact with their consumers.