Comprehensive Concepts of a Data Lake
The concept of a Data Lake arose from challenges that enterprises faced in the way data was handled, processed, and stored. Initially, through a natural evolution cycle, each application in the enterprise maintained huge amounts of data of its own, with almost no reuse by other applications in the same enterprise. This created information silos across applications. As the next step of evolution, these applications started exposing their data across the organization through a data mart access layer over a central data warehouse. While data marts solved one part of the problem, other problems persisted. These concerned data governance, data ownership, and data accessibility, all of which had to be resolved to make enterprise-relevant data more readily available. This is where the need for Data Lakes was felt: they could not only make such data available but also store data in any form and process it, so that it can be analyzed and kept ready for consumption by consumer applications. In this chapter, we will look at some of the critical aspects of a Data Lake and understand how it matters to an enterprise.