
Summary
In this chapter, we looked at various aspects of Apache Hadoop. We discussed its main components: the Hadoop Distributed File System (HDFS), the MapReduce framework, and YARN. Along the way, we did some practical work, executing basic HDFS commands and developing an easy-to-follow MapReduce program that calculates a bill summary.
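The chapter's actual Hadoop job is not reproduced in this summary, but the pattern it follows can be sketched in plain Python. This is a minimal illustration of the map, shuffle/sort, and reduce phases, assuming hypothetical (customer, amount) billing records; it stands in for, and is not, the real Hadoop MapReduce API.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map step: emit one (customer, amount) key-value pair per billing record."""
    for customer, amount in records:
        yield customer, amount

def shuffle(pairs):
    """Shuffle/sort step: group values by key, as the framework does between phases."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [amount for _, amount in group]

def reduce_phase(grouped):
    """Reduce step: sum each customer's amounts into a bill total."""
    for customer, amounts in grouped:
        yield customer, sum(amounts)

# Hypothetical sample data, standing in for the chapter's billing dataset.
records = [("alice", 20.0), ("bob", 15.5), ("alice", 4.5)]
totals = dict(reduce_phase(shuffle(map_phase(records))))
print(totals)  # {'alice': 24.5, 'bob': 15.5}
```

In a real Hadoop job these three steps run as a Mapper class, the framework's built-in shuffle, and a Reducer class, distributed across the cluster rather than in one process.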
Then, we discussed other projects under the umbrella of the Apache Software Foundation: Apache ZooKeeper, Apache Kafka, Apache Flume, Apache Cassandra, Apache HBase, and Apache Spark. These projects belong to the Hadoop ecosystem; some of them bring data into Hadoop, while others process it. The important lesson here is that even though these projects may appear similar, their uses and architectures differ. It is up to the big data architect to decide which framework best fits a given setup.
In the next chapter, we will look at NoSQL in detail, including its core concepts and principles. We will learn about the different NoSQL data models and the applications and frameworks available for each one, and we will discuss the design options that help you choose the right model for your use case.