Last updated: 2021-06-25 20:57:54
Cover
Copyright information
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Why Big Data?
What is big data?
Characteristics of big data
Volume
Velocity
Variety
Veracity
Variability
Value
Solution-based approach for data
Data – the most valuable asset
Traditional approaches to data storage
Clustered computing
High availability
Resource pooling
Easy scalability
Big data – how does it make a difference?
Big data solutions – cloud versus on-premises infrastructure
Cost
Security
Current capabilities
Scalability
Big data glossary
Big data
Batch processing
Cluster computing
Data warehouse
Data lake
Data mining
ETL
Hadoop
In-memory computing
Machine learning
MapReduce
NoSQL
Stream processing
Summary
Big Data Environment Setup
Oracle VM VirtualBox installation
Ubuntu installation
Hadoop prerequisite installation
Java installation
SSH installation and configuration
Hadoop system user
Apache Hadoop installation
Hadoop configuration
Path configuration for Hadoop commands
Hadoop server start and stop
Hadoop Ecosystem
Apache Hadoop
Hadoop Distributed File System
HDFS hands-on
Creating a directory in HDFS
Copying files from a local file system to HDFS
Copying files from HDFS to a local file system
Deleting files and folders in HDFS
Hadoop MapReduce
Job Tracker and Task Tracker
The execution flow of MapReduce
Mapper
Shuffle and Sort
Reducer
Example program
Preparing the data file for analysis
Program code
Driver program
Mapper program
Reducer program
Observations and results
YARN
Resource Manager
Node Manager
Container
Application Master
Apache projects related to big data
Apache Zookeeper
Apache Kafka
Apache Flume
Apache Cassandra
Apache HBase
Apache Spark