Last updated: 2021-07-02 14:49:09
Cover Page
Title Page
Copyright and Credits
Machine Learning with Apache Spark Quick Start Guide
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The Big Data Ecosystem
A brief history of data
Vertical scaling
Master/slave architecture
Sharding
Data processing and analysis
Data becomes big
Big data ecosystem
Horizontal scaling
Distributed systems
Distributed data stores
Distributed filesystems
Distributed databases
NoSQL databases
Document databases
Columnar databases
Key-value databases
Graph databases
CAP theorem
Distributed search engines
Distributed processing
MapReduce
Apache Spark
RDDs, DataFrames, and datasets
RDDs
DataFrames
Datasets
Jobs, stages, and tasks
Job
Stage
Tasks
Distributed messaging
Distributed streaming
Distributed ledgers
Artificial intelligence and machine learning
Cloud computing platforms
Data insights platform
Reference logical architecture
Data sources layer
Ingestion layer
Persistent data storage layer
Data processing layer
Serving data storage layer
Data intelligence layer
Unified access layer
Data insights and reporting layer
Platform governance, management, and administration
Open source implementation
Summary
Setting Up a Local Development Environment
CentOS Linux 7 virtual machine
Java SE Development Kit 8
Scala 2.11
Anaconda 5 with Python 3
Basic conda commands
Additional Python packages
Jupyter Notebook
Starting Jupyter Notebook
Troubleshooting Jupyter Notebook
Apache Spark 2.3
Spark binaries
Local working directories
Spark configuration
Spark properties
Environment variables
Standalone master server
Spark worker node
PySpark and Jupyter Notebook
Apache Kafka 2.0
Kafka binaries
Kafka configuration
Start the Kafka server
Testing Kafka
Artificial Intelligence and Machine Learning
Artificial intelligence
Machine learning
Supervised learning
Unsupervised learning