Advanced Machine Learning with R

What this book covers

Chapter 1, Preparing and Understanding Data, covers the loading of data and demonstrates how to obtain an understanding of its structure and dimensions, as well as how to install the necessary packages.

Chapter 2, Linear Regression, provides you with a solid foundation before learning advanced methods such as Support Vector Machines and Gradient Boosting. No foundation is more solid than least squares linear regression.
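
As a rough illustration of what "least squares" means for a single predictor, the sketch below computes the intercept and slope that minimize the sum of squared residuals. The book works in R; this stand-alone Python version, the function name fit_least_squares, and the toy data are assumptions made only for illustration and are not taken from the chapter.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_least_squares(xs, ys)
print(intercept, slope)
```

Because the toy points lie exactly on a line, the fit should recover that line; with noisy data the same formulas give the best-fitting line in the squared-error sense.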

Chapter 3, Logistic Regression, presents a discussion on how logistic regression and discriminant analysis are used to predict a categorical outcome. Multivariate adaptive regression splines have been added. This technique performs well, handles non-linearity, and is easy to explain.

Chapter 4, Advanced Feature Selection in Linear Models, shows regularization techniques that help improve predictive ability and interpretability, since feature selection is a critical and often extremely challenging component of machine learning. It covers techniques for both regression and classification problems.

Chapter 5, K-Nearest Neighbors and Support Vector Machines, begins the exploration of the more advanced and nonlinear techniques. The real power of machine learning will be unveiled.

Chapter 6, Tree-Based Classification, covers methods that offer some of the most powerful predictive abilities of all the machine learning techniques, especially for classification problems. Single decision trees will be discussed along with the more advanced random forests and boosted trees. It also contains very popular techniques provided by the XGBoost package.

Chapter 7, Neural Networks and Deep Learning, shows some of the most exciting machine learning methods currently used. Inspired by how the brain works, neural networks and their more recent and advanced offshoot, Deep Learning, will be put to the test. It also includes code for the H2O package, including hyperparameter search.

Chapter 8, Creating Ensembles and Multiclass Methods, has completely new content that makes use of several great packages.

Chapter 9, Cluster Analysis, covers unsupervised learning. Instead of trying to make a prediction, the goal is to uncover the latent structure of observations. Three clustering methods will be discussed: hierarchical, k-means, and partitioning around medoids. It also includes the methodology for executing unsupervised learning with random forests.

Chapter 10, Principal Component Analysis, continues the examination of unsupervised learning with principal component analysis, which is used to uncover the latent structure of the features. Once this is done, the new features will be used in a supervised learning exercise.

Chapter 11, Association Analysis, explains association analysis, which applies not only to making recommendations, product placement, and promotional pricing but also to manufacturing, web usage, and healthcare.
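
To make the idea of an association rule concrete, here is a minimal sketch of the three measures most often used to judge a rule such as "bread implies butter": support, confidence, and lift. It is written in Python rather than the book's R, and the helper names (support, rule_metrics) and the toy baskets are hypothetical, chosen only for this illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    s_both = support(transactions, set(lhs) | set(rhs))
    s_lhs = support(transactions, lhs)
    s_rhs = support(transactions, rhs)
    if s_lhs == 0 or s_rhs == 0:
        raise ValueError("both sides of the rule must occur at least once")
    confidence = s_both / s_lhs   # how often rhs appears when lhs does
    lift = confidence / s_rhs     # above 1 suggests a positive association
    return s_both, confidence, lift


# Made-up shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
rule_support, rule_confidence, rule_lift = rule_metrics(
    baskets, {"bread"}, {"butter"}
)
print(rule_support, rule_confidence, rule_lift)
```

Here butter appears in two of the three baskets that contain bread, so the confidence is two thirds, and the lift is above 1 because butter is more common among bread buyers than across all baskets.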

Chapter 12, Time Series and Causality, discusses univariate forecast models, bivariate regression, and Granger causality models, including an analysis of carbon emissions and climate change, along with a demonstration of different causality test methods.

Chapter 13, Text Mining, demonstrates a framework for quantitative text mining and the building of topic models. Along with time series, the world of data contains vast volumes of data in a textual format. With so much data as text, it is critically important to understand how to manipulate, code, and analyze the data in order to provide meaningful insights.

Chapter 14, Exploring the Machine Learning Landscape, briefly reviews the various ML concepts that a practitioner must know. In this chapter, we will cover topics such as supervised learning, reinforcement learning, unsupervised learning, and real-world ML use cases.

Chapter 15, Predicting Employee Attrition Using Ensemble Models, covers the creation of powerful ML models through ensemble learning. We will introduce the problem at hand and then explore the dataset with exploratory data analysis (EDA). Then, in the preprocessing phase, we will create new features using prior domain experience. Once the dataset is fully prepared, models will be created using multiple ensemble techniques, such as bagging, boosting, stacking, and randomization. Lastly, we will deploy the final selected model to production.

Chapter 16, Implementing a Joke Recommendation Engine, introduces recommendation engines. We start with the concepts and types of collaborative filtering algorithms. We will then build a recommendation engine that provides personalized joke recommendations using collaborative filtering approaches such as user-based collaborative filters and item-based collaborative filters. Apart from this, we will explore various libraries available in R that can be used to build recommendation systems.
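
The idea behind a user-based collaborative filter can be sketched in a few lines: find users whose past ratings resemble the target user's, then predict an unseen rating as a similarity-weighted average of their ratings. The sketch below is in Python rather than R and does not reproduce the chapter's implementation; the function names, the choice of cosine similarity over co-rated items only, and the joke ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts, using co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item.

    Returns None when no similar user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total > 0 else None


# Made-up ratings on a 1-5 scale
ratings = {
    "ann": {"joke1": 5, "joke2": 1},
    "bob": {"joke1": 5, "joke2": 1, "joke3": 4},
    "cat": {"joke1": 1, "joke2": 5, "joke3": 2},
}
predicted = predict_rating(ratings, "ann", "joke3")
print(predicted)
```

In this toy data ann's tastes match bob's much more closely than cat's, so the prediction for joke3 lands nearer to bob's rating of 4 than to cat's rating of 2. An item-based filter applies the same idea with the roles of users and items swapped.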

Chapter 17, Sentiment Analysis of Amazon Reviews with NLP, covers sentiment analysis, which entails finding the sentiment of a sentence and labeling it as positive, negative, or neutral, along with the various techniques that can be used to analyze text. We will learn text-mining concepts and the various ways that text is labeled based on its tone. Apart from using various popular R text-mining libraries to preprocess the reviews to be classified, we will also leverage a wide range of text representations, such as bag of words, word2vec, fastText, and GloVe.

Chapter 18, Customer Segmentation Using Wholesale Data, covers the segmentation, grouping, or clustering of customers, which can be achieved through unsupervised learning. In this chapter, we learn various techniques of customer segmentation and apply advanced clustering techniques, such as k-means, DIANA, and AGNES. Because the number of customer groups is usually not known in advance, we will explore ML techniques for dealing with this ambiguity and have ML find the number of groups possible based on the underlying characteristics of the input data. Evaluating the output of the clustering algorithms is an area that is often challenging to practitioners.

Chapter 19, Image Recognition Using Deep Neural Networks, covers convolutional neural networks (CNNs). We explore why CNNs work so well with computer vision problems such as object detection. We will learn about all of these concepts by applying a CNN in the building of a multi-class classification model on a popular open dataset called MNIST. We will learn about the various preprocessing techniques that can be applied to the image data in order to use the data with deep learning models.

Chapter 20, Credit Card Fraud Detection Using Autoencoders, covers autoencoders and how they differ from other deep learning networks, such as recurrent neural networks (RNNs) and CNNs. We will learn about autoencoders by implementing a project that identifies credit card fraud. We will become familiar with dimensionality reduction and how it can be used to identify credit card fraud.

Chapter 21, Automatic Prose Generation with Recurrent Neural Networks, introduces some deep neural networks (DNNs). We will implement a neural network from scratch and will learn how to apply an RNN by doing a project. We will create an application that generates text automatically, based on a long short-term memory (LSTM) network, a variant of RNNs. To accomplish this task, we make use of the MXNet framework, which supports the R language for deep learning.

Chapter 22, Winning the Casino Slot Machines with Reinforcement Learning, begins with an explanation of reinforcement learning (RL). We discuss the various concepts of RL, including strategies for solving what is called the multi-armed bandit problem. We implement a project that uses upper confidence bound (UCB) and Thompson sampling techniques in order to solve the multi-armed bandit problem.
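
For readers new to the bandit setting, the sketch below shows one common UCB strategy, usually called UCB1, on simulated slot machines that pay out 1 with a fixed probability. It is a Python illustration rather than the chapter's R project; the function name ucb1, the payout probabilities, the seed, and the number of rounds are all assumptions. Each round, the arm with the highest "average reward plus exploration bonus" is played, so arms that have been tried rarely still get revisited.

```python
import math
import random


def ucb1(success_probs, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms; return (pull counts, total rewards)."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    n_arms = len(success_probs)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Play every arm once so each average is defined
            arm = t - 1
        else:
            # Mean reward plus a bonus that shrinks as an arm is pulled more
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


# Three hypothetical slot machines; the last one pays out most often
counts, totals = ucb1([0.1, 0.5, 0.9], rounds=2000)
print(counts)
```

Over enough rounds, most pulls should concentrate on the arm with the highest payout probability. Thompson sampling pursues the same goal by sampling from a posterior over each arm's payout rate instead of adding an explicit bonus.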

Appendix, Creating a Package, includes additional data packages.