Cover
Copyright Information
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Chapter 1. Data Understanding
Introduction
Using an empty aggregate to evaluate sample size
Evaluating the need to sample from the initial data
Using CHAID stumps when interviewing an SME
Using a single cluster K-means as an alternative to anomaly detection
Using an @NULL multiple Derive to explore missing data
Creating an Outlier report to give to SMEs
Detecting potential model instability early using the Partition node and Feature Selection node
Chapter 2. Data Preparation – Select
Using the Feature Selection node creatively to remove or decapitate perfect predictors
Running a Statistics node on anti-join to evaluate the potential missing data
Evaluating the use of sampling for speed
Removing redundant variables using correlation matrices
Selecting variables using the CHAID Modeling node
Selecting variables using the Means node
Selecting variables using single-antecedent Association Rules
Chapter 3. Data Preparation – Clean
Binning scale variables to address missing data
Using a full data model/partial data model approach to address missing data
Imputing in-stream mean or median
Imputing missing values randomly from uniform or normal distributions
Using random imputation to match a variable's distribution
Searching for similar records using a Neural Network for inexact matching
Using neuro-fuzzy searching to find similar names
Producing longer Soundex codes
Chapter 4. Data Preparation – Construct
Building transformations with multiple Derive nodes
Calculating and comparing conversion rates
Grouping categorical values
Transforming high skew and kurtosis variables with a multiple Derive node
Creating flag variables for aggregation
Using Association Rules for interaction detection/feature creation
Creating time-aligned cohorts
Chapter 5. Data Preparation – Integrate and Format
Speeding up merge with caching and optimization settings
Merging a lookup table
Shuffle-down (nonstandard aggregation)
Cartesian product merge using key-less merge by key
Multiplying out using Cartesian product merge user source and derive dummy
Changing large numbers of variable names without scripting
Parsing nonstandard dates
Parsing and performing a conversion on a complex stream
Sequence processing
Chapter 6. Selecting and Building a Model
Evaluating balancing with Auto Classifier
Building models with and without outliers
Using Neural Network for Feature Selection
Creating a bootstrap sample
Creating bagged logistic regression models
Using KNN to match similar cases
Using Auto Classifier to tune models
Next-Best-Offer for large datasets
Chapter 7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
How (and why) to validate as well as test
Using classification trees to explore the predictions of a Neural Network
Correcting a confusion matrix for an imbalanced target variable by incorporating priors
Using aggregate to write cluster centers to Excel for conditional formatting
Creating a classification tree financial summary using aggregate and an Excel Export node
Reformatting data for reporting with a Transpose node
Changing formatting of fields in a Table node
Combining generated filters
Chapter 8. CLEM Scripting
Building iterative Neural Network forecasts
Quantifying variable importance with Monte Carlo simulation
Implementing champion/challenger model management
Detecting outliers with the jackknife method
Optimizing K-means cluster solutions
Automating time series forecasts
Automating HTML reports and graphs
Rolling your own modeling algorithm – Weibull analysis
Appendix A. Business Understanding
Define business objectives by Tom Khabaza
Assessing the situation by Meta Brown
Translating your business objective into a data mining objective by Dean Abbott
Produce a project plan – ensuring a realistic timeline by Keith McCormick
Index