Updated: 2021-07-23 15:47:39
Cover
Copyright Information
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Chapter 1. Getting Started with Pentaho Data Integration
Pentaho Data Integration and Pentaho BI Suite
Exploring the Pentaho Demo
Installing PDI
Time for action – installing PDI
Launching the PDI graphical designer – Spoon
Time for action – starting and customizing Spoon
Time for action – creating a hello world transformation
Installing MySQL
Time for action – installing MySQL on Windows
Time for action – installing MySQL on Ubuntu
Summary
Chapter 2. Getting Started with Transformations
Designing and previewing transformations
Time for action – creating a simple transformation and getting familiar with the design process
Running transformations in an interactive fashion
Time for action – generating a range of dates and inspecting the data as it is being created
Handling errors
Time for action – avoiding errors while converting the estimated time from string to integer
Time for action – configuring the error handling to see the description of the errors
Chapter 3. Manipulating Real-world Data
Reading data from files
Time for action – reading results of football matches from files
Time for action – reading all your files at a time using a single text file input step
Time for action – reading all your files at a time using a single text file input step and regular expressions
Sending data to files
Time for action – sending the results of matches to a plain file
Getting system information
Time for action – reading and writing matches files with flexibility
Time for action – running the matches transformation from a terminal window
XML files
Time for action – getting data from an XML file with information about countries
Chapter 4. Filtering, Searching, and Performing Other Useful Operations with Data
Sorting data
Time for action – sorting information about matches with the Sort rows step
Calculations on groups of rows
Time for action – calculating football match statistics by grouping data
Filtering
Time for action – counting frequent words by filtering
Time for action – refining the counting task by filtering even more
Looking up data
Time for action – finding out which language people speak
Chapter 5. Controlling the Flow of Data
Splitting streams
Time for action – browsing new features of PDI by copying a dataset
Time for action – assigning tasks by distributing
Splitting the stream based on conditions
Time for action – assigning tasks by filtering priorities with the Filter rows step
Time for action – assigning tasks by filtering priorities with the Switch/Case step
Merging streams
Time for action – gathering progress and merging it all together
Time for action – giving priority to Bouchard by using the Append Stream
Treating invalid data by splitting and merging streams
Time for action – treating errors in the estimated time to avoid discarding rows
Chapter 6. Transforming Your Data by Coding
Doing simple tasks with the JavaScript step
Time for action – counting frequent words by coding in JavaScript
Reading and parsing unstructured files with JavaScript
Time for action – changing a list of house descriptions with JavaScript
Doing simple tasks with the Java Class step
Time for action – counting frequent words by coding in Java
Transforming the dataset with Java
Time for action – splitting the field to rows using Java
Avoiding coding by using purpose-built steps
Chapter 7. Transforming the Rowset
Converting rows to columns
Time for action – enhancing the films file by converting rows to columns
Aggregating data with a Row Denormaliser step
Time for action – aggregating football matches data with the Row Denormaliser step
Normalizing data
Time for action – enhancing the matches file by normalizing the dataset
Generating a custom time dimension dataset by using Kettle variables
Time for action – creating the time dimension dataset
Time for action – parameterizing the start and end date of the time dimension dataset
Chapter 8. Working with Databases
Introducing the Steel Wheels sample database
Time for action – creating a connection to the Steel Wheels database
Time for action – exploring the sample database
Querying a database
Time for action – getting data about shipped orders
Time for action – getting orders in a range of dates using parameters
Time for action – getting orders in a range of dates by using Kettle variables
Sending data to a database
Time for action – loading a table with a list of manufacturers
Time for action – inserting new products or updating existing ones
Time for action – testing the update of existing products
Eliminating data from a database
Time for action – deleting data about discontinued items