Pentaho Data Integration Beginner's Guide(Second Edition)
上QQ阅读APP看书,第一时间看更新

Time for action – avoiding errors while converting the estimated time from string to integer

In this section, you will create a variation of the first transformation in this chapter. In this case, you will allow invalid data as source, but will take care of it:

  1. Create a transformation.
  2. Add a Data Grid step and fill in the Meta tab just like you did in the first section: add a string named project_name and two dates with the format yyyy-MM-dd, named start_date and end_date. Also add a fourth field of type String, named estimated.
  3. Now also fill in the Data tab with the same values you had in that first section. For the estimated field, type the following values: 30, 180, 180, 700, 700, ---.
    Tip

    You can avoid typing all of this in again! Instead of adding and configuring the step from scratch, open the transformation you created in the first section, select the Data Grid step, copy it by pressing Ctrl + C, and paste it into the new transformation by pressing Ctrl + V. Then enter the new information, the metadata, and the data for the new field.

  4. Now add a Select values step and create a hop from the Data Grid step towards it.
  5. Double-click on the Select values step and select the Meta-data tab.
  6. Under Fieldname type or select estimated, and under Type select Integer.
  7. Close the window.
  8. Now, just like you did in the first section, add a Calculator step to calculate the field named diff_dates as the difference between the dates.
  9. In the Calculator step, also define a new field named achieved. As Calculation, select A / B. As Field A and Field B select or type diff_dates and estimated, respectively. As Value type, select Number. Finally, as Conversion mask type 0.00%.
  10. Create a hop from the Select values step to the Calculator step. When asked for the kind of hop to create, select Main output of step.
  11. Drag to the canvas a Write to log step. You will find it in the Utility category of steps.
  12. Create a new hop from the Select values step, but this time the hop has to go to the Write to log step. When asked for the kind of hop to create, select Error handling of step. Then, the following Warning window appears:
  13. Click on Copy.
    Note

    For now, you don't have to worry about these two offered options. You will learn about them in Chapter 5, Controlling the Flow of Data.

  14. Now your transformation should look like as shown in the following screenshot:
  15. Double-click on the Write to log step. In the Write to log textbox type There was an error changing the metadata of a field.
  16. Click on Get Fields. The grid will be populated with the name of the fields coming from the previous step.
  17. Close the window and save the transformation.
  18. Now run it. Look at the Logging tab in the Execution Results window. The log will look like the following screenshot:
  19. Do a preview of the Calculator step. You will see all of the lines except that line containing the invalid estimated time. In these lines, you will see the two fields calculated in this step.
  20. Do a preview on the Write to log step. You will only see the line that had the invalid estimated time.

What just happened?

In this section, you learned one way of handling errors. You created a set of data, and intentionally introduced an invalid number for the estimated field. If you defined the estimated field as an Integer, Kettle would throw an error. In order to avoid that situation, you did the following:

  1. Defined the field as a string. The String type has no limitations for the kind of text to put in it.
  2. Then you changed the metadata of this field, converting it to an integer.

If you had done only that, nothing would have changed. The error would have appeared anyway, not in the Data Grid, but in the Select values step. So, this is how you handled the error. You created an alternative stream, represented with the hop in red, where the rows with errors go. As you could see both in the preview and in the Execution Results windows, the rows with valid values continued their way towards the Calculator step, while the row whose estimated value could not be converted to Integer went to the Write to log step. In the Write to log step you wrote an informative message as well as the values for all the fields, so it is easy to identify which row was the one that caused this situation.

The error handling functionality

With the error handling functionality, you can capture errors that otherwise would cause the transformation to halt. Instead of aborting, the rows that cause the errors are sent to a different stream for further treatment.

You don't need to implement error handling in every step. In fact, you cannot do that because not all steps support error handling. The objective of error handling is to implementing it in the steps where it is more likely to have errors. In the previous section, you faced a typical situation where you should consider handling errors. Changing the metadata of fields works perfectly, just as long as you know that the data is good, but it might fail when executing against real data. Another common use of error handling is when working with JavaScript code, or with databases. You will learn more on this later on in this book.

In the previous section, you handled the error in the simplest way. There are some options that you may configure. The next section teaches you how to personalize the error handling option.