Pentaho Data Integration Beginner's Guide (Second Edition)

Time for action – configuring the error handling to see the description of the errors

In this section, you will adapt the previous transformation so that you can capture more detail about the errors that occur:

  1. Open the transformation from the previous section, and save it with a different name. You can do this from the main menu by navigating to File | Save as…, or from the main toolbar.
  2. Right-click on the Select values step and select Define Error handling.... The error handling settings dialog window appears.
  3. In the Error descriptions fieldname textbox, type error_desc and click on OK.
  4. Double-click on the Write to log step and, after the last row, type or select error_desc.
  5. Save the transformation.
  6. Do a preview on the Write to log step. You will see a new field named error_desc with the description of the error.
  7. Run the transformation. In the Execution Window, you will see the following output in the log:
    Write to log.0 - ------------> Linenr 1------------------------
    Write to log.0 - There was an error changing the metadata of a field.
    Write to log.0 - 
    Write to log.0 - project_name = Project F
    Write to log.0 - start_date = 1999-12-01
    Write to log.0 - end_date = 2012-11-30
    Write to log.0 - estimated = ---
    Write to log.0 - error_desc = 
    Write to log.0 - Unexpected conversion error while converting value [estimated String] to an Integer
    Write to log.0 - 
    Write to log.0 - estimated String : couldn't convert String to Integer
    Write to log.0 - Unparseable number: "---"
    Write to log.0 - 
    Tip

    Downloading the example code

    You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

What just happened?

You modified a transformation that captured errors, by changing the default configuration of the error handling. In this case, you added a new field containing the description of the errors. You also wrote the value of the new field to the log.
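The idea behind this configuration can be illustrated outside of Kettle. The following is a minimal Python sketch, not Kettle's implementation: rows whose estimated value cannot be converted to an integer are diverted to a separate list, and a description of the failure is stored in a field whose name you choose (here error_desc). The function names and the Project E row are hypothetical and exist only for illustration.

```python
def to_integer(value):
    """Convert a string to int, raising ValueError with a readable message."""
    try:
        return int(value)
    except ValueError as exc:
        raise ValueError(f"couldn't convert String to Integer: {value!r}") from exc


def split_rows(rows, field, error_desc_field="error_desc"):
    """Return (good_rows, error_rows); error rows gain a description field."""
    good, errors = [], []
    for row in rows:
        try:
            # Main stream: the field is replaced by its converted value.
            good.append({**row, field: to_integer(row[field])})
        except ValueError as exc:
            # Error stream: the original row plus the error description.
            errors.append({**row, error_desc_field: str(exc)})
    return good, errors


# Sample data: Project F comes from the log above; Project E is made up.
rows = [
    {"project_name": "Project E", "estimated": "120"},
    {"project_name": "Project F", "estimated": "---"},
]
good_rows, error_rows = split_rows(rows, "estimated")

# Rough equivalent of the Write to log step: print every field of each error row.
for row in error_rows:
    for name, value in row.items():
        print(f"{name} = {value}")
```

As in the transformation, the row with the error keeps its original values, and the description travels with it as an ordinary field.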

Personalizing the error handling

The Error handling setting window gives you the chance to override the default values of the error handling. This window allows you to do two kinds of things: configure additional fields describing the errors, and control the number of errors to capture.

The first textboxes are meant to be filled with the names of the new fields. As an example, in the previous section you filled the textbox Error descriptions fieldname with the word error_desc. As a result, the output dataset of this step had a new field named error_desc with a description of the error.

The following table shows all the available options for fields describing the errors:

As you saw, you are not forced to fill in all these textboxes. Only the fields for which you provide a name will be added to the dataset. These added fields can be used like any other field. In the previous section, for example, you wrote the field to the log just as you did with the rest of the fields in your dataset.
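The rule that only named fields are added can be sketched in a few lines of Python. This is an illustration under assumptions, not Kettle code: the helper build_error_row, the keys description and count, and the field name nr_errors are all hypothetical.

```python
def build_error_row(row, errors, field_names):
    """Append only the error fields that were given a name.

    `errors` is a list of description strings for this row.
    `field_names` maps a kind of error information to the field name chosen
    by the user; a missing or empty name means "do not add this field".
    """
    available = {
        "description": "; ".join(errors),
        "count": len(errors),
    }
    out = dict(row)
    for kind, value in available.items():
        name = field_names.get(kind)
        if name:  # skip textboxes that were left empty
            out[name] = value
    return out


row = {"project_name": "Project F", "estimated": "---"}
errors = ["couldn't convert String to Integer"]

# Only the description textbox is filled in, as in the previous section.
only_desc = build_error_row(row, errors, {"description": "error_desc"})

# Both textboxes are filled in (nr_errors is a hypothetical name).
both = build_error_row(
    row, errors, {"description": "error_desc", "count": "nr_errors"}
)
```

In the first call the result has a single extra field; in the second it has two, and both behave like any other field of the row.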

The second thing that you can do in this setting window is control the number of errors to capture. You do it by configuring the following settings:

  • Max nr errors allowed
  • Max % errors allowed (empty==100%)
  • Min nr of rows to read before doing % evaluation

The meaning of these settings is quite straightforward, but let's make it clear with an example. Suppose that you set Max nr errors allowed to 10, Max % errors allowed (empty==100%) to 20, and Min nr of rows to read before doing % evaluation to 100. The result will be that after 10 errors, Kettle will stop capturing errors and will abort. The same will occur if the number of rows with errors exceeds 20 percent of the total, but this check is only performed once 100 rows have been processed.

Note that by default, there is no limit on the number of errors to be captured.
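The way the three settings interact can be written down as a small Python function. This sketch follows the description in the text, not Kettle's source code; the function name and parameters are hypothetical, and the exact boundary behavior (whether the 10th error itself triggers the abort) is an assumption.

```python
def should_abort(rows_read, rows_with_errors,
                 max_errors=None, max_pct=None, min_rows=0):
    """Decide whether to abort, given the three error handling limits.

    max_errors -- stands for "Max nr errors allowed" (None = no limit)
    max_pct    -- stands for "Max % errors allowed" (None = no limit)
    min_rows   -- stands for "Min nr of rows to read before doing % evaluation"
    """
    # Absolute limit: assumed to trigger as soon as the limit is reached.
    if max_errors is not None and rows_with_errors >= max_errors:
        return True
    # Percentage limit: only evaluated once enough rows have been read.
    if max_pct is not None and rows_read > 0 and rows_read >= min_rows:
        if 100.0 * rows_with_errors / rows_read > max_pct:
            return True
    return False


# Settings from the example in the text: 10 errors, 20 percent, 100 rows.
print(should_abort(50, 9, max_errors=10, max_pct=20, min_rows=100))
print(should_abort(50, 10, max_errors=10, max_pct=20, min_rows=100))
```

The first call returns False: there are fewer than 10 errors, and although 18 percent would not exceed the limit anyway, the percentage is not even evaluated because fewer than 100 rows have been read. The second call returns True because the absolute limit has been reached. With no limits set, the function never returns True, which matches the default behavior of capturing all errors.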

Finally, you might have noticed that the window also has an option named Target step. This option gives the name of the step that will receive the rows with errors. It was set automatically when you created the hop to handle the error, but you can also set it by hand.

Have a go hero – trying out different ways of handling errors

Modify the transformation that handles errors in the following way:

  1. In the Data Grid step, change the start and end dates so they are both strings. Then, change the metadata to date by using the Select values step.
  2. Modify the settings of the Error handling dialog window so that, besides the description of the error, you have a field with the number of errors that occurred in each row.
  3. Also change the settings so Kettle can handle a maximum of five errors.
  4. Add more rows to the initial dataset, introducing both right and wrong values. Make sure you have rows with more than one error.
  5. Test your work!