Pentaho Data Integration Beginner's Guide (Second Edition)

Time for action – reading and writing matches files with flexibility

In this section, you will create a transformation very similar to the one you created in the previous section. In this case, however, you will tell Spoon, one by one, which source files you want to send to the destination file:

  1. Create a new transformation.
  2. From the Input category of steps, drag a Get System Info step to the work area.
  3. Double-click the step and add a new line to the grid. Under Name, type filename. As Type, select command line argument 1, as shown in the following screenshot:
  4. Click on OK.
  5. Add a Calculator step and create a hop from the previous step toward this step.
  6. Double-click on the Calculator step and fill in the grid as shown in the following screenshot:
  7. Save the transformation.
  8. Select the Calculator step, and press F10 to run a preview. In the Transformation debug dialog, click on Configure.
  9. Fill in the Arguments grid by typing the name of one of your input files under the Value column. Your window will look like the following screenshot:
  10. Click on Launch. You will see a window displaying the full path of your file, for example, c:/pdi_files/input/usa_201209.txt.
  11. Close the preview window, add a Text file input step, and create a hop from the Calculator step toward this step.
  12. Double-click on the Text file input step and fill the lower grid as shown in the following screenshot:
  13. Fill in the Content and Fields tabs just like you did before. Note that the Get Fields button will not populate the grid as expected, because the filename has not been provided. To avoid typing the fields manually, you can refer to the following tip:
    Tip

    Instead of configuring the tabs again, you can open any of the previous transformations, copy the Text file input step, and paste it here. Leave the Content and Fields tabs untouched and just configure the File tab as explained previously.

  14. Click on OK.
  15. Add a Select values step to remove the venue field.
  16. Finally, add a Text file output step and configure it in the same way that you did in the previous section, but this time, in the Content tab select the Append checkbox.
    Tip

    Again, you can save time by copying the steps from the transformation you created before and pasting them here.

  17. Save the transformation and make sure that the matches.txt file doesn't exist.
  18. Press F9 to run the transformation.
  19. In the first cell of the Arguments grid type the name of one of the files. For example, you can type usa_201209.txt.
  20. Click on Launch.
  21. Open the matches.txt file. You should see the data belonging to the usa_201209.txt file.
  22. Run the transformation again. This time, type usa_201210.txt as the name of the file.
  23. Open the matches.txt file again. This time you should see the data belonging to the usa_201209.txt file, followed by the data in the usa_201210.txt file.

What just happened?

You read a file whose name is known only at runtime, and appended its contents to a destination file.

The Get System Info step tells Kettle to take the first command-line argument and treat it as the name of the file to read. The Calculator step then builds the full path of the file.

In the Text file input step, you didn't specify the name of the file. Instead, you told Kettle to take the name of the file from a field coming from the previous step, that is, the field built with the Calculator step.

New data is appended to the destination file every time you run the transformation.
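For readers who think in code, the flow can be sketched roughly in Python. This is only an analogy to the transformation, not how Kettle works internally. The semicolon separator, the presence of a header row, and the location of matches.txt are assumptions made for illustration; the function names are hypothetical.

```python
import csv
import sys
from pathlib import Path

# Assumptions (not stated in the exercise): ";" as separator, a header row
# in every input file, and matches.txt in the current folder.
INPUT_DIR = Path("c:/pdi_files/input")
OUTPUT_FILE = Path("matches.txt")


def build_full_path(filename, input_dir=INPUT_DIR):
    """Calculator step: prepend the input folder to the bare filename."""
    return Path(input_dir) / filename


def append_matches(source, destination=OUTPUT_FILE, drop_field="venue", sep=";"):
    """Read source, drop one field, and append the rows to destination."""
    destination = Path(destination)
    # Write the header only when the destination file is first created.
    write_header = not destination.exists()
    with open(source, newline="") as src, open(destination, "a", newline="") as dst:
        reader = csv.DictReader(src, delimiter=sep)
        # Select values step: keep every field except the one to remove.
        kept = [name for name in (reader.fieldnames or []) if name != drop_field]
        writer = csv.DictWriter(dst, fieldnames=kept, delimiter=sep,
                                extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for row in reader:
            writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Get System Info step: the first command-line argument is the filename.
    append_matches(build_full_path(sys.argv[1]))
```

Running the script twice with two different filenames leaves the rows of the first file followed by the rows of the second in the destination, which mirrors the effect of the Append checkbox.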

Tip

Here is some advice regarding the configuration of the Text file input step: when you don't specify the name and location of a file (as in the previous example), or when the real file is not available at design time, you will not be able to use the Get Fields button, nor will you be able to see whether the step is well configured. The trick is to configure the step by using a real file identical to the expected one. After the step is configured, change the name and location of the file as needed.

The Get System Info step

The Get System Info step allows you to get different types of information from the system. In this exercise, you read a command-line argument. If you look at the available list, you will see a long list of options including up to ten command-line arguments, dates relative to the present date (Yesterday 00:00:00, First day of last month 23:59:59, and so on), information related to the machine where the transformation is running (JVM max memory, Total physical memory size (bytes), and so on), and more.
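To make the relative-date options concrete, here is a minimal Python sketch of how two of them could be computed from a reference date. The function names are hypothetical, and the sketch only reflects what the option labels suggest; it is not taken from Kettle's implementation.

```python
from datetime import datetime, timedelta


def yesterday_start(now):
    """'Yesterday 00:00:00': midnight at the start of the previous day."""
    return (now - timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)


def first_day_of_last_month_end(now):
    """'First day of last month 23:59:59'."""
    # Stepping back one day from the first of this month lands in last month.
    last_day_prev_month = now.replace(day=1) - timedelta(days=1)
    return last_day_prev_month.replace(day=1, hour=23, minute=59, second=59, microsecond=0)


now = datetime(2012, 10, 15, 14, 30)  # fixed reference date for the example
print(yesterday_start(now))              # 2012-10-14 00:00:00
print(first_day_of_last_month_end(now))  # 2012-09-01 23:59:59
```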

In this section, you used the step as the first in the flow. This causes Kettle to generate a dataset with a single row, and one column for each defined field. In this case, you created a single field, filename, but you could have defined more if needed.

You can also add a Get System Info step in the middle of the flow. Suppose that after the Select values step you add a Get System Info step with the system date. That is, you define the step as shown in the following screenshot:

This will cause Kettle to add a new field with the same value, in this case the system date, for all rows, as you can see in the following screenshot:
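In code terms, the two placements of the step behave roughly as in the following Python sketch. The function names, the today field name, and the sample rows are made up for illustration; only the behavior (one generated row when the step comes first, the same value on every row when it sits mid-flow) comes from the text.

```python
from datetime import datetime


def generate_row(field_values):
    """First step in the flow: emit a single row, one column per defined field."""
    return [dict(field_values)]


def add_field_to_rows(rows, name, value):
    """Mid-flow: copy each incoming row and attach the same value to all of them."""
    return [{**row, name: value} for row in rows]


# Used as the first step: a dataset with exactly one row.
print(generate_row({"filename": "usa_201209.txt"}))

# Used mid-flow: the date is evaluated once, so every row gets an identical value.
run_date = datetime.now()
rows = [{"home_team": "USA"}, {"home_team": "Italy"}]  # made-up sample rows
for row in add_field_to_rows(rows, "today", run_date):
    print(row)
```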

Running transformations from a terminal window

In the previous exercise, you specified that the name of the input file would be taken from the first command-line argument. That means that when executing the transformation, the filename has to be supplied as an argument. Until now, you have only run transformations from inside Spoon. In the last exercise, you provided the argument by typing it in a dialog window. Now it is time to learn how to run transformations, with or without arguments, from a terminal window.