Pentaho Data Integration Beginner's Guide(Second Edition)

Time for action – sending the results of matches to a plain file

In the previous section, you read several files with the results of football matches. Now you want to send the data coming from all files to a single output file:

  1. Open the transformation that you created in the last section and save it under a different name.
  2. Delete the Dummy (do nothing) step by selecting it and pressing Del.
  3. Expand the Output branch of the Steps tree.
  4. Look for the Text file output step and drag this icon to the work area.
  5. Create a hop from the Select values step to this new step.
  6. Double-click on the Text file output step icon and give it a name.
  7. In the Filename textbox, type C:/pdi_files/output/matches. Leave out the extension; with the default settings the step adds .txt for you.
    Note

    The path contains forward slashes. On Windows, you may use either backslashes or forward slashes; PDI recognizes both notations.

  8. In the Content tab, leave the default values.
  9. Select the Fields tab and configure it as shown in the following screenshot:
  10. Click on OK. Your screen will look like the following screenshot:
  11. Give the transformation a name and description, and save it.
  12. Run the transformation by pressing F9 and then click on Launch.
  13. Once the transformation is finished, look for the new file. It should have been created as C:/pdi_files/output/matches.txt and will appear as shown:
    match_date;home_team;away_team;result
    07-09-12;Iceland;Norway;2:0
    07-09-12;Russia;Northern Ireland;2:0
    07-09-12;Liechtenstein;Bosnia-Herzegovina;1:8
    07-09-12;Wales;Belgium;0:2
    07-09-12;Malta;Armenia;0:1
    07-09-12;Croatia;FYR Macedonia;1:0
    07-09-12;Andorra;Hungary;0:5
    07-09-12;Netherlands;Turkey;2:0
    07-09-12;Slovenia;Switzerland;0:2
    07-09-12;Albania;Cyprus;3:1
    07-09-12;Montenegro;Poland;2:2
    …
    Note

    If your system is Linux or similar, or if your files are in a different location, change the paths accordingly.
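For readers who prefer to think in code, the effect of the Text file output step in this exercise can be approximated in Python. This is only a sketch of the observable result (a header line followed by one semicolon-separated line per row), not a description of how PDI works internally. The function name write_matches is hypothetical, and the two sample rows are copied from the output shown above.

```python
import csv
import io

def write_matches(rows, fields, out, separator=";"):
    """Write a header line, then one delimited line per row (hypothetical helper)."""
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    writer.writerow(fields)  # header built from the field names
    for row in rows:
        writer.writerow([row[name] for name in fields])

# Two rows taken from the sample output above.
rows = [
    {"match_date": "07-09-12", "home_team": "Iceland",
     "away_team": "Norway", "result": "2:0"},
    {"match_date": "07-09-12", "home_team": "Russia",
     "away_team": "Northern Ireland", "result": "2:0"},
]

buffer = io.StringIO()  # stands in for the output file
write_matches(rows, ["match_date", "home_team", "away_team", "result"], buffer)
output_text = buffer.getvalue()
print(output_text)
```

An in-memory buffer is used here so the sketch runs anywhere; replacing it with an opened file would write the same lines to disk.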

What just happened?

You gathered information from several files and sent all data to a single file.

Output files

You saw that PDI can take data from several types of files. The same applies to output data: the data you have in a transformation can be sent to different kinds of files. All you have to do is redirect the flow of data towards an Output step.

Output steps

There are several steps which allow you to send the data to a file. All those steps are under the Output category; Text file output and Microsoft Excel Output are some of them.

For an output step, just as for an input step, you have to define:

  • Name of the step: It is mandatory and must be different for every step in the transformation.
  • Name and location of the file: These must be specified. If you specify an existing file, the file will be replaced by a new one (unless you check the Append checkbox, present in some of the output steps, for example, the Text file output step used in the last section).
  • Content type: These settings include the delimiter character, the type of encoding, whether to write a header, and so on. The list depends on the kind of file chosen. If you check Header (which is selected by default), the header will be built from the names of the fields.
    Tip

    If you don't like the names of the fields as header names in your file, you may use a Select values step to rename those fields before sending them to a file.

  • Fields: Here you specify the list of fields that have to be sent to the file, and provide some format instructions. Just like in the input steps, you may use the Get Fields button to fill the grid. In this case, the grid is filled based on the data that arrives from the previous step. You are not forced to send every field that reaches the Output step, nor to send the fields in the same order, as you can see from the example in the previous section.
    Note

    If you leave the Fields tab empty, Kettle will send all the fields coming from the previous step to the file.
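The settings listed above can be mirrored in a small Python sketch. It is an analogy under stated assumptions rather than PDI's actual behavior: the function text_file_output and its parameters (fields, header, append, separator) are hypothetical names chosen to echo the options in the step's dialog, and whether a header is written when appending is left to the caller.

```python
import csv
import os
import tempfile

def text_file_output(rows, filename, fields=None, header=True,
                     append=False, separator=";"):
    """Hypothetical analogue of an output step's main settings."""
    # An empty field list means "send every incoming field".
    if fields is None:
        fields = list(rows[0]) if rows else []
    mode = "a" if append else "w"  # "w" replaces an existing file
    with open(filename, mode, newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=separator, lineterminator="\n")
        if header and fields:
            writer.writerow(fields)
        for row in rows:
            # Only the chosen fields are written, in the chosen order.
            writer.writerow([row[name] for name in fields])

rows = [
    {"match_date": "07-09-12", "home_team": "Wales",
     "away_team": "Belgium", "result": "0:2"},
    {"match_date": "07-09-12", "home_team": "Malta",
     "away_team": "Armenia", "result": "0:1"},
]
chosen = ["home_team", "away_team", "result"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "matches.txt")
    text_file_output(rows[:1], path, fields=chosen)           # new file with header
    text_file_output(rows[1:], path, fields=chosen,
                     header=False, append=True)               # add to the same file
    with open(path, encoding="utf-8") as handle:
        content = handle.read()

print(content)
```

The first call creates the file with a header and a subset of the fields; the second call appends one more row to it instead of replacing it.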

Have a go hero – extending your transformations by writing output files

If you read your own files in the previous section, modify your transformations so that they write some or all of the data back into files, this time changing the format, the headers, the number or order of fields, and so on. The objective is to gain some experience and see what happens. After some tests, you will feel confident with input and output files, and ready to move forward.

Have a go hero – generate your custom matches.txt file

Modify the transformation that generated the matches.txt file. This time your output file should look similar to this:

match_date|home_team|away_team
07-09|Iceland (2)|Norway (0)
07-09|Russia (2)|Northern Ireland (0)
07-09|Liechtenstein (1)|Bosnia-Herzegovina (8)
07-09|Wales (0)|Belgium (2)
07-09|Malta (0)|Armenia (1)
07-09|Croatia (1)|FYR Macedonia (0)
…
Tip

In order to create the new fields, you can use some or all of the following steps: Split Fields, User Defined Java Expression (UDJE), Calculator, and Select values. You will also have to customize the Output step a bit, by changing the format of the date field, changing the default separator, and so on.
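To make the target format concrete, the reshaping of a single row can be sketched in Python. This is not the PDI solution, only an illustration of the logic the steps have to reproduce. The helper names reshape_match and format_line are hypothetical, and the sketch assumes the incoming date reads as day-month-year (dd-MM-yy).

```python
from datetime import datetime

def reshape_match(row):
    """Split the result and attach each score to its team (hypothetical helper)."""
    home_goals, away_goals = row["result"].split(":")
    # Assumption: the source date is day-month-year with a two-digit year.
    match_date = datetime.strptime(row["match_date"], "%d-%m-%y")
    return {
        "match_date": match_date.strftime("%d-%m"),
        "home_team": f"{row['home_team']} ({home_goals})",
        "away_team": f"{row['away_team']} ({away_goals})",
    }

def format_line(row, fields, separator="|"):
    """Join the chosen fields with the custom separator."""
    return separator.join(row[name] for name in fields)

# One row taken from the original matches.txt output.
source = {"match_date": "07-09-12", "home_team": "Liechtenstein",
          "away_team": "Bosnia-Herzegovina", "result": "1:8"}

line = format_line(reshape_match(source),
                   ["match_date", "home_team", "away_team"])
print(line)  # 07-09|Liechtenstein (1)|Bosnia-Herzegovina (8)
```

In the transformation, the split corresponds to dividing the result field in two, the string building to creating the new team fields, and the date and separator changes to settings in the Output step.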