
Saving the original

As a rule, it is good practice to preserve the original state of your data before making any sort of change to a data file. Not to worry: IBM Watson saves you the time and effort of backing up, saving, and maintaining version control. Whenever you use Refine, a new, separate dataset that is related to your original dataset is created automatically for you.

Note: The changes that you make using Refine are saved as a separate version of the original dataset and are automatically available in Predict, Explore, and Assemble. If you modify the data in an exploration, the changed data is available only in that exploration.
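Outside of IBM Watson, the same keep-the-original habit can be sketched in a few lines of Python. This is only an illustration of the idea, not how Refine works internally; the `refine` helper, the `cnt_to_int` transform, and the sample rows and column names are all hypothetical.

```python
import copy

def refine(rows, transform):
    """Return a refined copy of rows; the original list is left untouched."""
    refined = copy.deepcopy(rows)
    for row in refined:
        transform(row)  # the transform changes each copied row in place
    return refined

# Hypothetical sample rows, with every value stored as text.
original = [{"season": "1", "cnt": "16"}, {"season": "2", "cnt": "40"}]

def cnt_to_int(row):
    """Example refinement: convert the count column from text to a number."""
    row["cnt"] = int(row["cnt"])

refined = refine(original, cnt_to_int)
```

Because the refinement runs against a deep copy, `original` still holds the untouched text values, while `refined` carries the converted numbers.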

Beyond making your data more usable, Refine can also help you learn more about it. On the Refine page, click the data metrics icon (the small bar graph on the left of the page), shown as follows:

Bike sharing refinement

When you view the data metrics for your selected dataset, you will see the following information for each column of your data:

The quality score for each column, which indicates a column's potential readiness for use in a prediction
The percentage of data that is missing
Distribution graphs of the data (in numeric columns)
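To make two of these metrics concrete, here is a minimal Python sketch that computes the percentage of missing values and a simple binned distribution for a column. It assumes that a blank or absent value counts as missing, and the function names, column names, and sample rows are all hypothetical. It does not attempt to reproduce the quality score, whose calculation is not described here.

```python
from collections import Counter

def is_missing(value):
    """Treat None and blank strings as missing (an assumption of this sketch)."""
    return value is None or str(value).strip() == ""

def percent_missing(rows, column):
    """Percentage of rows in which the given column has no value."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(column)))
    return 100.0 * missing / len(rows)

def histogram(rows, column, bin_width):
    """Count the numeric values of a column in bins of the given width."""
    counts = Counter()
    for row in rows:
        value = row.get(column)
        if is_missing(value):
            continue  # missing values are left out of the distribution
        bin_start = int(float(value) // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample rows; real data would come from your own file.
rows = [
    {"temp": "9.8", "cnt": "16"},
    {"temp": "", "cnt": "40"},
    {"temp": "14.1", "cnt": "32"},
    {"temp": "12.3", "cnt": ""},
]

print(percent_missing(rows, "temp"))  # 25.0
print(histogram(rows, "temp", 5))     # {5: 1, 10: 2}
```

Each key in the histogram is the lower edge of a bin, so `{5: 1, 10: 2}` means one value falls between 5 and 10 and two fall between 10 and 15.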

Getting started with Refine is easy. Once you click on Refine, a familiar-looking Refine dataset dialog is presented (shown in the following screenshot), where you can select an existing dataset, add a new dataset, or take advantage of Watson's sample data:

After you select (or upload) a dataset, its data is displayed on the Refine page (shown in the following screenshot), where you can explore your data's metrics and perform the appropriate refinements based on your requirements:

Now that we've completed a quick, high-level review of most of the fundamental features of the IBM Watson interface, and before we jump into our first IBM Watson project, let's move on to the final section of this chapter. There, we will walk through the steps required to add some new data to IBM Watson, and then explore and refine that data.