Hands-On Explainable AI (XAI) with Python

Facets Dive

The ability to verify the ground truth of data distributions is critical in supervised learning. Supervised ML involves training datasets with labels. These labels constitute the target values that an ML algorithm is trained to predict. However, some or all of the labels might be wrong, in which case the accuracy of the predictions might not be sufficient.

With Facets Dive, we can explore a large number of data points interactively and analyze their relationships.

Building the Facets Dive display code

We first import the display function and the HTML class from IPython:

# Display the Dive visualization for the training data
# (IPython.display is the public import path; IPython.core.display is deprecated for this)
from IPython.display import display, HTML

The next step is to convert a pandas DataFrame containing the training or testing data into JSON. You can run an example that was inserted in the notebook before continuing:

# @title Python to_json example {display-mode: "form"}
# Convert every row of the training DataFrame into one JSON object
jsonstr = train_data.to_json(orient='records')
jsonstr

The output is a JSON string of all of the records in the pandas DataFrame containing the training data:

'[{"colored_sputum":1.0,"cough":3.5,"fever":9.4,"headache":3.0,"days":3,"france":0,"chicago":1,"class":"flu"},
{"colored_sputum":1.0,"cough":3.4,"fever":8.4,"headache":4.0,"days":2,"france":0,"chicago":1,"class":"flu"},
{"colored_sputum":1.0,"cough":3.3,"fever":7.3,"headache":3.0,"days":4,"france":0,"chicago":1,"class":"flu"},
{"colored_sputum":1.0,"cough":3.4,"fever":9.5,"headache":4.0,"days":2,"france":0,"chicago":1,"class":"flu"},
...
{"colored_sputum":1.0,"cough":5.0,"fever":8.0,"headache":9.0,"days":5,"france":0,"chicago":1,"class":"bad_flu"}]'
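
To see the conversion in isolation, here is a minimal sketch that uses a two-row stand-in for the training data. The `sample` DataFrame and its values are hypothetical. Only the column names and the `to_json(orient='records')` call come from the notebook.

```python
import json

import pandas as pd

# Hypothetical two-row stand-in for train_data (values are illustrative only)
sample = pd.DataFrame(
    {
        "fever": [9.4, 8.0],
        "days": [3, 5],
        "class": ["flu", "bad_flu"],
    }
)

# orient='records' emits a JSON array with one object per DataFrame row
sample_json = sample.to_json(orient="records")

# Parse the string back to confirm it is a list with one dict per row
records = json.loads(sample_json)
print(len(records), records[0]["class"])
```

Parsing the string back with `json.loads` is a quick way to check that each row became one record before handing the data to a visualization.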

We now define an HTML template:

HTML_TEMPLATE = """
 <script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
 <link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
 <facets-dive id="elem" height="600"></facets-dive>
 <script>
 var data = {jsonstr};
 document.querySelector("#elem").data = data;
 </script>"""

The program now adds the JSON string we created to the HTML template:

html = HTML_TEMPLATE.format(jsonstr=jsonstr)

Finally, we display the HTML page we created:

display(HTML(html))
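
The template-filling step can be wrapped in a small function so that it is easy to reuse for the training and testing sets. This is a sketch rather than the notebook's code. The helper name `build_dive_html` is hypothetical, and the custom element name `facets-dive` is assumed from the Facets project's own Jupyter examples.

```python
import json

# The template must contain no literal braces other than the {jsonstr}
# placeholder, otherwise str.format would raise an error
DIVE_TEMPLATE = """
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
<facets-dive id="elem" height="600"></facets-dive>
<script>
  var data = {jsonstr};
  document.querySelector("#elem").data = data;
</script>"""


def build_dive_html(records):
    """Hypothetical helper: embed a list of record dicts in the Dive template."""
    return DIVE_TEMPLATE.format(jsonstr=json.dumps(records))


# Illustrative single record; a real call would pass every row of the dataset
page = build_dive_html([{"fever": 9.4, "days": 3, "class": "flu"}])
```

In a notebook, `display(HTML(page))` would then render the result, just as in the step above.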

The output is the interactive interface of Facets Dive:

Figure 3.12: Displaying data with Facets Dive

We have built an interactive HTML view of our training dataset. We can now explore the Facets Dive interface using the training set that we loaded in the Facets Overview section of this chapter.

Defining the labels of the data points

The bins containing the data points may be enough to analyze a dataset. However, in some cases, it is interesting to analyze the data points with different types of labels.

Click on the Label By dropdown list to see the list of labels to choose from:

Figure 3.13: Defining a label to display

A list of the features of the dataset will appear. Choose the one that you would like to analyze:

Figure 3.14: A selection of labels to choose from

For a medical diagnosis, for example, select class, which is the class of the disease:

Figure 3.15: Sorting by label

The data points will be displayed by color and by class:

Figure 3.16: Displaying the color and label of the data points

Try other labels to see what patterns you can find in the data points.

We will now add colors to our data points.

Defining the color of the data points

In the previous section, we learned how to display labels on the data points. We can also add colors, using one feature for the labels and another for the colors.

If we click on the Color By dropdown list, we will access the list of the features of our dataset:

Figure 3.17: Selecting a color

Let's choose days, for example.

The bottom of the graph shows the early days of a patient's condition. The top of the graph shows how the patient's condition evolves over the following days:

Figure 3.18: Using colors to display data

In this example, over the days, the diagnosis of a patient may have gone from a cold to the flu.

Try using several combinations of labels and colors to see what you can discover.
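
A visual impression from a label and color combination can be cross-checked with a simple count. The sketch below assumes a small hypothetical DataFrame with `days` and `class` columns and uses `pd.crosstab` to tabulate how many data points of each class appear on each day. It is a numerical companion to the display, not something Facets Dive itself runs.

```python
import pandas as pd

# Hypothetical data: early days are colds, later days are flu (illustrative only)
df = pd.DataFrame(
    {
        "days": [1, 1, 2, 3, 3, 4],
        "class": ["cold", "cold", "cold", "flu", "flu", "flu"],
    }
)

# Rows are days, columns are classes, cells are counts of data points
counts = pd.crosstab(df["days"], df["class"])
print(counts)
```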

Let's analyze the data points in more detail by defining the binning of the x axis and y axis.

Defining the binning of the x axis and y axis

You can define the binning of the x axis and y axis in a very flexible way. You can choose the features you wish to combine and see how the data points react to these combinations.

You can make many inferences on your model by observing the way certain features seem to fit together, and others remain outsiders.

For each axis, we can choose a feature from the dropdown list:

Figure 3.19: Defining the x axis binning

In this case, for the x axis, let's select fever, which is a critical feature for any medical diagnosis:

Figure 3.20: Binning the x axis

For the y axis, choose days, which is also a crucial feature for a diagnosis:

Figure 3.21: Defining the y axis binning

If the fever only lasts one day, the patient might have had a bad cold. If the patient has had a fever for several days with coughing, the diagnosis could be pneumonia or the flu, for example:

Figure 3.22: Displaying data with intuitive features

Change Color By to class to obtain a clearer view if necessary. Before moving on, try some scenarios of your own and see what you can infer from the way the data points are displayed.
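
The idea of binning two features against each other can also be sketched in pandas. The data, the bin edges, and the bin names below are all assumptions made for illustration. Facets Dive chooses its own bins interactively, so this only approximates the grid it draws.

```python
import pandas as pd

# Hypothetical data points (illustrative values only)
df = pd.DataFrame(
    {
        "fever": [1.0, 2.5, 7.3, 8.4, 9.4, 9.5],
        "days": [1, 1, 4, 2, 3, 2],
    }
)

# Assumed bin edges on a 0-10 fever scale; pd.cut assigns each value to a bin
df["fever_bin"] = pd.cut(
    df["fever"], bins=[0, 4, 7, 10], labels=["low", "medium", "high"]
).astype(str)

# Rows are fever bins, columns are days, cells are counts of data points
grid = pd.crosstab(df["fever_bin"], df["days"])
print(grid)
```

Each cell of `grid` plays the role of one bin in the display: a cell with many data points corresponds to a crowded bin on screen.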

Scatter plots can also help us detect patterns. Let's see how Facets Dive displays them.

Defining the scatter plot of the x axis and the y axis

Scatter plots show the data points scattered on a plot defined by the x axis and y axis. It can be useful to visualize features through scattered data points.

The scatter plot displays the relationship between data points. You can also detect patterns that will help explain the features in a dataset.

Let's display an example. Go to the Scatter | X-Axis and Scatter | Y-Axis dropdown lists:

Figure 3.23: Scatter plot options

Choose days for the x axis and colored_sputum for the y axis:

Figure 3.24: Defining scatter plot options

You will see patterns emerge. For example, we can immediately see that high values of colored_sputum are associated with pneumonia. The pneumonia data points are scattered in a pattern over the days:

Figure 3.25: Visually detecting patterns

We can also set the binning x axis to days, the binning y axis to (none), the color to class, the scatter plot x axis to (default), and the scatter plot y axis to colored_sputum, for example. We can then analyze the patterns of the classes per day:

Figure 3.26: Modifying the views to analyze the data from different perspectives
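
A pattern spotted in a scatter plot can be checked against a summary statistic. The following sketch assumes hypothetical values for `colored_sputum` and `class` and compares the mean of the feature per class. It does not reproduce the book's dataset.

```python
import pandas as pd

# Hypothetical data points (illustrative values only)
df = pd.DataFrame(
    {
        "colored_sputum": [1.0, 1.5, 1.0, 8.0, 9.0, 8.5],
        "class": ["flu", "flu", "cold", "pneumonia", "pneumonia", "pneumonia"],
    }
)

# Mean colored_sputum value for each class
means = df.groupby("class")["colored_sputum"].mean()

# The class with the highest mean is the one that stands out on the y axis
top_class = means.idxmax()
print(means)
print(top_class)
```

If the class that stands out visually is not the one with the highest mean, the visual pattern may be driven by a few outliers and deserves a closer look.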

We have covered some of the visualization options that Facets Dive offers to analyze our data points in real time. Such views can drive decisions: for example, a project manager might ask a team to shrink the size of the sample displayed or to clean the data.

Visual XAI will progressively become a prerequisite for any AI project.