Performing Boolean selection
Indexes give us a very powerful and efficient means of looking up values in a Series based upon their labels. But what if you want to look up entries in a Series based upon the values?
To handle this scenario, pandas provides Boolean selection. A Boolean selection applies a logical expression to the values of a Series and returns a new Series of Boolean values, one per element, holding the result of that expression for each value. That result can then be used to extract only the values for which the expression was True.
To demonstrate Boolean selection, let's start with the following Series and apply the greater than or equal to operator (>=) to find the values greater than or equal to 3:
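A minimal sketch of this step, assuming a small example Series named s that holds the integers 0 through 4 with the labels a through e (the data and both names are illustrative choices):

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# Apply the comparison to every value; the result is a Boolean Series
logical_results = s >= 3
print(logical_results)  # True only for labels 'd' and 'e'
```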
This results in a Series with matching index labels and the result of the expression as applied to the value of each label. The dtype of the values is bool.
This series can then be used to select values from the original series. This selection is performed by passing the Boolean results to the [] operator of the source.
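As a sketch of that selection, again assuming an illustrative Series s of the integers 0 through 4 labeled a through e:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))
logical_results = s >= 3

# Passing the Boolean Series to [] keeps only the rows marked True
selected = s[logical_results]
print(selected)  # labels 'd' and 'e' with values 3 and 4
```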
The syntax can be simplified by performing the logical operation within the [] operator:
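The shorter form might look like the following, using the same assumed example Series s:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# The comparison is evaluated first, then used directly as the selector
selected_inline = s[s >= 3]
print(selected_inline)
```

This gives the same result as building the Boolean Series in a separate step; the intermediate variable is simply skipped.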
Unfortunately, multiple conditions cannot be combined using Python's ordinary and/or syntax. As an example, the following causes an exception to be thrown:
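A minimal sketch of the failure, assuming the same illustrative Series s and assuming the two conditions are joined with Python's and keyword. The try/except is only there so that the snippet runs to completion and reports the error:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

error_message = None
try:
    # 'and' asks for the truth value of the whole Series (s >= 2),
    # which pandas refuses to compute
    s[s >= 2 and s < 5]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```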
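The preceding code fails because Python's and and or keywords need a single truth value for each operand, and the truth value of an entire Series is ambiguous, so pandas raises a ValueError. The solution is to express the condition differently, using the element-wise operators & (and) and | (or) and putting parentheses around each of the logical conditions. The parentheses are required because & and | bind more tightly than the comparison operators: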
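A sketch of the working form, using the same assumed example Series s (integers 0 through 4 labeled a through e):

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# Element-wise AND: each condition is wrapped in its own parentheses
between = s[(s >= 2) & (s < 5)]
print(between)  # labels 'c', 'd', 'e'

# Element-wise OR works the same way
extremes = s[(s < 1) | (s > 3)]
print(extremes)  # labels 'a' and 'e'
```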
It is possible to determine whether all the values in a Series match a given expression using the .all() method. The following asks if all elements in the series are greater than or equal to 0:
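For instance, with the assumed example Series s of integers 0 through 4, the check could be sketched as:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# True only if the comparison holds for every element
all_non_negative = bool((s >= 0).all())
print(all_non_negative)
```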
The .any() method returns True if at least one value satisfies the expression. The following asks if any element is less than 2:
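A matching sketch for .any(), on the same assumed example Series s:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# True if the comparison holds for at least one element
any_below_two = bool((s < 2).any())
print(any_below_two)
```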
You can determine how many items satisfy the expression by calling the .sum() method on the Boolean result (not on the selected values). This works because .sum(), when applied to a Series of Boolean values, treats True as 1 and False as 0:
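A short sketch of the counting idiom, assuming the same illustrative Series s; it also shows how the count differs from summing the selected values themselves:

```python
import numpy as np
import pandas as pd

# Assumed example data: values 0..4 labeled 'a'..'e'
s = pd.Series(np.arange(0, 5), index=list('abcde'))

# Summing the Boolean Series counts the True entries
count_below_two = int((s < 2).sum())
print(count_below_two)

# By contrast, summing the selection adds up the matching values
total_below_two = int(s[s < 2].sum())
print(total_below_two)
```

The parentheses around the comparison matter: .sum() must be called on the Boolean result, not on the number 2.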