The pandas Series
The pandas Series is the base data structure of pandas. A Series is similar to a NumPy array, but it also carries an index, which allows items to be looked up by label rather than only by zero-based position.
The following creates a Series from a Python list:
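A minimal sketch of this step, assuming the variable name s and the illustrative values 10 through 13 (neither is taken from the original listing):

```python
import pandas as pd

# Illustrative values; an assumption, not the original data
s = pd.Series([10, 11, 12, 13])
print(s)
# Prints the index labels on the left and the values on the right:
# 0    10
# 1    11
# 2    12
# 3    13
# followed by a final line reporting the dtype of the values
```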
The output consists of two columns of information. The first is the index and the second is the data in the Series. Each row of the output represents the index label (in the first column) and then the value associated with that label.
Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item.
The values of a Series object can then be accessed by using the [] operator, passing the label for the value you require. The following gets the value for the label 1:
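As a sketch, again assuming an illustrative Series named s with the values 10 through 13, the lookup might be written as:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13])  # assumed example data

# With the default integer index, 1 is the label of the second item
value = s[1]
print(value)  # 11
```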
This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0 or increment by one, and its labels can be of data types other than integer. This ability to associate flexible indexes with data is one of the great superpowers of pandas.
Multiple items can be retrieved by specifying their labels in a Python list. The following retrieves the values at labels 1 and 3:
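One way this could look, using the same assumed example data (the name subset is hypothetical):

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13])  # assumed example data

# Passing a list of labels returns a new Series with just those items
subset = s[[1, 3]]
print(subset)
# 1    11
# 3    13
# followed by the dtype line
```

The result is itself a Series, and it keeps the original labels 1 and 3 rather than renumbering from 0.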
A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same values but with an index consisting of string values:
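A sketch of such a Series, assuming the illustrative values 10 through 13 and the labels 'a' through 'd':

```python
import pandas as pd

# The index parameter supplies one label per value (assumed example data)
s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])
print(s)
# a    10
# b    11
# c    12
# d    13
# followed by the dtype line
```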
Data in the Series object can now be accessed by those alphanumeric index labels. The following retrieves the values at index labels 'a' and 'd':
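For instance, with an assumed Series s labeled 'a' through 'd' (the name picked is hypothetical):

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # assumed example data

# Select by string labels instead of integer labels
picked = s[['a', 'd']]
print(picked)
# a    10
# d    13
# followed by the dtype line
```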
It is still possible to refer to the elements of this Series object by their numerical 0-based position. In current versions of pandas, this is done most reliably through the .iloc property:
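The sketch below uses .iloc for positional access. Passing integers directly to [] on a string-labeled Series is deprecated in recent pandas releases, so .iloc is the safer choice; the example data and the name by_position are assumptions:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # assumed example data

# .iloc always interprets integers as 0-based positions, never as labels
by_position = s.iloc[[1, 2]]
print(by_position)
# b    11
# c    12
# followed by the dtype line
```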
We can examine the index of a Series using the .index property:
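A short sketch, assuming the same string-labeled example Series (the name idx is hypothetical):

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # assumed example data

idx = s.index
# The printed form lists the labels along with the dtype of the index
print(idx)
print(list(idx))  # ['a', 'b', 'c', 'd']
```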
The index is itself actually a pandas object, and this output shows us the values of the index and the data type used for the index. In this case, note that the type of the data in the index (referred to as the dtype) is object and not string. We will examine how to change this later in the book.
A common usage of a Series in pandas is to represent a time series that associates date/time index labels with values. The following demonstrates this by creating a date range using the pd.date_range() pandas function:
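As a sketch, assuming an arbitrary six-day range in April 2016 (the dates and the name dates are illustrative):

```python
import pandas as pd

# Start and end dates are both included; the default frequency is daily
dates = pd.date_range('2016-04-01', '2016-04-06')
print(dates)
print(len(dates))  # 6
```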
This has created a special index in pandas called DatetimeIndex, which is a specialized type of pandas index that is optimized to index data with dates and times.
Now let's create a Series using this index. The data values represent high temperatures on specific days:
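A possible version, with temperature values invented for illustration and the assumed names dates and temps1:

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')  # assumed date range

# One high temperature per day; the values are illustrative only
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)
print(temps1)
```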
This type of series with a DatetimeIndex is referred to as a time series.
We can look up a temperature on a specific date by using the date as a string:
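A sketch of the lookup, using the same assumed dates and illustrative temperatures (the name high is hypothetical):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')          # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)  # illustrative values

# pandas parses the string and matches it against the DatetimeIndex
high = temps1['2016-04-04']
print(high)  # 90
```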
An arithmetic operation can be applied to two Series objects, with values matched by index label. The following code creates a second Series and calculates the difference in temperature between the two:
Because the index labels are not integers, we can also look up values by 0-based position:
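A minimal sketch, assuming two illustrative temperature Series that share the same dates (temps1, temps2, and temp_diffs are assumed names):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')          # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)  # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)  # illustrative values

# Subtraction pairs up values that share the same index label
temp_diffs = temps1 - temps2
print(temp_diffs.tolist())  # [10, 7, 16, 7, 4, 10]
```

Because the operation aligns on labels rather than on position, the two Series do not need to be stored in the same order.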
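The sketch below uses .iloc for the positional lookup, since passing a bare integer to [] on a date-indexed Series is deprecated in recent pandas releases. The data and names are the same illustrative assumptions as before:

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')          # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)  # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)  # illustrative values
temp_diffs = temps1 - temps2

# Position 2 is the third item, which corresponds to 2016-04-03
third = temp_diffs.iloc[2]
print(third)  # 16
```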
Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:
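As a sketch with the same illustrative temperature data (all names and values are assumptions):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')          # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)  # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)  # illustrative values
temp_diffs = temps1 - temps2

# The differences are 10, 7, 16, 7, 4, 10, which sum to 54 over six days
mean_diff = temp_diffs.mean()
print(mean_diff)  # 9.0
```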