Python: Data Analytics and Visualization


Grouping data

One typical workflow during data exploration looks as follows:

You find a criterion that you want to use to group your data. Maybe you have GDP data for every country along with the continent, and you would like to ask questions about the continents. These questions usually lead to applying some function: you might want to compute the mean GDP per continent. Finally, you want to store the result in a new data structure for further processing.
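The three steps just described (group, apply a function, store the result) can be sketched in a few lines. The country names, continent labels, and GDP figures below are invented placeholders, not real data, and the variable names `gdp` and `mean_gdp` are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical figures, invented purely for illustration (not real GDP data).
gdp = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["X", "X", "Y", "Y"],
    "gdp": [10.0, 20.0, 30.0, 50.0],
})

# 1. group by the criterion, 2. apply a function, 3. store the result
mean_gdp = gdp.groupby("continent")["gdp"].mean()
print(mean_gdp)
```

The result is a new Series indexed by continent, which can be used for further processing like any other pandas object.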

We use a simpler example here. Imagine some fictional weather data about the number of sunny hours per day and city:

>>> df
 date city value
0 2000-01-03 London 6
1 2000-01-04 London 3
2 2000-01-05 London 4
3 2000-01-03 Mexico 3
4 2000-01-04 Mexico 9
5 2000-01-05 Mexico 8
6 2000-01-03 Mumbai 12
7 2000-01-04 Mumbai 9
8 2000-01-05 Mumbai 8
9 2000-01-03 Tokyo 5
10 2000-01-04 Tokyo 5
11 2000-01-05 Tokyo 6
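The text does not show how `df` was created. One possible way to rebuild it, assuming the `date` column holds datetime values (it could just as well hold plain strings), is the following sketch:

```python
import pandas as pd

cities = ["London", "Mexico", "Mumbai", "Tokyo"]
dates = ["2000-01-03", "2000-01-04", "2000-01-05"]
# Sunny hours, listed city by city in the order shown above
values = [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6]

df = pd.DataFrame({
    # Assumption: dates are parsed to datetimes; the original may use strings
    "date": pd.to_datetime(dates * len(cities)),
    # Repeat each city once per date: London, London, London, Mexico, ...
    "city": [city for city in cities for _ in dates],
    "value": values,
})
print(df)
```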

The groups attribute returns a dictionary that maps each unique group to the axis labels of the rows belonging to it:

>>> df.groupby("city").groups
{'London': [0, 1, 2],
'Mexico': [3, 4, 5],
'Mumbai': [6, 7, 8],
'Tokyo': [9, 10, 11]}
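These labels can be passed back to `loc` to retrieve the rows of a single group. In recent pandas versions the dictionary values are typically `Index` objects rather than plain lists, so the sketch below converts them before display. The names `as_lists` and `mumbai_rows` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

groups = df.groupby("city").groups

# Convert each collection of labels to a plain list for display
as_lists = {city: list(labels) for city, labels in groups.items()}
print(as_lists)

# Use the labels of one group to select its rows
mumbai_rows = df.loc[groups["Mumbai"]]
print(mumbai_rows)
```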

Although the result of a groupby is a GroupBy object, not a DataFrame, we can use the usual indexing notation to refer to columns:

>>> grouped = df.groupby("city")
>>> grouped["value"].max()
city
London 6
Mexico 9
Mumbai 12
Tokyo 6
Name: value, dtype: int64
>>> grouped["value"].sum()
city
London 13
Mexico 20
Mumbai 29
Tokyo 16
Name: value, dtype: int64

We see that, according to our data set, Mumbai seems to be a sunny city. An alternative, more verbose way to achieve the same result would be:
```
>>> df['value'].groupby(df['city']).sum()
city
London 13
Mexico 20
Mumbai 29
Tokyo 16
Name: value, dtype: int64
```
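To check that the two spellings really produce the same result, the outputs can be compared with `Series.equals`. This is a minimal sketch on the weather data from this section; the variable names are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

# Group the DataFrame, then select the column
via_column_indexing = df.groupby("city")["value"].sum()

# Select the column first, then group it by another Series
via_series_groupby = df["value"].groupby(df["city"]).sum()

# True when both Series have the same index and values
same = via_column_indexing.equals(via_series_groupby)
print(same)
```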
