Applied Data Visualization with R and ggplot2
上QQ阅读APP看书,第一时间看更新

Creating a Histogram Using qplot and ggplot

In this section, we want to visualize the humidity distribution for the city of Vancouver. We'll create a histogram for humidity data using qplot and ggplot.

Let's begin by implementing the following steps:

  1. Create a plot with RStudio by using the following command: qplot(df_hum$Vancouver):

  1. Use ggplot to create the same plot using the following command:
ggplot(df_hum,aes(x=Vancouver))
This command does not do anything; ggplot2 requires the name of the object that we wish to make. To make a histogram, we have to specify the geom type (in other words, a histogram). aes stands for aesthetics, or the quantities that get plotted on the x- and y- axes, and their qualities. We will work on changing the aesthetics later, in order to visualize the plot more effectively.

Notice that there are some warning messages, as follows:

'stat_bin()' using 'bins = 30'. Pick better value with 'binwidth'.
Warning message:
Removed 1826 rows containing non-finite values (stat_bin).

You can ignore these messages; ggplot automatically detects and removes null or NA values.

  1. Obtain the histogram with ggplot by using the following command:
ggplot (df_hum, aes(x=Vancouver)) + geom_histogram() 

You'll see the following output:

Here's the output code:

require("ggplot2")
require("tibble")
#Load a data file - Read the Humidity Data
df_hum <- read.csv("data/historical-hourly-weather-data/humidity.csv")
#Display the summary
str(df_hum)
qplot(df_hum$Vancouver)
ggplot(df_hum, aes(x=Vancouver)) + geom_histogram()
Refer to the complete code at https://goo.gl/tu7t4y.

In order for ggplot to work, you will need to specify the geometric object. Note that the column name should not be enclosed in strings.