Machine Learning with R Cookbook (Second Edition)
How to do it...
Perform the following steps to visualize a dataset:
- Load the iris data into the R session:
> data(iris)
- Calculate the frequency of each species within the iris dataset using the table function:
> table.iris = table(iris$Species)
> table.iris
Output:
    setosa versicolor  virginica
        50         50         50
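The counts can also be expressed as shares of the whole dataset. The following is a minimal sketch using base R's `prop.table`; the variable names `species_counts` and `species_share` are illustrative and not part of the original recipe.

```r
# Load the built-in iris data and count the observations per species
data(iris)
species_counts <- table(iris$Species)

# Convert the raw counts into proportions of all observations
species_share <- prop.table(species_counts)

# Show the proportions rounded to three decimal places
print(round(species_share, 3))
```

Because every species has the same count, the three proportions are equal.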
- As the frequency table shows, each species represents 1/3 of the iris data. We can draw a simple pie chart to represent the distribution of species within the iris dataset:
> pie(table.iris)
Output:
The pie chart of species distribution
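A pie chart is often easier to read when each slice shows its percentage. The sketch below is one possible way to do this with the `labels` and `main` arguments of `pie`; the label format and the names `pct` and `slice_labels` are assumptions made for illustration.

```r
# Count the observations per species
data(iris)
species_counts <- table(iris$Species)

# Percentage of the dataset covered by each species, rounded to one decimal
pct <- round(100 * as.numeric(prop.table(species_counts)), 1)

# Build labels such as "<species> (<percent>%)"
slice_labels <- paste0(names(species_counts), " (", pct, "%)")

# Draw the pie chart with the custom labels and a title
pie(species_counts, labels = slice_labels,
    main = "Species distribution in iris")
```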
- A histogram groups the values of a variable into bins along the x-axis and plots how many observations fall into each bin. The following example produces a histogram of the sepal length:
> hist(iris$Sepal.Length)
The histogram of the sepal length
- In the histogram, the x-axis presents the sepal length and the y-axis presents the number of observations in each sepal length interval. The histogram shows that the sepal lengths in the dataset fall between 4 cm and 8 cm.
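To check the reading of the histogram against the data, the bin edges can be set explicitly and the per-bin counts inspected. This is a hedged sketch: the 0.5 cm bin width, the axis label, and the title are arbitrary choices made here, not settings from the original recipe.

```r
data(iris)

# Smallest and largest sepal length in the dataset
sepal_range <- range(iris$Sepal.Length)
print(sepal_range)

# Histogram with explicit 0.5 cm bins from 4 cm to 8 cm;
# hist() also returns an object describing the bins
h <- hist(iris$Sepal.Length,
          breaks = seq(4, 8, by = 0.5),
          xlab = "Sepal length (cm)",
          main = "Histogram of sepal length")

# Number of observations that fall into each bin
print(h$counts)
```

The returned object makes it possible to read the bin counts directly instead of estimating them from the bar heights.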
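- Boxplots, also called box-and-whisker plots, convey a lot of information in one simple plot. The line inside the box marks the median of the sample, and the box itself spans the lower and upper quartiles. With R's default settings, the whiskers extend to the most extreme observations that lie within 1.5 times the interquartile range of the box, and any points beyond them are drawn individually as outliers: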
> boxplot(Petal.Width ~ Species, data = iris)
The boxplot of the petal width
- The preceding boxplot clearly shows that the median and the upper range of the petal width of setosa are much smaller than those of versicolor and virginica. Therefore, petal width is a useful attribute for distinguishing iris species.
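The numbers behind the boxplot can be inspected directly. The following sketch uses `tapply` to compute the median and maximum petal width per species, and reads the box statistics that `boxplot` returns; the variable names are illustrative.

```r
data(iris)

# Median and maximum petal width for each species
petal_median <- tapply(iris$Petal.Width, iris$Species, median)
petal_max <- tapply(iris$Petal.Width, iris$Species, max)
print(petal_median)
print(petal_max)

# boxplot() invisibly returns the statistics it draws; each column of
# $stats holds lower whisker, lower hinge, median, upper hinge, upper whisker
bp <- boxplot(Petal.Width ~ Species, data = iris)
print(bp$stats)
```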
- A scatter plot is used when there are two variables to plot against one another. This example plots the petal length against the petal width and colors each point according to the species it belongs to:
> plot(x=iris$Petal.Length, y=iris$Petal.Width, col=iris$Species)
The scatter plot of the petal length against the petal width
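When points are colored by a factor, a legend helps map the colors back to the species. This is a minimal sketch, assuming the default color palette, in which the factor's integer codes 1 to 3 select the point colors; the axis labels and the legend position are choices made here for illustration.

```r
data(iris)
species_levels <- levels(iris$Species)

# Scatter plot of petal length against petal width, colored by species;
# the factor is converted to its integer codes, which index the palette
plot(x = iris$Petal.Length, y = iris$Petal.Width,
     col = iris$Species,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")

# Legend using the same integer codes, so the colors match the points
legend("topleft", legend = species_levels,
       col = seq_along(species_levels), pch = 1)
```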
- The preceding plot is a scatter plot of the petal length against the petal width. As there are four attributes within the iris dataset, there are six distinct pairs of attributes, so it would take six separate plot calls to cover all combinations. However, R provides a function named pairs, which draws every pairwise scatter plot in a single figure:
> pairs(iris[1:4], main = "Edgar Anderson's Iris Data", pch = 21,
bg = c("red", "green3", "blue")[unclass(iris$Species)])
Pairs scatterplot of iris data
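For comparison, the six attribute pairs can be listed with `combn` and plotted one by one, which is roughly what `pairs` saves you from writing. This is only a sketch; the 2 by 3 panel layout and the names `measure_names` and `pair_matrix` are assumptions for illustration.

```r
data(iris)

# Names of the four numeric attributes
measure_names <- names(iris)[1:4]

# Every distinct pair of attributes, one pair per column
pair_matrix <- combn(measure_names, 2)
print(ncol(pair_matrix))

# Draw one scatter plot per pair in a 2 x 3 grid, colored by species
old_par <- par(mfrow = c(2, 3))
for (i in seq_len(ncol(pair_matrix))) {
  x_name <- pair_matrix[1, i]
  y_name <- pair_matrix[2, i]
  plot(iris[[x_name]], iris[[y_name]],
       col = iris$Species, xlab = x_name, ylab = y_name)
}
par(old_par)  # restore the previous plotting layout
```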