R Graphs Cookbook Second Edition

Setting colors of points, lines, and bars

In this recipe, you will learn the simplest way to change the colors of points, lines, and bars in scatter plots, line plots, histograms, and bar plots.

Getting ready

To try out this recipe, all you need to do is start R and type the code at the command prompt. You can also save the code as a script so that you can use it again later.

How to do it...

The simplest way to change the color of any graph element is to use the col argument. For example, the plot() function takes the col argument:

plot(rnorm(1000),
col="red")

If we choose the plot type as the line, then the color is applied to the plotted line. Let's use the dailysales.csv example dataset we used in Chapter 1, R Graphics. First, we need to load it:

sales <- read.csv("dailysales.csv",header=TRUE)

plot(sales$units~as.Date(sales$date,"%d/%m/%y"),
type="l", #Specify type of plot as l for line
col="blue")

Similarly, the points() and lines() functions apply the col argument's value to the plotted points and lines, respectively.
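
As a minimal sketch of how points() and lines() use col, the following adds a red layer of points and a blue line to an empty plot. The x and y vectors are simulated purely for illustration and are not part of the recipe's dataset:

```r
# Illustrative data only: 20 x values and a noisy upward trend
set.seed(1)
x <- 1:20
y <- x + rnorm(20)

# Start with an empty plot so that each added layer is easy to see
plot(x, y, type = "n")

# points() draws the markers in red
points(x, y, col = "red")

# lines() joins the same coordinates with a blue line
lines(x, y, col = "blue")
```

Because each function receives its own col value, the two layers can be colored independently of each other.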

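The barplot() and hist() functions also take the col argument and apply its value to the plotted bars. The following code will produce a bar plot with blue bars. It assumes that sales now holds the city-level sales data (a City column and one column per product) rather than the daily sales data loaded previously, and it uses the formula interface to barplot(), which requires R 3.6.0 or later: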

barplot(sales$ProductA~sales$City,
col="blue")

The col argument for the boxplot() function is applied to the color of the plotted boxes.
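
The same idea can be sketched for hist() and boxplot() without any external file. The values below are simulated, and the names values and groups are made up for this illustration:

```r
# Simulated values, used only to have something to plot
set.seed(42)
values <- rnorm(200)

# hist() fills the histogram bars with the given color
h <- hist(values, col = "blue")

# boxplot() fills the boxes with the given color
groups <- list(a = rnorm(50), b = rnorm(50, mean = 1))
b <- boxplot(groups, col = "orange")
```

Both functions also return an object describing what was drawn (here stored in h and b), which can be handy for inspecting the bins or box statistics.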

How it works...

The col argument automatically applies the specified color to the elements being plotted, based on the plot type. So, if we do not specify a plot type or choose points, then the color is applied to points. Similarly, if we choose the plot type as the line, then the color is applied to the plotted line, and if we use the col argument in the barplot() or hist() commands, then the color is applied to the bars.

The col argument accepts names of colors such as red, blue, and black. The colors() (or colours()) function lists all the built-in colors (more than 650) that are available in R. We can also specify colors as hexadecimal codes such as #FF0000 (for red), #0000FF (for blue), and #000000 (for black). If you have ever created any web pages, you would know that these hex codes are used in HTML to represent colors.
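
To explore color names and hex codes at the prompt, something like the following can be tried. It uses only base R functions (colors(), col2rgb(), and rgb()); the variable name red_hex is made up for this sketch:

```r
# How many built-in color names are there, and what do the first few look like?
length(colors())
head(colors())

# Convert a color name to its red, green, and blue components (0 to 255)
col2rgb("red")

# Build a hex code from RGB components; this gives "#FF0000"
red_hex <- rgb(255, 0, 0, maxColorValue = 255)

# A hex code can be passed to col exactly like a color name
plot(rnorm(100), col = red_hex)
```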

The col argument can also take numeric values. When it is set to a numeric value, the color corresponding to that index in the current color palette is used. For example, in the default color palette, the first color is black and the second color is red. So col=1 and col=2 refer to black and red, respectively. Index 0 corresponds to the background color.
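
One way to see which colors the numeric indices map to is to inspect the current palette with palette() and draw one point per index. This is a small sketch that assumes the palette has not been changed in the current session. Note that R 4.0.0 revised the default palette, so the exact shades after black may differ between versions:

```r
# The current palette as a character vector of color names or hex codes
pal <- palette()
pal

# The color used for col=1
pal[1]

# Draw one large filled point for each palette entry, colored by its index
n <- length(pal)
plot(seq_len(n), rep(1, n), col = seq_len(n), pch = 19, cex = 3)
```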

There's more...

In many settings, col can also take a vector of multiple colors instead of a single color. This is useful if you wish to use more than one color in a graph. For example, recall the bar plot of sales data for three products across five cities. In that example, we used a vector of five colors to represent each of the five cities with the help of the heat.colors() function. The heat.colors() function takes a number as an argument and returns a vector of that many colors. So, heat.colors(5) produces a vector of five colors.

Type the following at the R prompt:

heat.colors(5)

You should get the following output:

[1] "#FF0000FF" "#FF5500FF" "#FFAA00FF" "#FFFF00FF" "#FFFF80FF"

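These are the five colors in the hexadecimal format. The last two digits of each code (FF) are the alpha channel and indicate full opacity; recent versions of R may omit them from the output.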

Another way of specifying a vector of colors is to construct it yourself with the c() function:

barplot(as.matrix(sales[,2:4]), beside=TRUE,
legend=sales$City,
col=c("red","blue","green","orange","pink"),
border="white")

In the preceding example, we set the value of col to c("red","blue","green","orange","pink"), which is a vector of five colors.

We have to take care to create a vector whose length matches the number of elements, in this case, the five bars (one per city) in each group. If the two numbers don't match, R will recycle values by repeating colors from the beginning of the vector. For example, if we had fewer colors in the vector than the number of elements, say, four colors in the previous plot, then R would apply the four colors to the first four bars and then apply the first color to the fifth bar. This is called recycling in R:

barplot(as.matrix(sales[,2:4]), beside=TRUE,
legend=sales$City,
col=c("red","blue","green","orange"),
border="white")

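In the example, the bars for the first and last data rows (Seattle and Mumbai) in the first group would be of the same color (red), making it difficult to distinguish one from the other. Because the recycling continues across all the bars in the plot, the colors in the later groups would also fall out of step with the cities.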
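
Recycling can also be seen directly, without the sales data. In the following sketch, rep_len() mimics how a four-color vector is stretched over five elements; the bar heights are arbitrary numbers chosen only for illustration:

```r
# Four colors for five elements
cols <- c("red", "blue", "green", "orange")
n_bars <- 5

# rep_len() repeats the vector from the beginning until it has n_bars entries
recycled <- rep_len(cols, n_bars)
recycled
# [1] "red"    "blue"   "green"  "orange" "red"

# A simple bar plot with five bars and only four colors:
# the fifth bar is drawn in the first color again
barplot(c(3, 5, 2, 4, 6), col = cols)
```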

One good way to ensure that you always have the correct number of colors is to find the number of elements first and pass that as an argument to one of the color palette functions. For example, if we did not know the number of cities in the preceding example, we could execute the following to make sure that the number of colors matches the number of plotted bars:

barplot(as.matrix(sales[,2:4]), beside=TRUE,
legend=sales$City,
col=heat.colors(length(sales$City)),
border="white")

We used the length() function to find out the length or the number of elements in the sales$City vector and passed that as the argument to heat.colors(). So, regardless of the number of cities, we will always have the right number of colors.
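
If the example data file is not at hand, the same pattern can be tried on a small stand-in data frame. Everything in sales_demo below (the city labels, the product columns, and all the numbers) is invented for this sketch and is not the book's dataset:

```r
# Hypothetical stand-in for the sales data; all names and values are made up
sales_demo <- data.frame(
  City = c("CityA", "CityB", "CityC", "CityD", "CityE"),
  ProductA = c(10, 20, 15, 25, 5),
  ProductB = c(12, 18, 9, 30, 7),
  ProductC = c(8, 22, 14, 11, 16),
  stringsAsFactors = FALSE
)

# Ask for exactly as many colors as there are cities
n_cities <- length(sales_demo$City)
city_cols <- heat.colors(n_cities)

# One group of bars per product, one bar (and one color) per city
barplot(as.matrix(sales_demo[, 2:4]), beside = TRUE,
        legend.text = sales_demo$City,
        col = city_cols,
        border = "white")
```

Adding or removing rows in sales_demo changes n_cities, and the palette grows or shrinks with it, so no color is recycled.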

See also

In the next four recipes, we will see how to change the colors of other elements. The fourth recipe is especially useful; we look at color combinations and palettes there.