Loading and Exploring a Dataset Using R Functions
In this section, we'll load and explore a dataset using R functions. Before starting with the implementation, check your R version by typing version in the console and reviewing the details it prints.
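The version check can also be done programmatically. The following is a minimal sketch using base R only; the minimum version shown is a hypothetical value chosen for illustration, not a requirement stated in this section.

```r
# Print the full version string of the running R installation
print(R.version.string)

# getRversion() returns a version object that supports comparisons
r_ver <- getRversion()

# Hypothetical minimum version, used here only for illustration
min_ver <- "3.0.0"
version_ok <- r_ver >= min_ver
print(version_ok)
```

Comparing version objects rather than raw strings avoids mistakes such as treating "3.10" as older than "3.9".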
Let's begin by following these steps:
- Install the following packages and libraries:
install.packages("ggplot2")
install.packages("tibble")
install.packages("dplyr")
install.packages("Lock5Data")
- Get the current working directory by using the getwd() command (it takes no arguments):
[1] "C:/Users/admin/Documents/GitHub/Applied-DataVisualization-with-ggplot2-and-R"
- Set the current working directory to the Lesson1 folder by using the following command:
setwd("C:/Users/admin/Documents/GitHub/Applied-DataVisualization-with-ggplot2-and-R/Lesson1")
- Open the template_Lesson1.R file, which uses the require command to load the necessary libraries.
- Read the following data file, provided in the data directory:
df_hum <- read.csv("data/historical-hourly-weather-data/humidity.csv")
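The working-directory and file-reading steps above can be tried out without the book's data files. The sketch below is an assumption-laden stand-in: it uses a temporary directory in place of the Lesson1 folder and writes a tiny made-up CSV (the file name, column names, and values are all hypothetical) before reading it back with read.csv.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Use a temporary directory as a stand-in for the Lesson1 folder (assumption)
setwd(tempdir())

# Write a tiny, made-up CSV; the columns and values are illustrative only
dir.create("data", showWarnings = FALSE)
writeLines(c("datetime,CityA,CityB",
             "t1,76,81",
             "t2,75,NA"),
           file.path("data", "sample.csv"))

# read.csv returns a data frame; relative paths resolve against getwd()
df_demo <- read.csv(file.path("data", "sample.csv"))
print(class(df_demo))

# Restore the original working directory
setwd(old_wd)
```

Because the path passed to read.csv is relative, the same call fails if the working directory has not been set first, which is why the setwd step precedes the read.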
When we used read.csv, R created a data frame, a structure we are all familiar with. Let's type some commands to get an overall impression of our data.
Let's retrieve some parameters of the dataset (such as the number of rows and columns) and display the different variables and their data types.
The following libraries have now been loaded:
- Graphical visualization package:
require("ggplot2")
- Packages for building and manipulating data frames, along with other useful commands:
require("tibble")
require("dplyr")
Reference: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- A package of ready-made datasets for R:
require("Lock5Data")
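Note that require returns FALSE with a warning, rather than stopping, when a package is not installed. As a minimal sketch (not part of the template file), the following checks which of the four packages are available before relying on them; the variable names are hypothetical.

```r
# Packages used in this section
pkgs <- c("ggplot2", "tibble", "dplyr", "Lock5Data")

# requireNamespace() tries to load a package's namespace without attaching it
# and returns TRUE or FALSE instead of raising an error
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of packages that would still need install.packages()
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
```

If missing_pkgs is non-empty, those names can be passed to install.packages before running the template again.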
Use the following commands to inspect the data frame:
#Display the column names
colnames(df_hum)
Take a look at the output screenshot, as shown here:
Use the following command:
#Number of rows and columns
dim(df_hum)
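The base R function for dimensions is dim, which returns the row count followed by the column count; nrow and ncol return each value separately. A minimal sketch on a small made-up data frame (the column names and values are hypothetical stand-ins for df_hum):

```r
# Small made-up data frame standing in for df_hum
df_demo <- data.frame(datetime = c("t1", "t2", "t3"),
                      CityA = c(76, 75, 74),
                      CityB = c(81, NA, 80))

# dim() returns an integer vector: rows first, then columns
dims <- dim(df_demo)
print(dims)

# nrow() and ncol() return the two values individually
n_rows <- nrow(df_demo)
n_cols <- ncol(df_demo)
```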
A summary of the data frame can be seen with the following code:
str(df_hum)
Take a look at the output screenshot, as shown here:
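Beyond str, the data type of each column and the number of missing values can be collected into plain vectors for further use. This is a minimal sketch on a made-up data frame; the column names and values are hypothetical and do not come from humidity.csv.

```r
# Small made-up data frame standing in for df_hum
df_demo <- data.frame(datetime = c("t1", "t2"),
                      CityA = c(76, NA),
                      stringsAsFactors = FALSE)

# str() prints a compact, human-readable summary of the structure
str(df_demo)

# sapply() with class gives the type of every column as a named vector
col_types <- sapply(df_demo, class)
print(col_types)

# Count the missing values in each column
na_counts <- colSums(is.na(df_demo))
print(na_counts)
```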