Working with data in R


How to use built-in data in R

Working with built-in data in R is really easy. It only requires calling the data set by name, for example with the data() function.

Calling the Iris data set

The Iris data set is a classic set of petal and sepal measurements from three Iris species.
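Here is a minimal sketch of calling the Iris data set and taking a first look at it. data(iris) copies the data set into your workspace. Because the datasets package is attached by default, iris is also available by name without that step.

```r
# iris ships with base R in the datasets package, which is attached by default
data(iris)            # optional: makes the copy in your workspace explicit
dim(iris)             # 150 rows and 5 columns
head(iris)            # first six rows
str(iris)             # four numeric measurements plus the Species factor
levels(iris$Species)  # "setosa" "versicolor" "virginica"
```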

A list of the data sets available in your attached packages, including the built-in ones, can be obtained with the data() function.
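Calling data() with no arguments shows the list in a viewer. If you want the list as an object you can search, you can use the $results matrix returned by data(package = "datasets"), which has one row per data set. This is a sketch using only base R. The object name builtin is arbitrary.

```r
# data(package = ...) returns an object whose $results matrix lists the data sets
builtin <- data(package = "datasets")$results
nrow(builtin)                        # number of data sets in the datasets package
head(builtin[, c("Item", "Title")])  # data set names and short descriptions
"iris" %in% builtin[, "Item"]        # TRUE
```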

# How to import local data files
# For .csv files
data_name <- read.csv("/path/to/file_name.csv", header = TRUE,
                      stringsAsFactors = TRUE)
head(data_name)
# For .xlsx files (needs the readxl package)
library(readxl)
data_name <- read_excel("/path/to/file_name.xlsx")
head(data_name)
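This sketch shows what header and stringsAsFactors do without needing a real file. It reads a tiny CSV passed through the text argument of read.csv(). The species and petal_length values are made up for illustration. With a real file, you would pass its path instead.

```r
# A tiny CSV supplied inline in place of a local file (made-up values)
csv_text <- "species,petal_length
setosa,1.4
virginica,6.0
setosa,1.3"

# header = TRUE takes the column names from the first line;
# stringsAsFactors = TRUE turns text columns into factors
plants <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
str(plants)
levels(plants$species)     # "setosa" "virginica"

# Since R 4.0.0 the default is stringsAsFactors = FALSE, so text stays character
plants_chr <- read.csv(text = csv_text, header = TRUE)
class(plants_chr$species)  # "character"
```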

How to use data from the internet

R can download data files directly from the websites that host them, so you do not have to download and move the files by hand.

.csv files

The most holy of file types for data sheets is .csv (comma-separated values). Importing a .csv file from the internet is very easy.

Let's download COVID-19 data from data.gov (the file itself is served from data.cdc.gov).

We use the download.file() and read.csv() functions:

# Download data from the internet
# If the file is in .csv format
tmp <- tempfile(fileext = ".csv")
download.file(url = "https://data.cdc.gov/resource/9bhg-hcku.csv",
              destfile = tmp)
covid_data <- read.csv(tmp, header = TRUE, stringsAsFactors = TRUE)
head(covid_data)
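If you download files often, you can wrap the download-then-read steps in a small function. This is a sketch: download_csv() is a hypothetical helper, not part of R. The check at the end uses a file:// URL that points at a temporary copy of iris, so the example runs without an internet connection.

```r
# Hypothetical helper: download a .csv from a URL into a temporary file and read it
download_csv <- function(url, ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # delete the temporary file when the function exits
  download.file(url = url, destfile = tmp, quiet = TRUE)
  read.csv(tmp, ...)                # extra arguments are passed on to read.csv()
}

# Online use (needs an internet connection):
# covid_data <- download_csv("https://data.cdc.gov/resource/9bhg-hcku.csv",
#                            stringsAsFactors = TRUE)

# Offline check: a local copy of iris reached through a file:// URL
local_csv <- tempfile(fileext = ".csv")
write.csv(iris, local_csv, row.names = FALSE)
iris_copy <- download_csv(paste0("file://", normalizePath(local_csv, winslash = "/")),
                          stringsAsFactors = TRUE)
dim(iris_copy)
```

read.csv() also accepts a complete URL as its file argument, which skips the temporary file. Downloading first is useful when you want to keep or inspect the raw file.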

.xlsx files

If .csv files are the light side, .xlsx files are certainly the dark side of data sheets (well, not really). Reading them just requires an external package, "readxl", which you can install once with install.packages("readxl").

Here we will download an .xlsx file of the data set "Airborne cues accelerate flowering and promote photosynthesis in Brassica rapa" from Dryad:

# Download data from the internet
# If the file is in .xlsx format
library("readxl")
tmp <- tempfile(fileext = ".xlsx")
# mode = "wb" downloads in binary mode, which Windows needs for .xlsx files
download.file(url = "https://dryad.org/stash/downloads/file-stream/1936219",
              destfile = tmp, mode = "wb")
brassica_data <- read_excel(tmp)
head(brassica_data)
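Before downloading a real workbook, you can practise on one that ships with readxl. This sketch assumes readxl is installed. readxl_example() returns the path to the package's bundled datasets.xlsx file. excel_sheets() lists the sheets in it, which is worth checking because a single .xlsx file can hold several tables.

```r
# readxl is a separate package: install it once with install.packages("readxl")
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheets, then read one of them by name into a tibble
excel_sheets(path)
iris_xl <- read_excel(path, sheet = "iris")
head(iris_xl)
```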

How to subset data

You can use the subset() function to extract the rows of a data set that meet a condition, such as belonging to a particular level of a factor or set of factors. Here is an example using the Iris data set:

# Subsetting data in R
# Subset Iris data into a dataframe with only one species
virginica <- subset(iris, Species == "virginica")
# Subset a particular vector with only the values from a particular group
virginica_sepals <- iris$Sepal.Length[iris$Species == "virginica"]
Subset data in R
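subset() can also combine several conditions and drop columns in the same call. This is a sketch using only iris. The object names are arbitrary.

```r
# Combine conditions with & (and) or | (or); select keeps only the listed columns
long_virginica <- subset(iris, Species == "virginica" & Sepal.Length > 7,
                         select = c(Species, Sepal.Length, Petal.Length))

# Keep several factor levels at once with %in%
not_setosa <- subset(iris, Species %in% c("versicolor", "virginica"))

# The same row filter written with square brackets
not_setosa2 <- iris[iris$Species %in% c("versicolor", "virginica"), ]

# subset() keeps unused factor levels; droplevels() removes them
levels(not_setosa$Species)              # "setosa" "versicolor" "virginica"
levels(droplevels(not_setosa)$Species)  # "versicolor" "virginica"
```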

Exporting data

You might also want to export data as a .csv to send to a collaborator. This is a simple process using the write.csv() function.

# Exporting data as a .csv in R
write.csv(iris, "iris_export.csv", row.names = FALSE)
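To check that an export worked, you can read the file straight back in. This sketch writes to R's temporary directory rather than the working directory. It also shows why row.names = FALSE is worth setting.

```r
# Write to the temporary directory so the example does not clutter your project
out_file <- file.path(tempdir(), "iris_export.csv")

# row.names = FALSE stops write.csv() from writing row numbers as a first column
write.csv(iris, out_file, row.names = FALSE)
iris_back <- read.csv(out_file, stringsAsFactors = TRUE)
dim(iris_back)          # 150 5, same as iris

# With the default row.names = TRUE, the row numbers come back as an extra column
write.csv(iris, out_file)
iris_extra <- read.csv(out_file)
names(iris_extra)[1]    # "X"
```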
