Introduction to R

Getting R and basic data analysis.

The first, and sometimes most daunting, thing about R is figuring out where to start. Once you download R and start it up, you are greeted with a command line. What the hell are you supposed to do with that!?

The first step is to download a front end for R, specifically RStudio, a free program designed for development in R. By default, RStudio is made up of four panes. The bottom left is your R console, which actually runs all of your commands. The top left is your R script, which is where you enter and save your R commands. The top right keeps track of the data that you have read in and the names of the objects that you have defined. The bottom right displays plots, packages, and help documents, along with a file explorer.

.r scripts can be run directly in R using RStudio or the built-in IDE in OS X. Everything after a "#" on a line is a comment and will not be read by R.

It is best to write your own .r script so you can reuse it without retyping everything in the R console every time you want to re-run the analysis.

Lines of code can be sent to the R console by highlighting the lines you want to run and pressing Ctrl+Enter (Cmd+Enter on a Mac). You can also write new .r files directly in RStudio or in a text editor (nano, vim, TextEdit, Notepad, Notepad++, etc.).

Example .r script:

Let's make a simple line plot.

# The arrow, <-, assigns everything to its right to the variable named on its left
# (an = also serves this purpose).

# Use the concatenate function c() to group values into a vector.

# Designate the values for your y-axis. "y" is the name of our variable.

y <- c(2, 3, 4, 5, 6)

# Do the same thing for your x axis.

x <- c(3, 2, 4, 5, 6)

# Make sure that the vector lengths are the same!
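
A quick way to confirm that the two vectors line up is to compare their lengths before plotting. This is a small sketch using the same x and y values as above; stopifnot() is just one way to make the script halt early if they differ.

```r
# Same values as the x and y defined in the script
y <- c(2, 3, 4, 5, 6)
x <- c(3, 2, 4, 5, 6)

# length() returns the number of elements in a vector
length(x)
length(y)

# Stop with an error if the lengths do not match
stopifnot(length(x) == length(y))
```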


# Draw a line plot. plot() draws points by default; type = "l" connects them with lines.

plot(x, y, type = "l")

# You can add options to change the plot type, axis labels, title, and colors.
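
As an illustration of those options, here is a minimal sketch that reuses the same x and y values. The labels, title, and color are made-up examples; type, xlab, ylab, main, and col are standard arguments to plot().

```r
y <- c(2, 3, 4, 5, 6)
x <- c(3, 2, 4, 5, 6)

# Sort the points by x so the line runs left to right
ord <- order(x)

plot(x[ord], y[ord],
     type = "b",              # "p" = points, "l" = lines, "b" = both
     xlab = "x values",       # x-axis label (example text)
     ylab = "y values",       # y-axis label (example text)
     main = "A simple plot",  # plot title (example text)
     col  = "blue")           # color of the points and lines
```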

You can look up many of these options online, and R has good built-in documentation.

# R documentation can be brought up in two ways

?plot # Look up a specific command

??graph # Fuzzy lookup that searches the help pages of all installed packages for a term

Let's play with some data.

# First you need to figure out where R is looking for files.

# getwd() returns your "working directory,"
# which is usually the folder R was started from.


# You can set your working directory to any folder you want with setwd().
# It works best if it is the same directory as your .r script.
# RStudio can set the working directory to the location of your .r script for you.

# This is found under the Session drop-down menu (Set Working Directory > To Source File Location).
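
The working-directory commands described above can be tried out as follows. This is only a sketch: tempdir() stands in for your own project folder so that the example runs on any machine, and the variable name original_dir is arbitrary.

```r
# Remember where R is currently looking for files
original_dir <- getwd()
print(original_dir)

# Switch to another folder; tempdir() is a stand-in for your project folder,
# e.g. the folder that holds your .r script and data files
setwd(tempdir())
print(getwd())

# List the files R can see in the current working directory
list.files()

# Switch back to where you started
setwd(original_dir)
```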


# To read in your data, use the read.csv() command
# with the name of your file
# and header = TRUE, which tells R that your first row holds the column names and
# every row beneath it is a separate observation.
# Make sure that your data is formatted this way and is saved as either a
# .csv or a .txt file.

my_data <- read.csv("my_data.csv", header = TRUE)

# my_data <- read.table("my_data.txt", header = TRUE)
# This is the syntax for reading in a .txt file
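
If you do not have a data file handy, the following sketch writes a small made-up data set to a temporary .csv file and reads it back. The column names mirror the ones used in this script, but the values and the file location are invented purely for illustration.

```r
# A tiny, made-up data set (values are for illustration only)
example_data <- data.frame(
  initial_weight = c(1.2, 1.5, 1.1, 1.4),
  final_weight   = c(2.3, 2.9, 2.0, 2.6),
  sex            = c("female", "male", "female", "male")
)

# Write it to a temporary .csv file; row.names = FALSE keeps R from
# adding an extra column of row numbers
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Read it back in; the first row becomes the column names
my_data <- read.csv(csv_path, header = TRUE)

str(my_data)   # column names, types, and the first few values
head(my_data)  # the first rows of the data
```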


# This will tell you the column names in your data sheet.
# You should already know them, but this is a good way to double-check your naming
# scheme before you call specific columns.

names(my_data)

# Calling specific columns/data

my_data$final_weight # Calling the column named final_weight within my_data
my_data[, 1] # Calling the first column in my_data
my_data[1, ] # Calling the first row in my_data (below the column names)
my_data[2, 3] # Calling the data point in the second row and third column of my_data
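
To see what each of these calls returns, here is a short sketch on a small made-up data frame. The values are invented; only the indexing syntax matters.

```r
# A tiny, made-up data frame (values are for illustration only)
my_data <- data.frame(
  initial_weight = c(1.2, 1.5, 1.1),
  final_weight   = c(2.3, 2.9, 2.0),
  food_type      = c("a", "b", "c"),
  stringsAsFactors = FALSE  # keep food_type as plain text
)

my_data$final_weight       # a vector: 2.3 2.9 2.0
my_data[, 1]               # first column as a vector: 1.2 1.5 1.1
my_data[1, ]               # first row, returned as a one-row data frame
my_data[2, 3]              # second row, third column: "b"
my_data[, "final_weight"]  # columns can also be selected by name

# Rows can be selected with a condition instead of a number
my_data[my_data$final_weight > 2.1, ]
```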

# Some simple summary statistics

mean_fw <- mean(my_data$final_weight) # Average
sd_fw <- sd(my_data$final_weight) # Standard deviation
n_fw <- length(my_data$final_weight) # n (sample size)
se_fw <- sd_fw / sqrt(n_fw) # Standard error
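
If you need the standard error for several columns, the same arithmetic can be wrapped in a small helper. The function name std_error is made up for this example (base R has no built-in standard error function), and the weights are invented values.

```r
# Standard error of the mean: sd / sqrt(n), ignoring missing values
std_error <- function(values) {
  values <- values[!is.na(values)]
  sd(values) / sqrt(length(values))
}

final_weight <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up example values

mean(final_weight)       # 5
sd(final_weight)         # about 2.14
std_error(final_weight)  # about 0.756
```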

# Run a two-sample t-test

# If both variables are numeric
t.test(my_data$final_weight, my_data$initial_weight)

# If the dependent variable is numeric and the independent variable is a factor with two levels
t.test(final_weight ~ sex, data = my_data)
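
Both kinds of t-test call can be tried on simulated data. In this sketch the data are randomly generated and the column names simply mirror the ones used in this script. Note that t.test() runs Welch's t-test (unequal variances) by default, and that the result is a list from which you can pull out pieces such as the p-value.

```r
set.seed(1)  # make the random numbers reproducible

# Simulated data: 20 individuals, 10 of each sex
my_data <- data.frame(
  initial_weight = rnorm(20, mean = 5, sd = 1),
  sex            = factor(rep(c("female", "male"), each = 10))
)
my_data$final_weight <- my_data$initial_weight + rnorm(20, mean = 2, sd = 1)

# Two numeric variables
result_1 <- t.test(my_data$final_weight, my_data$initial_weight)

# Numeric response split by a two-level factor
result_2 <- t.test(final_weight ~ sex, data = my_data)

result_1           # full printed output
result_1$p.value   # just the p-value
result_2$estimate  # the mean of each group
```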

# Run an ANOVA

# Two or more independent variables
model.1 <- aov(final_weight ~ initial_weight + final_instar, data = my_data)

# Independent variable is a factor with 3 or more levels
model.2 <- aov(final_weight ~ food_type, data = my_data)

# Check assumptions

plot(model.1) # Four plots
plot(model.2) # Four plots

# Check p-values, etc.

summary(model.1)
summary(model.2)
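
Putting the ANOVA steps together, here is a sketch on simulated data. The data are randomly generated and the food_type levels are invented; summary() is what prints the ANOVA table with F statistics and p-values.

```r
set.seed(42)  # make the random numbers reproducible

# Simulated data: 30 individuals across three made-up food types
my_data <- data.frame(
  food_type = factor(rep(c("algae", "leaves", "detritus"), each = 10))
)
my_data$final_weight <- rnorm(30, mean = 5, sd = 1) +
  ifelse(my_data$food_type == "algae", 2, 0)

model <- aov(final_weight ~ food_type, data = my_data)

# Show all four diagnostic plots on one page, then restore the old layout
old_par <- par(mfrow = c(2, 2))
plot(model)
par(old_par)

# ANOVA table: degrees of freedom, sums of squares, F value, p-value
model_summary <- summary(model)
model_summary

# Pull out the p-value for food_type (first row of the table)
p_value <- model_summary[[1]][["Pr(>F)"]][1]
p_value
```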


Posted on Nov 17
Written by Matthew J. Lundquist
