# Clear memory
rm(list=ls())
knitr::opts_knit$set(root.dir = "C:/Users/adamd/Documents/psci200")
Intro to R: Working with Data
Working Directory
We set the working directory in a “setup” chunk. Replace the file path in that chunk with the path to your own folder. Note for Windows users: use forward slashes (/), not backslashes. Example: “c:/Users//Desktop/psci200”
Now you should see your folder’s path when you run the code below:
getwd()
[1] "C:/Users/adamd/Documents/psci200"
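As a quick check that the working directory is what you expect, the sketch below builds a file path relative to it and tests whether a file exists there. It reuses the file name that appears later in this tutorial; whether that file is actually in your folder is an assumption.

```r
# Where is R currently looking for files?
wd <- getwd()
print(wd)

# Build a path inside the working directory.
# file.path() joins the pieces with forward slashes on every platform.
data_file <- file.path(wd, "pew_jan_15.rdata")

# TRUE only if the file is really there (an assumption about your folder)
file.exists(data_file)

# List the files R can see in the working directory
list.files(wd)
```

If file.exists() returns FALSE, either the file is missing or the working directory points somewhere else.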
Loading Data Sets
R has many example data sets. To see all the available data sets, we can use data().
data()
We can check which objects and data sets we’ve already loaded with ls().
ls()
character(0)
Let’s load a data set called cars.
# Load the "cars" dataset
data(cars)
ls()
[1] "cars"
We see that we now have a data frame loaded. Let’s use some commands to better understand the cars data set.
# View the first few lines of the data frame
head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
# View the names of variables in the data frame
names(cars)
[1] "speed" "dist"
# Dimensions of the dataset
dim(cars) # rows, columns
[1] 50  2
nrow(cars) # number of rows
[1] 50
ncol(cars) # number of columns
[1] 2
length(cars) # is this the number of rows?
[1] 2
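The last call returns 2, not 50, because a data frame is stored as a list of columns, so length() counts columns rather than rows. A short sketch of that point, using only base R functions:

```r
data(cars)

# A data frame is a list whose elements are the columns
is.list(cars)                    # TRUE
length(cars) == ncol(cars)       # TRUE: length() counts columns

# The length of a single column is the number of rows
length(cars$speed) == nrow(cars) # TRUE

# str() shows the type and first few values of every column
str(cars)
```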
Now that we understand a bit about the data set as a whole, let’s try looking at a specific variable (column) in the data set.
# speed # ?? (uncomment to try it: R cannot find speed on its own)
cars$speed # Refer to variables with $
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
cars$dist
 [1]   2  10   4  22  16  10  18  26  34  17  28  14  20  24  28  26  34  34  46
[20]  26  36  60  80  20  26  54  32  40  32  40  50  42  56  76  84  36  46  68
[39]  32  48  52  56  64  66  54  70  92  93 120  85
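The $ operator is one of several ways to reach a column. The sketch below shows a few base R alternatives that return the same vector; the variable names by_dollar, by_name, and by_index are made up for the example.

```r
data(cars)

# Three equivalent ways to pull out the speed column as a vector
by_dollar <- cars$speed
by_name   <- cars[["speed"]]
by_index  <- cars[, 1]          # first column, all rows

identical(by_dollar, by_name)   # TRUE
identical(by_dollar, by_index)  # TRUE

# with() lets you use column names directly inside one expression
with(cars, mean(speed))
```

The [[ ]] form is handy when the column name is stored in a variable, which $ cannot handle.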
# Summary statistics
mean(cars$speed)
[1] 15.4
median(cars$dist)
[1] 36
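Beyond mean() and median(), base R can summarize every column in one call. A brief sketch on the same data set:

```r
data(cars)

# Minimum, quartiles, median, mean, and maximum for every column
summary(cars)

# Spread of a single variable
sd(cars$dist)       # standard deviation
range(cars$speed)   # smallest and largest value
```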
# Plotting
plot(cars$speed, cars$dist, xlab="Speed", ylab="Distance")
Loading an External R Data Set
While R has many built-in data sets, most of the time we will want to load data from elsewhere. For this practice, we will load a Pew survey that measured public opinion about Obama’s economic performance.
# obamaecon = {1, 2, 3} where 1 is poor, 2 is neutral, 3 is favorable
# Load the dataset
#load('pew_jan_15.rdata')
ls()
[1] "cars"
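Because the load() line is commented out here, ls() still shows only cars. To illustrate what load() does when a file is present, here is a small self-contained sketch that writes a data frame to a temporary .rdata file and reads it back. The object demo_df and the temporary file are made up for the example and have nothing to do with the Pew data.

```r
# A small made-up data frame (hypothetical, for illustration only)
demo_df <- data.frame(id = 1:3, score = c(2, 3, 1))

# Save it to a temporary .rdata file, then remove it from memory
tmp <- tempfile(fileext = ".rdata")
save(demo_df, file = tmp)
rm(demo_df)

# load() restores objects under the names they were saved with
# and invisibly returns those names
loaded <- load(tmp)
loaded              # "demo_df"
head(demo_df)       # the data frame is available again
```

Since load() chooses the object names for you, capturing its return value is a simple way to learn what a file contained.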
Let’s explore the data set.
# names(pewjan15)
# head(pewjan15)
# pewjan15$age
#
# # Tables
# table(pewjan15$sex, pewjan15$attend)
# tab = table(pewjan15$sex, pewjan15$attend)
# stab = sum(tab)
# tab / stab
Help and Apropos
If you know a command name, but don’t know how to use it, use help or ?.
# help('seq')
# ?seq
If you don’t know the exact name of a command but know part of it, use apropos.
apropos('tabl')
 [1] ".S3_methods_table"   "[.table"             "aperm.table"
 [4] "as.data.frame.table" "as.relistable"       "as.table"
 [7] "as.table.default"    "ftable"              "is.relistable"
[10] "is.table"            "margin.table"        "model.tables"
[13] "pairwise.table"      "print.summary.table" "print.table"
[16] "prop.table"          "r2dtable"            "read.ftable"
[19] "read.table"          "summary.table"       "table"
[22] "write.ftable"        "write.table"         "xyTable"
Otherwise, search Google or ask an LLM to find a function that suits your needs.
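Before turning to outside sources, a few more built-in lookups may be worth trying. The sketch below is one possible workflow rather than a fixed recipe; the variable name read_funs is made up for the example.

```r
# apropos() accepts a regular expression, so the search can be anchored
read_funs <- apropos("^read\\.")   # names that start with "read."
"read.table" %in% read_funs        # TRUE

# args() prints a function's arguments without opening the help page
args(median)

# example() runs the examples from a function's help page
# example(seq)

# ?? (short for help.search) searches the installed help pages by keyword
# ??"contingency table"
```

The last two calls are commented out because they open help output interactively.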
Practice Session
# Add your practice code here