Chapter 6 Importing and Exporting Data in R

First, make sure that you are working in the directory you intend to use. To see the current working directory, type getwd() into the console:

getwd()
## [1] "/Users/Ercan/Desktop/R Books/RInto_Ercan"

You can change the working directory by typing setwd():

setwd("/Users/Ercan/Desktop/Beginning R")

getwd()

Alternatively, in RStudio select Session > Set Working Directory > Choose Directory and navigate to the directory you want to use as your new working directory.

Finally, type dir() to list all the files in the current directory.
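
The three commands above can be combined so that a script changes directory temporarily and then returns to where it started. The following is a minimal sketch, not a required workflow: tempdir() is used only as a stand-in for your own project folder, and the file name example.txt is made up for illustration.

```r
# remember where we started
old_wd <- getwd()

# move to R's per-session temporary directory
# (an assumption: a stand-in for your own project folder)
setwd(tempdir())

# create an empty file so that dir() has something to show (hypothetical name)
file.create("example.txt")

# list the files in the current working directory
files_here <- dir()
print(files_here)

# go back to the original working directory
setwd(old_wd)
```

Saving the original directory first means the rest of the session is not affected by the change.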

The definitive guide to importing data into R is the R Data Import/Export manual (https://cran.r-project.org/doc/manuals/R-data.pdf).

6.1 Importing .csv, .txt, .delim files

6.1.1 Read .csv format (comma separated values)

To read data from a CSV file, use either the read.table or the read.csv function; the latter is a wrapper around the former. Suppose that you have a data set health.csv in your working directory and you want to read it into R:

# use read.table
health <- read.table(file = "health.csv", header = TRUE, sep = ",")  
# or use read.csv
health <- read.csv("health.csv")    
# the result is a data.frame
class(health)
## [1] "data.frame"

Note that when we used read.table, we explicitly specified the argument names file, header and sep. (Function arguments can also be matched by position, without their names, but naming them is good practice.) The second argument, header, indicates that the first row of data holds the column names. The third argument, sep, gives the delimiter separating data cells. Changing this to other values such as "\t" (tab delimited) or ";" (semicolon delimited) enables read.table to read other types of files. There are many other arguments to read.table; to learn more about this function, see its help page: ?read.table.

Instead of relying on the working directory, you can give the full path to the file:

health <- read.csv("/Users/Ercan/Desktop/Beginning R/health.csv")   
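
To illustrate the sep argument described above, here is a small sketch that reads the same data from a semicolon-delimited file and from a tab-delimited file. The data values and the temporary files are made up for this example; they are not the health data set used in this chapter.

```r
# write a small semicolon-delimited file to a temporary location (made-up data)
semi_file <- tempfile(fileext = ".txt")
writeLines(c("id;gender;age", "1;M;51", "2;F;35"), semi_file)

# tell read.table that the delimiter is a semicolon
semi_data <- read.table(file = semi_file, header = TRUE, sep = ";")

# the same data, this time tab-delimited
tab_file <- tempfile(fileext = ".txt")
writeLines(c("id\tgender\tage", "1\tM\t51", "2\tF\t35"), tab_file)
tab_data <- read.table(file = tab_file, header = TRUE, sep = "\t")

# both calls produce the same data frame
print(semi_data)
identical(semi_data, tab_data)
```

Only the value of sep changes between the two calls; everything else about read.table stays the same.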

The readr package also provides a family of functions for reading text files. If your data set is large, using readr might be faster.
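
As a rough sketch of what this looks like in practice, the code below reads a small CSV file with readr's read_csv function. The file contents are made up, and the example assumes that readr is installed; if it is not, the code only prints a reminder.

```r
# made-up data written to a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,gender,age", "1,M,51", "2,F,35"), csv_file)

if (requireNamespace("readr", quietly = TRUE)) {
  # read_csv() returns a tibble, which is also a data frame;
  # col_types = cols() lets readr guess the column types without printing a message
  health_tbl <- readr::read_csv(csv_file, col_types = readr::cols())
  print(class(health_tbl))
} else {
  message("readr is not installed; run install.packages(\"readr\") first")
}
```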

Here is an example: the roster of the men’s basketball team. I saved the roster as a comma-separated values (CSV) file and then read it into R using the read.csv function. Note that in this case the file roster.csv was saved in our working directory; the getwd() and setwd() functions discussed earlier are helpful for checking this. As you can see, data read in this way automatically become a data frame in R:

roster <- read.csv("roster.csv")

To learn about the content of a data frame:

str(roster)
## 'data.frame':    13 obs. of  6 variables:
##  $ Jersey  : int  0 1 3 5 10 12 15 20 21 33 ...
##  $ Name    : Factor w/ 13 levels "Ajukwa, Austin",..: 11 1 8 2 3 6 5 12 7 13 ...
##  $ Position: Factor w/ 3 levels "C","F","G": 3 3 3 2 3 3 2 3 3 2 ...
##  $ Inches  : int  74 78 74 79 75 73 80 72 76 80 ...
##  $ Pounds  : int  190 205 205 215 200 205 205 165 205 245 ...
##  $ Class   : Factor w/ 4 levels "freshman","junior",..: 1 4 2 4 1 3 1 2 3 2 ...
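
The Factor columns in the output above are what read.csv produced by default before R 4.0.0. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same call returns character (chr) columns unless you ask for factors. The sketch below shows both behaviours explicitly; the two-row file is made up for illustration and is not the actual roster.

```r
# a tiny made-up roster written to a temporary file
roster_file <- tempfile(fileext = ".csv")
writeLines(c("Jersey,Name,Position",
             "1,Player One,G",
             "2,Player Two,F"), roster_file)

# text columns kept as character vectors (the default since R 4.0.0)
roster_chr <- read.csv(roster_file, stringsAsFactors = FALSE)
class(roster_chr$Position)
## [1] "character"

# text columns converted to factors (the default before R 4.0.0)
roster_fct <- read.csv(roster_file, stringsAsFactors = TRUE)
class(roster_fct$Position)
## [1] "factor"
```

Setting stringsAsFactors explicitly makes a script behave the same way regardless of the R version it runs on.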

To view your data without editing them, you can use the View command:

View(roster)

6.1.2 Read .txt format

Suppose that you have a data set health.txt in your working directory and you want to read it into R:

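# header = TRUE assumes that the first line of the file holds the column names
health <- read.table("health.txt", header = TRUE, sep = " ")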
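
If the columns in a .txt file are padded with several spaces, or mix spaces and tabs, leaving sep at its default value can be more forgiving, because the default treats any run of whitespace as a single delimiter. Here is a minimal sketch with a made-up file:

```r
# made-up data whose columns are aligned with several spaces
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id   age", "1    51", "2    35"), txt_file)

# no sep argument: any amount of whitespace separates the fields
health_txt <- read.table(txt_file, header = TRUE)
print(health_txt)
```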

6.2 Importing .xls, .xlsx, .sav, .dta, .sas7bdat Files

  • Use the package readxl to read in .xls and .xlsx files.
install.packages("readxl")
library(readxl)
read_excel("health.xlsx")
  • Use the package haven to read in SPSS, Stata and SAS files.
install.packages("haven")
library(haven)
read_spss("Dataset.sav")
read_dta("Dataset.dta")
read_sas("Dataset.sas7bdat")
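
An Excel workbook can hold several sheets, so it is often useful to list them before reading one. The sketch below uses the example workbook that ships with readxl, located with readxl_example(), rather than the health.xlsx file above. It assumes readxl is installed and otherwise only prints a reminder.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # path to an example workbook bundled with the readxl package
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # list the sheet names in the workbook
  sheets <- readxl::excel_sheets(xlsx_path)
  print(sheets)

  # read the first sheet and keep the result in an object
  first_sheet <- readxl::read_excel(xlsx_path, sheet = sheets[1])
  print(head(first_sheet))
} else {
  message("readxl is not installed; run install.packages(\"readxl\") first")
}
```

Assigning the result to an object, as here, keeps the imported data available for later steps instead of only printing it.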

6.3 Importing Data from the Web

Suppose there is some data at http://www.something.com/data.csv. We can import it as follows:

getLink <- "http://www.something.com/data.csv"

myData <- read.table(file = getLink, header = TRUE, sep = ",")

Unlike read.table, read_excel cannot read data directly from the Internet, so the file must be downloaded first. We could do this with a web browser, or we can stay within R and use download.file.

# download the data
# Note that this saves the file in the current working directory; to save it elsewhere, give a full path in destfile.
# method='curl' requires the curl program to be installed on your system.
download.file(url='http://www.something.com/data.xlsx', 
              destfile='excelData.xlsx', method='curl') 
# read data
excelData <- read_excel("excelData.xlsx")
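
When a script is run repeatedly, it can be convenient to skip the download if the file is already on disk. The helper below is a hypothetical sketch of that idea: the name download_if_missing is not part of R or of any package. It uses mode = "wb" so that binary files such as .xlsx are written unchanged on Windows.

```r
# hypothetical helper: download a file only if it is not already present
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file in binary mode
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  # return the local path invisibly so it can be passed to a reader function
  invisible(destfile)
}

# usage with the placeholder address from the text (not a real data source):
# download_if_missing("http://www.something.com/data.xlsx", "excelData.xlsx")
# excelData <- read_excel("excelData.xlsx")
```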

6.4 R Binary Files

When working with other R programmers, a good way to pass around data, or any R objects such as variables and functions, is to use RData files. These are binary files that represent R objects of any kind. They can store a single object or multiple objects and can be passed among Windows, Mac and Linux without a problem.

First, let’s create an RData file, remove the saved object from memory and then read it back into R:

health <- read.table(file = "health.csv", header = TRUE, sep = ",")  
# save the health data frame to disk
save(health, file="health.rdata") 
# remove health from memory
rm(health)
# read it from the rdata file 
load("health.rdata")
# check if it exists now
head(health)
##   id gender state age health1 health2 health3 health4 health5 health6
## 1  1      M     1  51       1       4       2       1       4       5
## 2  2      F     3  35       2       3       3       2       3       4
## 3  3      F     1  29       5       2       4       2       1       3
## 4  4      M     1  21       5       1       5       4       2       1
## 5  5      M     2  56       2       4       2       4       3       3
## 6  6      M     3  72       1       5       4       2       4       5

Here is another example, in which we construct our own data set and then save multiple objects in a single .rdata file:

x <- 1:5
y <- letters[1:5]
z <- data.frame(x, y)
# save all three objects at once
save(x, y, z, file="multiple.rdata")
rm(x,y,z)
load("multiple.rdata")
x 
## [1] 1 2 3 4 5
y 
## [1] "a" "b" "c" "d" "e"
z
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e

These objects are restored into the working environment, with the same names they had when they were saved to the RData file. That is why we do not assign the result of the load function to an object.
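
Although the result of load is usually not assigned, the function does return (invisibly) the names of the objects it restored, which can be handy when you are unsure what a file contains. To avoid overwriting objects that already exist in your workspace, you can also load into a separate environment. A minimal sketch, using a temporary file and made-up objects:

```r
# create two objects and save them to a temporary .rdata file
x <- 1:5
y <- letters[1:5]
rdata_file <- tempfile(fileext = ".rdata")
save(x, y, file = rdata_file)
rm(x, y)

# load() returns the names of the restored objects
restored <- load(rdata_file)
restored
## [1] "x" "y"

# load into a separate environment instead of the workspace
e <- new.env()
load(rdata_file, envir = e)
ls(e)
## [1] "x" "y"
e$y
## [1] "a" "b" "c" "d" "e"
```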

The saveRDS function saves a single object in a binary RDS file. The object is not saved with a name, so when we use readRDS to read the file back into the working environment, we assign the result to an object.

# create an object
x <- c(1, 5, 4)
# view it
x
## [1] 1 5 4
# save to rds file
saveRDS(x, file='anObject.rds')
# read the file and save to a different object 
thatObject <- readRDS('anObject.rds')
# display it
thatObject
## [1] 1 5 4
# check they are the same
identical(x, thatObject)
## [1] TRUE

6.5 Loading Data from R

R and some packages come with data included, so we can easily have data to use. Accessing this data is simple as long as we know what to look for. ggplot2, for instance, comes with a dataset about diamonds. It can be loaded using the data function.

data(diamonds, package='ggplot2')
head(diamonds)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
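
If you do not know what to look for, the data function can also list the data sets a package provides. The sketch below uses the datasets package, which is part of every R installation, so it runs without ggplot2; mtcars is one of the data sets it contains.

```r
# list the names of the data sets shipped with the built-in 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
head(available)

# load one of them by name and look at the first rows
data(mtcars)
head(mtcars, 3)
```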

6.6 Exporting Data Files

  • Use write.csv to write a data frame to a .csv file:
write.csv(health, "IntrotoR.final.csv", row.names = FALSE)
  • As with importing, other packages can make exporting faster. Use write_csv from the readr package to export .csv files; it is about twice as fast as write.csv and never writes row names.
library(readr)
write_csv(health, 'healthExam.csv')
  • Use the haven package to export SPSS or Stata files. Note that the SPSS writer is called write_sav:
library(haven)
write_sav(health, "my_spss.sav")
write_dta(health, "my_stata.dta")
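
To see what row.names = FALSE does in write.csv, the sketch below writes a small made-up data frame twice, with and without row names, and reads both files back. The temporary files stand in for your own output files.

```r
# a small made-up data frame
df <- data.frame(id = 1:3, age = c(51, 35, 29))

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# default: row names are written as an unnamed first column
write.csv(df, with_rn)
# row.names = FALSE: only the data columns are written
write.csv(df, without_rn, row.names = FALSE)

# when read back, the unnamed row-name column shows up as "X"
names(read.csv(with_rn))
## [1] "X"   "id"  "age"
names(read.csv(without_rn))
## [1] "id"  "age"
```

Leaving out the row names avoids an extra column appearing each time the file is exported and imported again.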