Chapter 3 Data Types

In this chapter we look at the different data types used in R and at some basic operations on them.

Just to get the language right, everything we manipulate or encounter in R is called an object. For example, typing the expression x <- 2 creates the object x, and y <- "hello" creates another object y containing the word “hello”. Objects can therefore contain different kinds of data.

R has five basic, low-level kinds of objects, which are called the atomic classes:

  • numeric data (real numbers)

  • integers

  • character

  • complex numbers

  • logicals (TRUE/FALSE)

The type of data contained in an object is checked with the class function:

x <- 2
class(x)
[1] "numeric"
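As a quick illustration of the list above, here is one value of each of the five atomic classes passed to class. The particular values are only examples chosen for this sketch.

```r
# One example value for each of the five atomic classes
class(2.5)       # a real number
## [1] "numeric"
class(2L)        # an integer (note the L suffix)
## [1] "integer"
class("hello")   # a character string
## [1] "character"
class(2 + 3i)    # a complex number
## [1] "complex"
class(TRUE)      # a logical
## [1] "logical"
```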

3.1 Numeric Data

Numbers in R are generally treated as numeric objects (i.e. double precision real numbers). Numeric data handles both integers and decimals.

x = 3.5        # assign a value  
x              # print the value of x 
[1] 3.5 

class(x)       # print the class name of x 
[1] "numeric"

Even if we assign a whole number to a variable x, it is still stored as a numeric value.

x = 1 
x              # print the value of x 
[1] 1 

class(x)       # print the class name of x 
[1] "numeric"

There is also a special number Inf which represents infinity; e.g. 1/0 gives Inf. Inf can be used in ordinary calculations. For example, 1/Inf is 0.

The value NaN represents an undefined value (“not a number”), e.g. 0/0 would produce NaN. NaN can also be thought of as a missing value. We are going to say more on that later.
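The two special values can be tried directly at the console. The sketch below uses only base R; the functions is.nan and is.na are not discussed in the text above and are included here just to show how these values can be detected.

```r
1 / 0          # division by zero gives infinity
## [1] Inf
1 / Inf        # Inf can be used in ordinary calculations
## [1] 0
Inf - Inf      # an undefined result
## [1] NaN
is.nan(0 / 0)  # test for "not a number"
## [1] TRUE
is.na(0 / 0)   # NaN is also treated as a missing value
## [1] TRUE
```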

3.2 Integers

A number entered in R is automatically stored as numeric, so if you explicitly want an integer, you need to add the L suffix. For example, entering 1 gives you a numeric object, whereas entering 1L gives you an integer.
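A small sketch of the difference between 1 and 1L follows. The variable names a and b are just for illustration, and typeof (not used elsewhere in this chapter) is a base R function that reports how a value is stored internally.

```r
a <- 1     # stored as a double precision number
b <- 1L    # stored as an integer

class(a)
## [1] "numeric"
class(b)
## [1] "integer"

typeof(a)  # internal storage type
## [1] "double"
typeof(b)
## [1] "integer"

a == b     # the two values still compare as equal
## [1] TRUE
```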

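Testing whether a variable is numeric is done with the function is.numeric. Note that it returns TRUE for integers as well as for decimal numbers:

x <- 2.3
is.numeric(x)
[1] TRUE

x <- 2L
is.numeric(x)
[1] TRUE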

Another way of creating an integer is to use the as.integer function:

y = as.integer(3) 
y              # print the value of y 
[1] 3 

class(y)       # print the class name of y 
[1] "integer"

is.integer(y)  # is y an integer? 
[1] TRUE

3.3 Characters

A character object is used to represent string values in R.

x <- "hello"
x
[1] "hello"

We convert objects into character values with the as.character function.
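A short sketch of as.character in use is given below. The values and the variable names n and s are only illustrative.

```r
n <- 1881
s <- as.character(n)   # convert a number to a character string
s
## [1] "1881"
class(s)
## [1] "character"

as.character(TRUE)     # logicals can be converted as well
## [1] "TRUE"
as.character(3.50)     # the trailing zero is not kept
## [1] "3.5"
```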

To find the number of characters in a character or numeric value we use the nchar function:

x <- "value"
nchar(x)
[1] 5

x <- 1881
nchar(x)
[1] 4

Another example:

vec_char <- c("My", "Great", "Title")
vec_char
## [1] "My"    "Great" "Title"
length(vec_char)
## [1] 3

paste(vec_char, collapse = " ")
## [1] "My Great Title"
myGreatTitle <- paste(vec_char, collapse = " ")
myGreatTitle <- c(myGreatTitle, ": yet to come!")
paste(myGreatTitle, collapse = " ")
## [1] "My Great Title : yet to come!"

Using the function paste() we can join several character vectors that are each of length 1:

paste("My", "great", "title", sep = " ")
## [1] "My great title"
length(paste("My", "great", "title", sep = " "))
## [1] 1

or we can join two vectors of length greater than 1:

paste(c("My", "great", "title"), 1:3, sep = "")
## [1] "My1"    "great2" "title3"

When the vectors are not of equal length then R recycles the shorter vector until it matches the length of the longer one:

paste(letters, 1:5, sep = "-")
##  [1] "a-1" "b-2" "c-3" "d-4" "e-5" "f-1" "g-2" "h-3" "i-4" "j-5" "k-1"
## [12] "l-2" "m-3" "n-4" "o-5" "p-1" "q-2" "r-3" "s-4" "t-5" "u-1" "v-2"
## [23] "w-3" "x-4" "y-5" "z-1"

It is also worth noting that the numeric vector 1:5 gets coerced into a character vector by the paste() function. We will talk more about coercion later.

3.4 Complex Numbers

As the name suggests, this data type handles complex numbers. In this book we are not going to encounter complex numbers often (unless you do something wrong!), so I am not going to go into the details of this data type.

x <- 2 + 3i 
x
[1] 2+3i

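Note that it is not x <- 2 + 3*i: the imaginary part is written as a single literal such as 3i, whereas 3*i would multiply 3 by a variable named i.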
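For completeness, here is a brief sketch of a few base R operations on complex numbers. The functions Re, Im and as.complex are not covered in the text above, and the variable name z is only illustrative.

```r
z <- 2 + 3i

Re(z)                 # real part
## [1] 2
Im(z)                 # imaginary part
## [1] 3

z * 1i                # 1i is the imaginary unit
## [1] -3+2i

sqrt(as.complex(-1))  # square root of -1, computed as a complex number
## [1] 0+1i
```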

3.5 Logical

Logical data represents data that can be either TRUE or FALSE.

x <- TRUE           # logical
class(x)
[1] "logical"

is.logical(x)       # logicals have their own test function.
[1] TRUE

as.numeric(TRUE)    # numeric
[1] 1

Numerically, TRUE is the same as 1 and FALSE is the same as 0.

x <- TRUE * 2
x
[1] 2

y <- FALSE + 3
y
[1] 3

Note that when logical data is mixed with numeric data, it is automatically treated as numeric.
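One practical consequence of treating TRUE as 1 and FALSE as 0 is that we can count or average logical values. The vector passed below is made up for this sketch.

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)

sum(passed)    # number of TRUE values
## [1] 3
mean(passed)   # proportion of TRUE values
## [1] 0.75
```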

Logicals can result from comparing two numbers, characters or conditions. The main operators that produce logical data are summarized in the table below.

R’s Comparison and Logical Operators:

Operator/Function        R Command   Example          Output
Equality                 ==          2==3             FALSE
Not equal                !=          2!=3             TRUE
Negation                 !()         !(2==3)          TRUE
Greater than             >           3>2              TRUE
Less than                <           3<2              FALSE
Greater than or equal    >=          3>=2             TRUE
Less than or equal       <=          3<=2             FALSE
And                      &           (3<=2)&(5>3)     FALSE
Or                       |           (3<=2)|(5>3)     TRUE

Note that the equality operator == is different from the single equals sign =, which in R is used for assignment. Like the mathematical operators, the comparison and logical operators are also vectorized.

Some examples:

2 > 3
## [1] FALSE

2 != 3
## [1] TRUE

x <- 1:5
y <- 5:1
x >= y
## [1] FALSE FALSE  TRUE  TRUE  TRUE

"value" == "home"
## [1] FALSE

"value" > "house"
## [1] TRUE
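The operators & and | from the table are vectorized in the same way as the comparisons. The sketch below applies them element by element to a small vector made up for illustration.

```r
x <- 1:5

x > 1 & x < 5    # TRUE where both conditions hold
## [1] FALSE  TRUE  TRUE  TRUE FALSE

x < 2 | x > 4    # TRUE where at least one condition holds
## [1]  TRUE FALSE FALSE FALSE  TRUE

!(x == 3)        # negation, element by element
## [1]  TRUE  TRUE FALSE  TRUE  TRUE
```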

3.6 Dates

Dates are not really one of the atomic classes, but I am going to present them in this section as another type of data in R.

There are two main classes for date data: Date and POSIXct. Date stores just a date while POSIXct stores a date and time. Both objects are actually represented as the number of days (Date) or seconds (POSIXct) since January 1, 1970.

As I mentioned above, a date is not a separate atomic data type. For example, if we write a date as "2015-07-20" it is character data, and we use the as.Date function to convert the character value to a date:

x <- as.Date("2015-07-20")
x
## [1] "2015-07-20"

class(x)
## [1] "Date"

# number of days between 01/01/1970 and 07/20/2015 
as.numeric(x)
## [1] 16636

Compare the result with this:

x <- as.POSIXct("2015-07-20 12:00")
x              # the time zone shown depends on your system settings
## [1] "2015-07-20 12:00:00 EDT"

class(x)
## [1] "POSIXct" "POSIXt"

# number of seconds between 01/01/1970 00:00 and 07/20/2015 12:00
as.numeric(x)
## [1] 1437408000

The general form of the as.Date() function is

as.Date(x, "input_format")

where x is the character data and input_format gives the appropriate format for reading the date. Date formats are presented in the following table.

Symbol   Meaning                    Example
%d       Day as a number (01-31)    01-31
%a       Abbreviated weekday        Mon
%A       Unabbreviated weekday      Monday
%m       Month (01-12)              01-12
%b       Abbreviated month          Jan
%B       Unabbreviated month        January
%y       Two-digit year             19
%Y       Four-digit year            2019

The default format for inputting dates is yyyy-mm-dd. The statement

mydates <- as.Date(c("2007-06-22", "2004-02-13"))
mydates
## [1] "2007-06-22" "2004-02-13"

converts the character data to dates using this default format. In contrast,

strDates <- c("01/05/1965", "08/16/1975")
mydates <- as.Date(strDates, "%m/%d/%Y")
mydates
## [1] "1965-01-05" "1975-08-16"

reads the data using a mm/dd/yyyy format.
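The same idea works for other layouts. The sketch below reads the same date written in two further ways, using only the numeric symbols from the table; the input strings are made up for illustration. It also shows that a format that does not match the input gives NA rather than an error.

```r
as.Date("20.07.2015", "%d.%m.%Y")   # day.month.four-digit year
## [1] "2015-07-20"

as.Date("07/20/15", "%m/%d/%y")     # month/day/two-digit year
## [1] "2015-07-20"

as.Date("20.07.2015", "%m/%d/%Y")   # format does not match the input
## [1] NA
```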

Once the variable is in date format, you can analyze and plot the dates using the wide range of analytic techniques covered in later chapters.

Two functions are especially useful for time-stamping data. Sys.Date() returns today’s date, and date() returns the current date and time.
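The two functions differ in what they return, which a quick check with class makes visible. The actual values depend on when the code is run, so only the classes are shown in this sketch.

```r
class(Sys.Date())   # a Date object, usable in date arithmetic
## [1] "Date"

class(date())       # a character string, not a date object
## [1] "character"
```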

You can use the format(x, format="output_format") function to output dates in a specified format and to extract portions of dates:

today <- Sys.Date()
format(today, format="%B %d %Y")
## [1] "February 04 2019"

format(today, format="%A")
## [1] "Monday"
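Because the output of Sys.Date() changes every day, here is the same technique applied to a fixed date, so that the results can be reproduced. The date and the variable name d are chosen only for illustration, and only numeric format symbols are used.

```r
d <- as.Date("2019-02-04")

format(d, format = "%Y")               # extract the year as a string
## [1] "2019"

as.numeric(format(d, format = "%m"))   # extract the month as a number
## [1] 2

format(d, format = "%d/%m/%y")         # day/month/two-digit year
## [1] "04/02/19"
```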

When R stores dates internally, they’re represented as the number of days since January 1, 1970, with negative values for earlier dates. That means you can perform arithmetic operations on them. For example,

startdate <- as.Date("2004-02-13")
enddate   <- as.Date("2011-01-22")
days      <- enddate - startdate
days
## Time difference of 2535 days

displays the number of days between February 13, 2004 and January 22, 2011.
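A few more operations follow from the same internal representation. This is a minimal sketch that reuses the two dates from the example above: adding a number shifts a date by that many days, dates can be compared, and as.numeric turns the difference into a plain number.

```r
startdate <- as.Date("2004-02-13")
enddate   <- as.Date("2011-01-22")

startdate + 30                     # 30 days after the start date
## [1] "2004-03-14"

enddate > startdate                # dates can be compared
## [1] TRUE

as.numeric(enddate - startdate)    # the difference as a plain number
## [1] 2535
```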

Exercise (Dates) Discuss what happens when, instead of using as.Date, you accidentally save the date as

x <- "2015-07-20"

How can a numeric value be converted to a date?