Chapter 13 Programming in R

13.1 Style Guidelines

These are some conventions that are widely applied by programmers:

  • File names should end in .R

  • Identifiers for numbers and vectors should be written in lowercase. For matrices in uppercase. Function names should use the CapsWords convention.

  • Indentation: two spaces.

  • Spacing: put spaces around all operators ’ = , + , - , <- `, etc..

  • Curly braces: first on same line, last on own line.

  • Comments: after the # symbol, add a space.

  • Constants: constants are defined only with uppercase letters. Example, say you want to define a constant: \(\alpha = 3\), define it like this: ALPHA = 3.

13.2 Logical Operators

The table presents the logical operators that we will need to write control flows. Each of these evaluates to a logical result of TRUE or FALSE.

R’s Comparison and Logical Operators:

Operator/Function R Command Example Output
Equality == 2==3 FALSE
Not equal != 2!=3 TRUE
Greater than > 3>2 TRUE
Less than < 3<2 FALSE
Greater than or equal >= 3>=2 TRUE
Less than or equal <= 3<=2 FALSE
And & (3<=2)&(5>3) FALSE
Or | (3<=2)|(5>3) TRUE

Note that the equality operator == is different from the = sign, which is used as an assignment operator. As with the mathematical operators and the logical operators, these are also vectorized:

x <- 1:5
y <- 5:1

x >= y
## [1] FALSE FALSE  TRUE  TRUE  TRUE

13.3 Conditional Statements and Control Flows

It is often very useful to execute actions only if certain conditions are met (conditional statements if and if-else), or to execute the same action for a certain number of times or until a certain condition is met (control statements for and while, respectively). For example, if you would like to run a simulation, these programming tools are inevitable. In the next few sections, we will learn about some of these algorithms to illustrate control flows.

13.3.1 if-else

The if() statement allows us to control which statements we want to be executed.

Syntax:

if (condition) {
                do this if the condition is TRUE
                     } else {
                     do this if the condition is FALSE 
                            }

Suppose if a > b then c should be equal to 20, else c should be equal to 10. In R code, for a = 2 and b = 3 this would be written like this:

a = 2
b = 3
if (a > b) {
            c <- 20
           } else {
            c <- 10 
                  }
c
## [1] 10

It is also possible to add multiple statements. For example:

x = 11
if (x %% 3 == 0) {
                    print("x is divisible by 3")
            } else if (x %% 2 == 0) {
                      print("x is divisible by 2")
            } else {
                      print("x is neither divisible by 2 nor 3")
                  }
## [1] "x is neither divisible by 2 nor 3"

11 being neither divisible by 2 nor 3, it is the last phrase that will be printed. Note that the second condition is written as else if whereas the last condition is simply else because this is the only remaining option.3

13.3.2 ifelse

There is another command similar to if-else, maybe a bit more complicated to understand at first, but much faster, called ifelse.

Syntax:

ifelse(test, true_value, false_value)

For example, consider the example that if a > b then c should be equal to 20, else c should be equal to 10. One can achieve the same result as in if-else by writing the following code:

a = 2
b = 3
c <- ifelse(a > b, 20, 10)
c
## [1] 10

The above command means exactly the same as previously: if (a > b) then the result is 20, else this result is 10. Then the result is saved in variable c and you need to call to see the assigned value. So why not just use ifelse? This is because the whole if ... else construct is much more general than ifelse. ifelse only works for assigning values conditionally, but that’s not always what we want to do.

The following example creates a dummy variable from a randomly generated sample data:

x <- sample(10, 15, replace = TRUE)
D <- ifelse(x<5, 1, 0)
D
##  [1] 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0

In the following example, the condition involves putting two conditions:

x <- 1:10 
ifelse(x<5 | x>8, x, 0)
##  [1]  1  2  3  4  0  0  0  0  9 10

13.3.3 For loops

For loops make it possible to repeat a set of instructions for a certain number of times.

Syntax:

for(variable in sequence) { 
    statements to be executed
}

For example, try the following:

for (i in 1:4){
        print("UNCC")
              }
## [1] "UNCC"
## [1] "UNCC"
## [1] "UNCC"
## [1] "UNCC"

Note that in the example above the index i hasn’t been used in the statements to be executed. In the next example, the indices are operational.

Example: Compute the sum of the first 100 integers

temporarySum <- 0
for (i in 1:100){
        temporarySum <- temporarySum + i
                }
print(temporarySum)
## [1] 5050

What happened in that loop? First, we defined a variable called temporarySum and set it to 0. Then, when the loops starts, i equals 1, so we add temporarySum to 1, which is 1. Then, i equals 2, and again, we add temporarySum to i. But this time, temporarySum equals 1 and i equals 2, so now temporarySum equals 3, and we repeat this until i equals 100.

Example: Compute the factorial \(n!\):

n <- 5
result <- 1
for (i in 1:n){
        result <- result * i
              }
print(result)
## [1] 120

Example: Generate a data vector

x <- 1:8
z <- NULL
for(i in 1:length(x)) { 
    if(x[i] < 5) { 
        z <- c(z, x[i] + 10)  
    } else { 
        z <- c(z, x[i] - 10)  
    } 
}
z
## [1] 11 12 13 14 -5 -4 -3 -2

Her note that z starts as NULL then at each iteration its size increases by one. Also, instead of writing i in 1:length(x) to indicate the values that the index i will assume, we could also use “i in seq(along=x)”, in which case the would be the following:

x <- 1:8
z <- NULL
for(i in seq(along=x)) { 
    if(x[i] < 5) { 
        z <- c(z, x[i] + 10)  
    } else { 
        z <- c(z, x[i] - 10)  
    } 
}
z
## [1] 11 12 13 14 -5 -4 -3 -2

If we want to stop on a condition and print an error message:

x <- 1:10
z <- NULL
for(i in seq(along=x)) { 
    if (x[i] < 5) { 
        z <- c(z,x[i]-1)  
    } else { 
        stop("values need to be <5") 
    }
}

z

Example. Create a function that returns \(n\)th Fibonacci number. Recall that Fibonacci numbers follow from the recursion: \(x_{n} = x_{n-1} + x_{n-2}\) for \(n > 3\) and \(x_{1}=x_{2}=1\).

n <- 8 # this code returns 8th Fibonacci number
a <- 0
b <- 1
for (i in 1:n){
temp <- b
b <- a
a <- a + temp
}
print(a)
## [1] 21

Inside the loop, we defined a variable called temp. Defining temporary variables is usually very useful inside loops. Let’s try to understand what happens inside this loop:

  • First, we assign the value 0 to variable a and value 1 to variable b.
  • We start a loop, that goes from 1 to n.
  • We assign the value inside of b to a temporary variable, called temp.
  • b becomes a.
  • We assign the sum of a and temp to a.
  • When the loop is finished, we return a.

What happens if we want the 3rd Fibonacci number? At i = 1 we have first a = 0 and b = 1, then temp = 1, b = 0 and a = 0 + 1. Then i = 2. Now b = 0 and temp = 0. The previous result, a = 0 + 1 is now assigned to b, so b = 1. Then, a = 1 + 0. Finally, i = 3. temp = 1 (because b = 1), the previous result a = 1 is assigned to b and finally, a = 1 + 1. So the third Fibonacci number equals 2. Reading this might be a bit confusing; I strongly advise you to run the algorithm on a sheet of paper, step by step.

Example: Create the first 10 Fibonacci numbers 1,1,2,3,5,8,13,21,34,55

Fibo <- numeric(10) # sets up a numeric vector of length 10
for(i in 1:10) { 
    if(i <= 2){
    Fibo[i] = 1
    } else{
    Fibo[i] = Fibo[i-1] + Fibo[i-2]
    }
}
Fibo
##  [1]  1  1  2  3  5  8 13 21 34 55

13.3.4 While loops

while loops are very similar to for loops. The instructions inside a while loop are repeated while a certain condition holds true, and how many repetitions will take place is not known in advance.

Syntax:

while(condition) { 
    statements to be executed
}

A simple examples:

z <- 0
while(z < 5) { 
    z <- z + 2
    print(z)  
}
## [1] 2
## [1] 4
## [1] 6

Let’s consider the sum of the first 100 integers again using a while loop:

result = 0
i=1
while (i<=100){
        result <- result + i
             i <- i + 1
}
print(result)
## [1] 5050

Here, we first set result and i to 0 and 1, respectively. Then, while i is less than or equal to 100, we add i to result. Notice that there is a line more than in the for loop: we need to increment the value of i, if not, i would stay equal to 1, and the condition would always be fulfilled, and the program would run forever (not really, only until your computer runs out of memory).

Example: The set of all Fibonacci numbers less than a certain number \(N\):

FiboN <- c(1,1) # we know that 1, 1 are the first two Fibonacci numbers
N = 50
i = 3
while (FiboN[i-1] + FiboN[i-2] <= N) { 
  FiboN <- c(FiboN, FiboN[i-1]+FiboN[i-2])
  i = i + 1
}
FiboN
## [1]  1  1  2  3  5  8 13 21 34

13.3.5 Apply Loop Family

There are some other commonly used loops that I don’t mention at this point: apply(), lappy(), sapply(). For simple conditions these functions might be faster than the ones we studied so far.

13.4 Declaring functions in R

One of the goals of a computer is to alleviate you from doing repetitive and boring tasks. In the previous section, we have seen how we can use loops to let the computer repeat hundreds of instructions for us. In this section, we will discover another object, called functions. In programming languages such as R, functions have the same meaning as in mathematics: a function takes one or several argument(s) and return a value. For example, you should be familiar with the function \(f(x) = \sqrt{x}\) that returns the square root of \(x\). Of course, this function is already pre-programmed inside R. Try the following:

sqrt(4)
## [1] 2

R should give you the result, 2. But very often, it is useful to define your own functions and this is what we are going to learn in this section.

Suppose you want to create the following function: \(f(x)= 1/\sqrt{x}\). This is the syntax you would use:

myFunction <- function(x){
      return(1/sqrt(x))
      }

It is always a good idea to add some comments that explain what the function does:

# This function takes one argument, x, 
# and returns the inverse of its square root.
myFunction <- function(x){
 return(1/sqrt(x))
}

Always give your function very explicit names! In mathematics it is standard to give functions just one letter as a name. Never do that in programming! Functions are very flexible. But remember that you can only return one value, or one object. Sometimes you may need to return more that one value. To be able to do this, you must put your values in a list, and return the list of values. You can put a lot of instructions inside a function, such as loops.

A simple function that adds one to the number entered as the function argument:

addOne <- function(a){ # a is the function argument
  return(a + 1)
  }

Now, we created a function addOne and for example let’s try out

addOne(3)
## [1] 4

Example. Compute the skewness of a given vector x:

mySkew <- function(x) {
m3    <- mean((x-mean(x))^3)  # mean is already a function in R 
skew  <- m3/(sd(x)^3)         # std dev in R is given by sd
return(skew)                  # typical output of a function 
}                             # functions begin and end with curly braces

Now try out:

x <- rnorm(100, 5, 2)
mySkew(x)
## [1] -0.04382396

Example. Compute the skewness of a given vector x:

myMoments <- function(x) {
# this functions returns the first three moments for a vector of data
m1    <- mean(x)                              # mean is already a function in R
m2    <- (sum((x-mean(x))^2))/(length(x) - 1) # variance    
m3    <- mean((x-mean(x))^3)        
skew  <- m3/(sd(x)^3)                         # skewness
result = c(m1, m2, m3)                        # collect in a vector
return(result)                                # return the vector
}

Now try out:

x <- rnorm(100, 5, 2)
myMoments(x)
## [1]  4.963137  4.236104 -2.982425

Example. You can pass parameters with functions:

myFun <- function(x, ALPHA, B=0){
  result <- sin(x[1]) - cos(x[2] - ALPHA) + x[3]^2 + B
  return(result)
}
myFun(c(2, 4, 1), ALPHA = 3, B = 2) 
## [1] 3.368995

Now try out these:

myFun(c(2,4,1), ALPHA = 1)        # computes for the default value of B, B = 0
## [1] 2.89929
myFun(c(2,4,1), ALPHA = 1, B=2)   # computes for B = 2 
## [1] 4.89929

Example. Create a function that returns \(n\)th Fibonacci number with iterative and recursive algorithms.

This is the loop that we created above, here we just put it into a function:

Fibo <- function(n){
       a <- 0
       b <- 1
       for (i in 1:n){
        temp <- b
        b <- a
        a <- a + temp
}
return(a) 
}

The above algorithm is called an iterative algorithm, because it uses a loop to compute the result, which is based on successive iteration. Let’s look at another way to think about the problem, with a recursive algorithm:

FiboRecur <- function(n){
       if (n == 0 | n == 1){
       return(n)} else {
       return(FiboRecur(n-1) + FiboRecur(n-2))
} 
}
FiboRecur(7)
## [1] 13

This algorithm should be easier to understand: if n = 0 or n = 1 the function should return n (0 or 1). If n is strictly bigger than 1, FiboRecur should return the sum of FiboRecur(n-1) and FiboRecur(n-2). This version of the function is very much the same as the mathematical definition of the Fibonacci sequence. So why not use only recursive algorithms then?

Try to run the following:

system.time(Fibo(30))
##    user  system elapsed 
##   0.004   0.000   0.004

The result should be printed very fast (the system.time() function returns the time that it took to execute Fibo(30)). Let’s try with the recursive version:

system.time(FiboRecur(30))
##    user  system elapsed 
##   1.441   0.011   1.452

It takes much longer to execute! Recursive algorithms are very CPU demanding, so if speed is critical, it’s best to avoid recursive algorithms. Also, in FiboRecur try to remove this: if (n == 0 | n == 1) and try to run FiboRecur(5) for example and see what happens. You should get an error: this is because for recursive algorithms you need a stopping condition, or else, it would run forever. This is not the case for iterative algorithms, because the stopping condition is the last step of the loop.

Example. A function that generates a scatterplot

corr_plot <- function(x, y, plotit) { 
  if (plotit == TRUE) {
    plot(x, y) 
    cor(x, y)
    }else{cor(x, y) 
    }
}

Now try these two:

corr_plot(rnorm(10), rnorm(10), FALSE)
## [1] -0.03353916

corr_plot(rnorm(10), rnorm(10), TRUE)

## [1] 0.2263416

There are three control utilities for functions, return, warning and stop:

  • return. The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination with conditional evaluations.

  • stop. To stop the action of a function and print an error message, one can use the stop function.

  • warning. To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function warning(“…”).

Example. Function with a warning message

myfct <- function(x1) {
        if (x1>=0) {
        print(x1) 
        } else {
        stop("This function did not finish, because x1 < 0")
        }
        warning("Value needs to be > 0")
        }

Now try out the following

myfct(x1 = 2)
## [1] 2
## Warning in myfct(x1 = 2): Value needs to be > 0

and

myfct(x1 = -2)

13.5 Examples

This section contains a few interesting examples.

Example (Finding Primes) Suppose we want to find the prime numbers up to a given integer \(n\). A simple algorithm is the following

  • Create a vector \(\mathbf{x} = (2, \ldots,n)\)

  • Eliminate all multiples of 2 from \(\mathbf{x}\) which are larger than 2

  • Move on to the next integer, say \(x_2\), in remaining in the new \(\mathbf{x}\) vector and eliminate all multiples of \(x_2\) which are greater than \(x_2\) from \(\mathbf{x}\)

  • Continue until all the numbers in \(\mathbf{x}\) is exhausted

The following function implements this algorithm:

primeFinder <- function(n) {
  if (n >= 2) {
    x <- seq(2, n) #contains all the candidates to be tested
    primes <- NULL #starts with NULL, but will contain all primes less than n
    for (i in seq(2, n)) {
      if (any(x == i)) {
        primes <- c(primes, i)
        x <- c(x[(x %% i) != 0], i)
      } 
      }
    return(primes) 
    } else {
      stop("Input value of n should be at least 2.") 
      }
}

Now try out

primeFinder(100)
##  [1]  2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83
## [24] 89 97

Example (Newton’s method of root finding)

We want to find roots of a differentiable function \(f(x)\), i.e. \[ f(x)=0 \implies x=? \] The idea behind Newton’s method of root finding is based on Taylor approximation of the function \(f(x)\). Suppose that \(x_0\) is an initial candidate we want to test whether it satisfies \(f(x_0) = 0\) and if it does not satisfies the condition (in general it won’t) we want an algorithm that will go to a second point to test and continue until the condition is satisfied. Let’s take a first order Taylor approximation around the initial guess \(x_0\): \[ f(x) = f(x_0) + (x-x_0)f'(x_0) +\frac{1}{2} (x-x_0)^2f''(x_0^*), \] where \(x_0^* = \lambda x + (1-\lambda)x_0\) for some \(\lambda \in [0,1]\). Now assuming that \(|x-x_0|\) is small, the term \((x-x_0)^2\) is negligible relative to the term \((x-x_0)\). So, provided, \(f''\) is bounded, we may write \[ f(x) \approx f(x_0) + (x-x_0)f'(x_0) \]

which suggests that if \(f(x)=0\) is satisfied then \(x\) can be solved as

\[ x = x_0 - \frac{f(x_0)}{f'(x_0)} \]

Let’s call this point \(x_1\) and put back into the function (not into its first order approximation above) \(f(x)\) to test if \(f(x_1)=0\). If \(f(x_1)\neq 0\), then take Taylor approximation around \(x_1\) to obtain another point \(x_2\) to test if \(f(x_2)=0\). Continue until \(f(x_n)=0\), for some \(n\).4

As an example, suppose \(f(x) = x^3 +2x-7\) and take the initial point as \(x_0 = 1\).

x <- 1
fNewton <- x^3 + 2*x^2 - 7 
tolerance <- 0.000001
while (abs(fNewton) > tolerance) {
  fNewtonPrime <- 3*x^2 + 4*x       # you need to derive f'(x) by hand
  x <- x - fNewton / fNewtonPrime   # get the updated point
  fNewton <- x^3 + 2*x^2 - 7        # evaluate f at the updated point
}
x
## [1] 1.428818

Suppose that we have two differentiable functions \(g\) and \(h\) and we would like to find the points that they are equal. We can use Newton’s method to solve this problem. Just define \[ f(x) = g(x) - h(x) \] and note that \(g(x) = h(x) \iff f(x)=0\). Now apply Newton’s method to \(f(x)\).


  1. The \%\% operator is the modulus operator, which gives the remainder of the division of x by 2 or 3.

  2. For the details of Newton’s method see Wikipedia