# Chapter 13 Programming in R

## 13.1 Style Guidelines

These are some conventions that are widely applied by programmers:

File names should end in

`.R`

Identifiers for numbers and vectors should be written in lowercase. For matrices in uppercase. Function names should use the CapsWords convention.

Indentation: two spaces.

Spacing: put spaces around all operators ’ = , + , - , <- `, etc..

Curly braces: first on same line, last on own line.

Comments: after the # symbol, add a space.

Constants: constants are defined only with uppercase letters. Example, say you want to define a constant: \(\alpha = 3\), define it like this:

`ALPHA = 3`

.

## 13.2 Logical Operators

The table presents the logical operators that we will need to write *control flows*. Each of these evaluates to a logical result of `TRUE`

or `FALSE`

.

**R’s Comparison and Logical Operators:**

Operator/Function |
R Command |
Example |
Output |
---|---|---|---|

Equality | `==` |
`2==3` |
FALSE |

Not equal | `!=` |
`2!=3` |
TRUE |

Greater than | `>` |
`3>2` |
TRUE |

Less than | `<` |
`3<2` |
FALSE |

Greater than or equal | `>=` |
`3>=2` |
TRUE |

Less than or equal | `<=` |
`3<=2` |
FALSE |

And | `&` |
`(3<=2)&(5>3)` |
FALSE |

Or | `|` |
`(3<=2)|(5>3)` |
TRUE |

Note that the equality operator `==`

is different from the `=`

sign, which is used as an assignment operator. As with the mathematical operators and the logical operators, these are also vectorized:

```
x <- 1:5
y <- 5:1
x >= y
## [1] FALSE FALSE TRUE TRUE TRUE
```

## 13.3 Conditional Statements and Control Flows

It is often very useful to execute actions only if certain conditions are met (conditional statements `if`

and `if-else`

), or to execute the same action for a certain number of times or until a certain condition is met (control statements `for`

and `while`

, respectively). For example, if you would like to run a simulation, these programming tools are inevitable. In the next few sections, we will learn about some of these algorithms to illustrate control flows.

### 13.3.1 if-else

The `if()`

statement allows us to control which statements we want to be executed.

**Syntax:**

```
if (condition) {
do this if the condition is TRUE
} else {
do this if the condition is FALSE
}
```

Suppose if `a > b`

then `c`

should be equal to 20, else `c`

should be equal to 10. In `R`

code, for `a = 2`

and `b = 3`

this would be written like this:

```
a = 2
b = 3
if (a > b) {
c <- 20
} else {
c <- 10
}
c
## [1] 10
```

It is also possible to add multiple statements. For example:

```
x = 11
if (x %% 3 == 0) {
print("x is divisible by 3")
} else if (x %% 2 == 0) {
print("x is divisible by 2")
} else {
print("x is neither divisible by 2 nor 3")
}
## [1] "x is neither divisible by 2 nor 3"
```

11 being neither divisible by 2 nor 3, it is the last phrase that will be printed. Note that the second condition is written as `else if`

whereas the last condition is simply `else`

because this is the only remaining option.^{3}

### 13.3.2 ifelse

There is another command similar to `if-else`

, maybe a bit more complicated to understand at first, but much faster, called `ifelse`

.

**Syntax:**

`ifelse(test, true_value, false_value)`

For example, consider the example that if `a > b`

then `c`

should be equal to 20, else `c`

should be equal to 10. One can achieve the same result as in `if-else`

by writing the following code:

```
a = 2
b = 3
c <- ifelse(a > b, 20, 10)
c
## [1] 10
```

The above command means exactly the same as previously: `if (a > b)`

then the result is 20, `else`

this result is 10. Then the result is saved in variable `c`

and you need to call to see the assigned value. So why not just use `ifelse`

? This is because the whole `if ... else`

construct is much more general than `ifelse`

. `ifelse`

only works for assigning values conditionally, but that’s not always what we want to do.

The following example creates a dummy variable from a randomly generated sample data:

```
x <- sample(10, 15, replace = TRUE)
D <- ifelse(x<5, 1, 0)
D
## [1] 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0
```

In the following example, the condition involves putting two conditions:

```
x <- 1:10
ifelse(x<5 | x>8, x, 0)
## [1] 1 2 3 4 0 0 0 0 9 10
```

### 13.3.3 For loops

For loops make it possible to repeat a set of instructions for a certain number of times.

**Syntax:**

```
for(variable in sequence) {
statements to be executed
}
```

For example, try the following:

```
for (i in 1:4){
print("UNCC")
}
## [1] "UNCC"
## [1] "UNCC"
## [1] "UNCC"
## [1] "UNCC"
```

Note that in the example above the index `i`

hasn’t been used in the `statements to be executed`

. In the next example, the indices are operational.

**Example:** Compute the sum of the first 100 integers

```
temporarySum <- 0
for (i in 1:100){
temporarySum <- temporarySum + i
}
print(temporarySum)
## [1] 5050
```

What happened in that loop? First, we defined a variable called `temporarySum`

and set it to 0. Then, when the loops starts, `i`

equals 1, so we add `temporarySum`

to 1, which is 1. Then, `i`

equals 2, and again, we add `temporarySum`

to i. But this time, `temporarySum`

equals 1 and `i`

equals 2, so now `temporarySum`

equals 3, and we repeat this until `i`

equals 100.

**Example:** Compute the factorial \(n!\):

```
n <- 5
result <- 1
for (i in 1:n){
result <- result * i
}
print(result)
## [1] 120
```

**Example:** Generate a data vector

```
x <- 1:8
z <- NULL
for(i in 1:length(x)) {
if(x[i] < 5) {
z <- c(z, x[i] + 10)
} else {
z <- c(z, x[i] - 10)
}
}
z
## [1] 11 12 13 14 -5 -4 -3 -2
```

Her note that `z`

starts as `NULL`

then at each iteration its size increases by one. Also, instead of writing `i in 1:length(x)`

to indicate the values that the index `i`

will assume, we could also use “`i in seq(along=x)`

”, in which case the would be the following:

```
x <- 1:8
z <- NULL
for(i in seq(along=x)) {
if(x[i] < 5) {
z <- c(z, x[i] + 10)
} else {
z <- c(z, x[i] - 10)
}
}
z
## [1] 11 12 13 14 -5 -4 -3 -2
```

If we want to stop on a condition and print an error message:

```
x <- 1:10
z <- NULL
for(i in seq(along=x)) {
if (x[i] < 5) {
z <- c(z,x[i]-1)
} else {
stop("values need to be <5")
}
}
z
```

**Example.** Create a function that returns \(n\)th Fibonacci number. Recall that Fibonacci numbers follow from the recursion: \(x_{n} = x_{n-1} + x_{n-2}\) for \(n > 3\) and \(x_{1}=x_{2}=1\).

```
n <- 8 # this code returns 8th Fibonacci number
a <- 0
b <- 1
for (i in 1:n){
temp <- b
b <- a
a <- a + temp
}
print(a)
## [1] 21
```

Inside the loop, we defined a variable called `temp`

. Defining temporary variables is usually very useful inside loops. Let’s try to understand what happens inside this loop:

- First, we assign the value 0 to variable
`a`

and value 1 to variable`b`

. - We start a loop, that goes from 1 to n.
- We assign the value inside of
`b`

to a temporary variable, called`temp`

. `b`

becomes`a`

.- We assign the sum of
`a`

and`temp`

to`a`

. - When the loop is finished, we return
`a`

.

What happens if we want the 3rd Fibonacci number? At `i = 1`

we have first `a = 0`

and `b = 1`

, then `temp = 1`

, `b = 0`

and `a = 0 + 1`

. Then `i = 2`

. Now `b = 0`

and `temp = 0`

. The previous result, `a = 0 + 1`

is now assigned to `b`

, so `b = 1`

. Then, `a = 1 + 0`

. Finally, `i = 3`

. `temp = 1`

(because `b = 1`

), the previous result `a = 1`

is assigned to `b`

and finally, `a = 1 + 1`

. So the third Fibonacci number equals 2. Reading this might be a bit confusing; I strongly advise you to run the algorithm on a sheet of paper, step by step.

**Example:** Create the first 10 Fibonacci numbers 1,1,2,3,5,8,13,21,34,55

```
Fibo <- numeric(10) # sets up a numeric vector of length 10
for(i in 1:10) {
if(i <= 2){
Fibo[i] = 1
} else{
Fibo[i] = Fibo[i-1] + Fibo[i-2]
}
}
Fibo
## [1] 1 1 2 3 5 8 13 21 34 55
```

### 13.3.4 While loops

`while`

loops are very similar to `for`

loops. The instructions inside a `while`

loop are repeated while a certain condition holds true, and how many repetitions will take place is *not known* in advance.

**Syntax:**

```
while(condition) {
statements to be executed
}
```

A simple examples:

```
z <- 0
while(z < 5) {
z <- z + 2
print(z)
}
## [1] 2
## [1] 4
## [1] 6
```

Let’s consider the sum of the first 100 integers again using a `while`

loop:

```
result = 0
i=1
while (i<=100){
result <- result + i
i <- i + 1
}
print(result)
## [1] 5050
```

Here, we first set result and `i`

to 0 and 1, respectively. Then, while `i`

is less than or equal to 100, we add `i`

to `result`

. Notice that there is a line more than in the for loop: we need to increment the value of `i`

, if not, `i`

would stay equal to 1, and the condition would always be fulfilled, and the program would run forever (not really, only until your computer runs out of memory).

**Example:** The set of all Fibonacci numbers less than a certain number \(N\):

```
FiboN <- c(1,1) # we know that 1, 1 are the first two Fibonacci numbers
N = 50
i = 3
while (FiboN[i-1] + FiboN[i-2] <= N) {
FiboN <- c(FiboN, FiboN[i-1]+FiboN[i-2])
i = i + 1
}
FiboN
## [1] 1 1 2 3 5 8 13 21 34
```

### 13.3.5 Apply Loop Family

There are some other commonly used loops that I don’t mention at this point: `apply()`

, `lappy()`

, `sapply()`

. For simple conditions these functions might be faster than the ones we studied so far.

## 13.4 Declaring functions in `R`

One of the goals of a computer is to alleviate you from doing repetitive and boring tasks. In the previous section, we have seen how we can use loops to let the computer repeat hundreds of instructions for us. In this section, we will discover another object, called **functions**. In programming languages such as `R`

, functions have the same meaning as in mathematics: a function takes one or several *argument(s)* and *return a value*. For example, you should be familiar with the function \(f(x) = \sqrt{x}\) that returns the square root of \(x\). Of course, this function is already pre-programmed inside `R`

. Try the following:

```
sqrt(4)
## [1] 2
```

R should give you the result, 2. But very often, it is useful to define your own functions and this is what we are going to learn in this section.

Suppose you want to create the following function: \(f(x)= 1/\sqrt{x}\). This is the syntax you would use:

```
myFunction <- function(x){
return(1/sqrt(x))
}
```

It is always a good idea to add some comments that explain what the function does:

```
# This function takes one argument, x,
# and returns the inverse of its square root.
myFunction <- function(x){
return(1/sqrt(x))
}
```

Always give your function very explicit names! In mathematics it is standard to give functions just one letter as a name. Never do that in programming! Functions are very flexible. But remember that you can only return one value, or one object. Sometimes you may need to return more that one value. To be able to do this, you must put your values in a *list*, and return the list of values. You can put a lot of instructions inside a function, such as loops.

A simple function that adds one to the number entered as the function argument:

```
addOne <- function(a){ # a is the function argument
return(a + 1)
}
```

Now, we created a function `addOne`

and for example let’s try out

```
addOne(3)
## [1] 4
```

**Example.** Compute the skewness of a given vector `x`

:

```
mySkew <- function(x) {
m3 <- mean((x-mean(x))^3) # mean is already a function in R
skew <- m3/(sd(x)^3) # std dev in R is given by sd
return(skew) # typical output of a function
} # functions begin and end with curly braces
```

Now try out:

```
x <- rnorm(100, 5, 2)
mySkew(x)
## [1] -0.04382396
```

**Example.** Compute the skewness of a given vector `x`

:

```
myMoments <- function(x) {
# this functions returns the first three moments for a vector of data
m1 <- mean(x) # mean is already a function in R
m2 <- (sum((x-mean(x))^2))/(length(x) - 1) # variance
m3 <- mean((x-mean(x))^3)
skew <- m3/(sd(x)^3) # skewness
result = c(m1, m2, m3) # collect in a vector
return(result) # return the vector
}
```

Now try out:

```
x <- rnorm(100, 5, 2)
myMoments(x)
## [1] 4.963137 4.236104 -2.982425
```

**Example.** You can pass parameters with functions:

```
myFun <- function(x, ALPHA, B=0){
result <- sin(x[1]) - cos(x[2] - ALPHA) + x[3]^2 + B
return(result)
}
myFun(c(2, 4, 1), ALPHA = 3, B = 2)
## [1] 3.368995
```

Now try out these:

```
myFun(c(2,4,1), ALPHA = 1) # computes for the default value of B, B = 0
## [1] 2.89929
myFun(c(2,4,1), ALPHA = 1, B=2) # computes for B = 2
## [1] 4.89929
```

**Example.** Create a function that returns \(n\)th Fibonacci number with *iterative* and *recursive* algorithms.

This is the loop that we created above, here we just put it into a function:

```
Fibo <- function(n){
a <- 0
b <- 1
for (i in 1:n){
temp <- b
b <- a
a <- a + temp
}
return(a)
}
```

The above algorithm is called an *iterative algorithm*, because it uses a loop to compute the result, which is based on successive iteration. Let’s look at another way to think about the problem, with a *recursive algorithm*:

```
FiboRecur <- function(n){
if (n == 0 | n == 1){
return(n)} else {
return(FiboRecur(n-1) + FiboRecur(n-2))
}
}
FiboRecur(7)
## [1] 13
```

This algorithm should be easier to understand: if `n = 0`

or `n = 1`

the function should return `n`

(0 or 1). If `n`

is strictly bigger than 1, `FiboRecur`

should return the sum of `FiboRecur(n-1)`

and `FiboRecur(n-2)`

. This version of the function is very much the same as the mathematical definition of the Fibonacci sequence. So why not use only recursive algorithms then?

Try to run the following:

```
system.time(Fibo(30))
## user system elapsed
## 0.004 0.000 0.004
```

The result should be printed very fast (the `system.time()`

function returns the time that it took to execute `Fibo(30)`

). Let’s try with the recursive version:

```
system.time(FiboRecur(30))
## user system elapsed
## 1.441 0.011 1.452
```

It takes much longer to execute! Recursive algorithms are very CPU demanding, so if speed is critical, it’s best to avoid recursive algorithms. Also, in `FiboRecur`

try to remove this: `if (n == 0 | n == 1)`

and try to run `FiboRecur(5)`

for example and see what happens. You should get an error: this is because for recursive algorithms you need a stopping condition, or else, it would run forever. This is not the case for iterative algorithms, because the stopping condition is the last step of the loop.

**Example.** A function that generates a scatterplot

```
corr_plot <- function(x, y, plotit) {
if (plotit == TRUE) {
plot(x, y)
cor(x, y)
}else{cor(x, y)
}
}
```

Now try these two:

```
corr_plot(rnorm(10), rnorm(10), FALSE)
## [1] -0.03353916
corr_plot(rnorm(10), rnorm(10), TRUE)
```

`## [1] 0.2263416`

There are three control utilities for functions, `return`

, `warning`

and `stop`

:

`return`

. The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination with conditional evaluations.`stop`

. To stop the action of a function and print an error message, one can use the stop function.`warning`

. To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function warning(“…”).

**Example.** Function with a `warning`

message

```
myfct <- function(x1) {
if (x1>=0) {
print(x1)
} else {
stop("This function did not finish, because x1 < 0")
}
warning("Value needs to be > 0")
}
```

Now try out the following

```
myfct(x1 = 2)
## [1] 2
## Warning in myfct(x1 = 2): Value needs to be > 0
```

and

`myfct(x1 = -2)`

## 13.5 Examples

This section contains a few interesting examples.

**Example (Finding Primes)** Suppose we want to find the prime numbers up to a given integer \(n\). A simple algorithm is the following

Create a vector \(\mathbf{x} = (2, \ldots,n)\)

Eliminate all multiples of 2 from \(\mathbf{x}\) which are larger than 2

Move on to the next integer, say \(x_2\), in remaining in the new \(\mathbf{x}\) vector and eliminate all multiples of \(x_2\) which are greater than \(x_2\) from \(\mathbf{x}\)

Continue until all the numbers in \(\mathbf{x}\) is exhausted

The following function implements this algorithm:

```
primeFinder <- function(n) {
if (n >= 2) {
x <- seq(2, n) #contains all the candidates to be tested
primes <- NULL #starts with NULL, but will contain all primes less than n
for (i in seq(2, n)) {
if (any(x == i)) {
primes <- c(primes, i)
x <- c(x[(x %% i) != 0], i)
}
}
return(primes)
} else {
stop("Input value of n should be at least 2.")
}
}
```

Now try out

```
primeFinder(100)
## [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83
## [24] 89 97
```

**Example (Newton’s method of root finding)**

We want to find roots of a differentiable function \(f(x)\), i.e. \[ f(x)=0 \implies x=? \] The idea behind Newton’s method of root finding is based on Taylor approximation of the function \(f(x)\). Suppose that \(x_0\) is an initial candidate we want to test whether it satisfies \(f(x_0) = 0\) and if it does not satisfies the condition (in general it won’t) we want an algorithm that will go to a second point to test and continue until the condition is satisfied. Let’s take a first order Taylor approximation around the initial guess \(x_0\): \[ f(x) = f(x_0) + (x-x_0)f'(x_0) +\frac{1}{2} (x-x_0)^2f''(x_0^*), \] where \(x_0^* = \lambda x + (1-\lambda)x_0\) for some \(\lambda \in [0,1]\). Now assuming that \(|x-x_0|\) is small, the term \((x-x_0)^2\) is negligible relative to the term \((x-x_0)\). So, provided, \(f''\) is bounded, we may write \[ f(x) \approx f(x_0) + (x-x_0)f'(x_0) \]

which suggests that if \(f(x)=0\) is satisfied then \(x\) can be solved as

\[ x = x_0 - \frac{f(x_0)}{f'(x_0)} \]

Let’s call this point \(x_1\) and put back into the function (not into its first order approximation above) \(f(x)\) to test if \(f(x_1)=0\). If \(f(x_1)\neq 0\), then take Taylor approximation around \(x_1\) to obtain another point \(x_2\) to test if \(f(x_2)=0\). Continue until \(f(x_n)=0\), for some \(n\).^{4}

As an example, suppose \(f(x) = x^3 +2x-7\) and take the initial point as \(x_0 = 1\).

```
x <- 1
fNewton <- x^3 + 2*x^2 - 7
tolerance <- 0.000001
while (abs(fNewton) > tolerance) {
fNewtonPrime <- 3*x^2 + 4*x # you need to derive f'(x) by hand
x <- x - fNewton / fNewtonPrime # get the updated point
fNewton <- x^3 + 2*x^2 - 7 # evaluate f at the updated point
}
x
## [1] 1.428818
```

Suppose that we have two differentiable functions \(g\) and \(h\) and we would like to find the points that they are equal. We can use Newton’s method to solve this problem. Just define \[ f(x) = g(x) - h(x) \] and note that \(g(x) = h(x) \iff f(x)=0\). Now apply Newton’s method to \(f(x)\).