Chapter 14 Publishing with R Markdown

14.1 Introduction

With R Markdown you can make your analysis reproducible and conveniently communicate your results to other people. To report the findings of an analysis, you would typically copy and paste the results from the R console into LaTeX (or Word) and, if necessary, attach the R code to the document. R Markdown offers a much better way to communicate your results: it lets you write reports that include both the R code and the output it generates. Moreover, these reports are dynamic in the sense that changing the data and reprocessing the file produces a new report with updated output. R Markdown also lets you include LaTeX math, hyperlinks, and images. These dynamic reports can be saved as

  • PDF or PostScript documents
  • Web pages
  • Microsoft Word documents
  • Open Document files
  • and more, such as Beamer slides.

The goal of this document is to explain the most essential features of R Markdown using a template approach. After reading this document, you should be able to write nicely formatted reports. You will be required to turn in your work for this course in R Markdown (homework, paper replications, term project, etc.).

14.1.0.1 Necessary Installations

Before continuing, let’s make sure that you have the necessary packages installed. Here are the basic steps to check whether you are set up for writing R Markdown documents:

  • If you are NOT using RStudio:

    • install the rmarkdown package:

      install.packages("rmarkdown")

      This will install several other packages including knitr that you will need for rendering your R Markdown file.

    • install Pandoc (http://johnmacfarlane.net/pandoc/index.html). Pandoc is a free application available for Windows, Mac OS X, and Linux. It converts files from one markup format to another.

  • If you are using RStudio, you can skip the previous step because the necessary components (R Markdown and Pandoc) are already installed with RStudio.

  • When you render an R Markdown file, it appears by default as an HTML document in the Viewer window of RStudio. If you want to create PDF documents, install a LaTeX distribution: MacTeX for Macs (http://tug.org/mactex), MiKTeX (www.miktex.org) for Windows, or TeX Live for Linux (www.tug.org/texlive). Alternatively, you can install TinyTeX from https://yihui.name/tinytex/.

  • Install the xtable package. The xtable() function in this package attractively formats data frames and matrices to include in reports. xtable() can also format objects produced by the lm(), glm(), aov(), table(), ts(), and coxph() functions. After loading the package, use methods(xtable) to view a comprehensive list of the objects it can format.
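The checklist above can be verified from the R console. The following is a minimal sketch: `check_rmd_setup()` is a hypothetical helper written for this chapter, not a function from any package, and it only reports what it finds without installing anything.

```r
# Report which pieces of the R Markdown toolchain are available.
# check_rmd_setup() is a hypothetical helper, not part of any package.
check_rmd_setup <- function() {
  has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)
  c(
    rmarkdown = has_pkg("rmarkdown"),
    knitr     = has_pkg("knitr"),
    xtable    = has_pkg("xtable"),
    # pandoc_available() is only called when rmarkdown is installed
    pandoc    = has_pkg("rmarkdown") && rmarkdown::pandoc_available(),
    # a non-empty path means a pdflatex executable is on the PATH
    pdflatex  = nzchar(unname(Sys.which("pdflatex")))
  )
}

setup_status <- check_rmd_setup()
print(setup_status)
```

Any entry that prints `FALSE` points to the installation step that still needs attention. A missing `pdflatex` only matters if you want PDF output.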

14.2 Basic Structure of R Markdown

Let’s start with a simple R Markdown file and see what it looks like and the output that it produces when executed.

R Markdown files end with the .Rmd extension. Here is a simple template for writing your homework. The file is called hwTemplate.Rmd, and this is how it looks as an R Markdown file:

=================================================

---
title: "Homework 1"
output:
  html_document: 
    toc: yes
  pdf_document: 
date: '2018-02-15'
---

## Problem 1

For this problem I create an artificial data frame:
```{r}
myDataFrame <- data.frame(names = LETTERS[1:3], variable_1 = runif(3))
myDataFrame
```

## Problem 2 

For this problem  I regress `weight` on `height` and save the regression 
output as an object called `myRegression`:

```{r}
myRegression <- lm(weight ~ height, data = women)
```

## Problem 3

Here I plot a simple histogram:
```{r}
myData <- rnorm(100)
hist(myData)
```

## Problem 4

For this problem I need to prove something so I need to typeset math:

\[ \mu_x = \sum_{i=1}^{n} x_{i} P(X=x_{i}) \]

The document ends here.

=================================================

14.2.0.1 Contents of .Rmd files

As we saw above an .Rmd file contains three types of contents:

  1. A YAML header :

    ---
    title: "Homework 1"
    output:
      html_document: default
    date: '2018-02-15'
    ---

    YAML originally stood for “yet another markup language” and is now read as the recursive acronym “YAML Ain’t Markup Language” (https://en.wikipedia.org/wiki/YAML).

  2. R code chunks. For example:

```{r}
myDataFrame <- data.frame(names = LETTERS[1:3], variable_1 = runif(3))
myDataFrame
```

  3. Text with formatting such as bold text, mathematical expressions (\(\sum_{i=1}^{n} x_{i}\)), or headings (# Heading).

We are going to see the details of each of these components below. But first let’s see how we can execute an .Rmd file to produce the output as PDF, HTML, etc.

14.2.1 Producing (Rendering) R Markdown Reports

To open a new R Markdown file, a file with extension .Rmd, in the menubar select File > New File > R Markdown. There are some options, but you can add them later, so just click OK at this stage.

Now click Knit to produce a complete report containing all text, code, and results. Alternatively, pressing Cmd + Shift + K (or Ctrl + Shift + K) renders the whole document. But in this case, all output formats that are specified in the YAML header will be produced. On the other hand, Knit allows you to specify the output format you want to produce. For example, Knit > Knit to HTML produces only HTML output, which is usually faster than producing PDF output.

You can also render the file programmatically with the following command:

rmarkdown::render("hwTemplate.Rmd") 

This will display the report in the viewer pane, and create a self-contained HTML file.
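As a hedged illustration of the programmatic route, the sketch below writes a small stand-in for hwTemplate.Rmd to a temporary directory (the file content is made up for the example) and renders it. The `output_format` argument of `rmarkdown::render()` selects a single format, and `output_format = "all"` builds every format listed in the YAML header.

```r
# Write a tiny R Markdown file to a temporary directory (illustrative content).
rmd_path <- file.path(tempdir(), "hwTemplate.Rmd")
writeLines(c(
  "---",
  "title: \"Homework 1\"",
  "output: html_document",
  "---",
  "",
  "## Problem 1",
  "",
  "```{r}",
  "summary(women)",
  "```"
), rmd_path)

html_file <- NULL
# Render only if rmarkdown and Pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Render one explicitly chosen format; render() returns the output path.
  html_file <- rmarkdown::render(rmd_path, output_format = "html_document",
                                 quiet = TRUE)
  print(html_file)
  # To build every format listed in the YAML header instead:
  # rmarkdown::render(rmd_path, output_format = "all")
}
```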

Instead of running the whole document, you can run each individual code chunk by clicking the Run icon at the top right of the chunk or by pressing Cmd + Shift + Enter (or Ctrl + Shift + Enter). RStudio executes the code and displays the results inline with the code.

14.2.2 How does it work?

When you knit the document, R Markdown sends the .Rmd file to knitr (http://yihui.name/knitr/), which executes all of the code chunks and creates a new Markdown (.md) document that includes the code and its output. The Markdown file generated by knitr is then processed by Pandoc (http://pandoc.org/), which is responsible for creating the finished file. The advantage of this two-step workflow is that you can create a very wide range of output formats.
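The two steps can also be run by hand, which sometimes helps to see where a problem arises. This is only a sketch under the assumption that knitr, rmarkdown, and Pandoc are installed. The file names are made up, and `rmarkdown::render()` normally performs both steps for you with additional settings.

```r
# A tiny R Markdown document, written to a temporary directory.
rmd_file  <- file.path(tempdir(), "two-step.Rmd")
md_file   <- file.path(tempdir(), "two-step.md")
html_file <- file.path(tempdir(), "two-step.html")
writeLines(c(
  "## Demo",
  "",
  "```{r}",
  "2 + 3",
  "```"
), rmd_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Step 1: knitr runs the chunks and writes plain Markdown.
  knitr::knit(rmd_file, output = md_file, quiet = TRUE)
  cat(readLines(md_file), sep = "\n")

  # Step 2: Pandoc converts the Markdown file to the final format.
  if (requireNamespace("rmarkdown", quietly = TRUE) &&
      rmarkdown::pandoc_available()) {
    rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
  }
}
```

The intermediate .md file shows the chunk's source followed by its printed result, which is exactly what Pandoc receives.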

14.3 Text Formatting with R Markdown

This section demonstrates the syntax of common components of a document written in R Markdown. The approach is based on Pandoc, so we start with the syntax of Pandoc’s flavor of Markdown.

14.3.1 Markdown syntax

In this section, we give a very brief introduction to Pandoc’s Markdown. The comprehensive syntax of Pandoc’s Markdown can be found on the Pandoc website http://pandoc.org.

14.3.2 Inline formatting

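  • Some common formatting:

Function      | Syntax                   | Output
--------------|--------------------------|------------------
Italic        | `*text*` or `_text_`     | *text*
Bold          | `**text**` or `__text__` | **text**
Subindex      | `X~i~`                   | X~i~
Subscript     | `H~2~SO~4~`              | H~2~SO~4~
Superscript   | `Fe^2+^`                 | Fe^2+^
Math mode     | `$X_{i}$`, `$x^2$`       | \(X_i\), \(x^2\)
Strikethrough | `~~text~~`               | ~~text~~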

  • Inline Code: To mark text as inline code, use a pair of backticks, e.g., `code`. To include literal backticks, use more backticks outside, e.g., you can use two backticks to preserve one backtick inside: `` `code` ``.

  • Links: Links are created using the syntax [link name](link address). For example [RStudio](https://www.rstudio.com) gives RStudio

  • Images: The syntax for images is similar: ![image title](path/to/image). Don’t forget to add an exclamation mark!

  • Footnotes: Footnotes are created by putting the footnote text inside square brackets after a caret, ^[]. For example, ^[This is a footnote.].

14.3.3 Block-level elements

Section headers can be written after a number of pound signs, e.g.,

# First-level header

## Second-level header

### Third-level header

Unordered list items start with *, -, or +, and you can nest one list within another list by indenting the sub-list by four spaces, e.g.,

- one item
- one item
- one item
    - one item
    - one item

The output is:

  • one item
  • one item
  • one item
    • one item
    • one item

Ordered list items start with numbers (the rule for nested lists is the same as above), e.g.,

1. the first item
2. the second item
3. the third item

The output does not look much different from the Markdown source:

  1. the first item
  2. the second item
  3. the third item

Blockquotes are written after >, e.g.,

> "Imagination is more important than knowledge."
>
> --- Albert Einstein

The actual output:

“Imagination is more important than knowledge.”

— Albert Einstein

Plain code blocks can be written after three or more backticks, and you can also indent the blocks by four spaces, e.g.,

```
This text is displayed verbatim / preformatted
rnorm(2)
```

produces

This text is displayed verbatim / preformatted
rnorm(2)

Or indent by four spaces. For example

    This text is displayed verbatim / preformatted
    rnorm(2)
    x <- c(1,-4)
    x

produces

This text is displayed verbatim / preformatted
rnorm(2)
x <- c(1,-4)
x

14.3.4 Tables with markdown

A simple table:

  __Function__  |   __Syntax__   | __Output__           
----------------|----------------|------------
Italic          | `*text*`       | *text*          
Bold            |`**text**`      | **text**

produces

Function   Syntax      Output
Italic     *text*      text
Bold       **text**    text

The default is left alignment, but this can be modified.

Right-aligned Table

  __Function__  |   __Syntax__   | __Output__           
---------------:|---------------:|------------:
Italic          | `*text*`       | *text*         
Bold            |`**text**`      | **text**

renders

Function      Syntax    Output
  Italic      *text*      text
    Bold    **text**      text

Right, Left and Center Aligned Tables

  __Left__      |   __Center__   | __Right__           
:---------------|  :----------:  |-----------:
Italic          | `*text*`       | *text*          
Bold            |`**text**`      | **text**

produces

Left        Center     Right
Italic      *text*      text
Bold       **text**     text

14.3.5 Mathematical expressions

Inline LaTeX equations can be written in a pair of dollar signs using the LaTeX syntax. For example, $\mu = \sum_{i=1}^{n} x_{i} P(X=x_{i})$ produces \(\mu = \sum_{i=1}^{n} x_{i} P(X=x_{i})\). Math expressions of the display style can be written in a pair of double dollar signs ($$). The same example in display mode:

$$ \mu = \sum_{i=1}^{n} x_{i} P(X=x_{i}) $$

produces

\[ \mu = \sum_{i=1}^{n} x_{i} P(X=x_{i}) \]

You can also use math environments inside $ $ or $$ $$. Some examples are given below.

An array:

$$
\begin{array}{ccc}
x_{11} & x_{12} & x_{13}\\
x_{21} & x_{22} & x_{23}
\end{array}
$$

\[\begin{array}{ccc} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{array}\]

Data matrix with two observations:

$$
X =
\begin{bmatrix}
1 & x_{11} & x_{12} & x_{13}\\
1 & x_{21} & x_{22} & x_{23}
\end{bmatrix}
$$

\[ X = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13}\\ 1 & x_{21} & x_{22} & x_{23} \end{bmatrix} \]

A matrix of parameters:

$$
\Theta = 
\begin{pmatrix}
\alpha & \beta\\
\gamma & \delta
\end{pmatrix}
$$

\[\Theta = \begin{pmatrix}\alpha & \beta\\ \gamma & \delta \end{pmatrix}\]

Determinant:

$$
\begin{vmatrix}
a & b\\
c & d
\end{vmatrix}
= ad-bc
$$

\[\begin{vmatrix}a & b\\ c & d \end{vmatrix}=ad-bc\]

Below is an align environment example:

\begin{align} 
\mu_{X|y^*} &= E[X|Y=y^*] \\
            &= \sum\limits_{i=1}^{n} x_i\cdot P(X=x_i|Y=y^*) 
\end{align} 

When inserting numbers into text, format() is your friend. It allows you to set the number of digits so you don’t print to a ridiculous degree of accuracy, and a big.mark to make numbers easier to read. I’ll often combine these into a helper function:

comma <- function(x) format(x, digits = 2, big.mark = ",")
comma(3452345)
## [1] "3,452,345"
comma(.12358124331)
## [1] "0.12"

You can also define your own commands as in LaTeX. This is especially useful if there are certain mathematical expressions that you type often. For example, the following line defines a math command that takes two parameters (if you are not familiar with defining commands in LaTeX, skip this example):

$$ \newcommand{\mysum}[2]{\sum_{i=#1}^{#2}} $$

Now typing $\mysum{1}{N}$ produces \(\sum_{i=1}^{N}\).

14.4 Code Chunks

14.4.1 Inserting and Executing Code Chunks

There are two types of R code in R Markdown documents:

  • Inline R code: The syntax for this is `r r-code`, and it can be embedded inline with other document elements. For example, two + three = `r 2+3` gives two + three = 5.

You can also use inline R code to interact with R objects. For example, consider the women dataset in the datasets package. It contains the height information of 15 women. In the R Markdown file, the previous sentence appears as

It contains the height information of `r length(women$height)` women.

  • R code chunks: R code chunks look like plain code blocks, but have {r} after the three backticks and (optionally) chunk options inside {}. For example, a chunk that takes 10 random draws from a normal distribution and plots their histogram:
```{r chunk-label, echo = FALSE, fig.cap = 'A figure caption.'}
x <- rnorm(10)  # 10 random numbers
hist(x)  # a histogram
```
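To check how inline code is evaluated without building a whole document, you can pass a few lines of text directly to `knitr::knit()`. This is a small sketch that assumes knitr is installed. The sentences in `inline_src` are made up for the example.

```r
# Two lines of R Markdown text containing inline R code.
inline_src <- c(
  "The women data set has `r length(women$height)` rows,",
  "and two + three = `r 2 + 3`."
)

inline_out <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  # knit(text = ...) returns the processed text instead of writing a file.
  inline_out <- knitr::knit(text = inline_src, quiet = TRUE)
  cat(inline_out, sep = "\n")
}
```

In the returned text each inline expression has been replaced by its value, so the source code itself no longer appears.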

There are three ways to insert a chunk:

  1. The keyboard shortcut Cmd (or Ctrl) + Alt + I. (Use this!)

  2. By manually typing the chunk delimiters ```{r} and ```.

  3. The “Insert” button icon in the editor toolbar.

A simple code chunk:

x <- 2:5

Note that the output appears right below the chunk. As we shall see, you can control whether the output is produced or not.

14.4.2 Chunk Options

Chunk output can be customised with options, which are arguments supplied in the chunk header.

The most important set of options controls if your code block is executed and what results are inserted in the finished report:

  • eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.

  • include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.

  • echo = FALSE prevents the code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.

  • message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file. This is useful when loading packages prints messages or warnings that you don’t want in the report.

  • results = 'hide' hides printed output; fig.show = 'hide' hides plots.

  • error = TRUE causes the render to continue even if code returns an error. This is rarely something you’ll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your .Rmd. It’s also useful if you’re teaching R and want to deliberately include an error. The default, error = FALSE, causes knitting to fail if there is a single error in the document.

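The following table summarises which types of output each option suppresses (“no” means the item is not run or not shown):

Option            | Run code | Show code | Output | Plots | Messages | Warnings
------------------|----------|-----------|--------|-------|----------|---------
eval = FALSE      | no       |           | no     | no    | no       | no
include = FALSE   |          | no        | no     | no    | no       | no
echo = FALSE      |          | no        |        |       |          |
results = "hide"  |          |           | no     |       |          |
fig.show = "hide" |          |           |        | no    |          |
message = FALSE   |          |           |        |       | no       |
warning = FALSE   |          |           |        |       |          | no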
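The effect of these options is easy to verify on a throwaway document. The sketch below, which assumes knitr is installed, knits three small chunks passed as text. The chunk labels and their contents are made up for the example.

```r
# Three chunks: default options, echo = FALSE, and eval = FALSE.
options_src <- c(
  "```{r shown}",
  "1 + 1",
  "```",
  "",
  "```{r hidden-code, echo = FALSE}",
  "2 + 2",
  "```",
  "",
  "```{r not-run, eval = FALSE}",
  "stop('never executed')",
  "```"
)

options_out <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  options_out <- knitr::knit(text = options_src, quiet = TRUE)
  cat(options_out, sep = "\n")
}
```

In the result, the first chunk shows both code and output, the second shows only the output `## [1] 4`, and the third shows its code without running it, so no error is raised.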

There are some other options that you might find useful:

  • Chunks can be given an optional name: ```{r chunk-name}.

  • collapse = FALSE: (applies to Markdown output only) whether to, if possible, collapse all the source and output blocks from one code chunk into a single block (by default, they are written to separate blocks).

  • highlight = TRUE: whether to highlight the source code (it is FALSE by default if the output is Sweave or listings).

  • size = 'normalsize': font size for the default LaTeX output (see ?highlight in the highlight package for a list of possible values).

  • cache = FALSE: if certain code chunks are time-consuming to render, you may cache them by adding the chunk option cache = TRUE. knitr then saves the chunk’s results and reuses them on later renders as long as the chunk has not changed.

  • background = '#F7F7F7': (character or numeric) background color of chunks in LaTeX output (passed to the LaTeX package framed); the color model is rgb; it can be either a numeric vector of length 3, with each element between 0 and 1 to denote red, green and blue, or any built-in color in R like red or springgreen3 (see colors() for a full list), or a hex string like #FFFF00, or an integer (all these colors will be converted to the RGB model; see ?col2rgb for details)

  • Knitr provides almost 60 options that you can use to customize your code chunks. Here we’ll cover the most important chunk options that you’ll use frequently. You can see the full list at http://yihui.name/knitr/options/.
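If you want to see the options and their current defaults from within R, `knitr::opts_chunk$get()` returns them as a named list. The sketch below assumes knitr is installed and prints only a handful of the entries discussed in this section.

```r
chunk_defaults <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  # All chunk options with their current default values, as a named list.
  chunk_defaults <- knitr::opts_chunk$get()
  print(length(chunk_defaults))

  # A few of the options covered above.
  str(chunk_defaults[c("eval", "echo", "include", "cache",
                       "collapse", "highlight")])
}
```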

14.4.3 Tables with Packages

By default, R Markdown prints data frames and matrices as you’d see them in the console:

mtcars[1:5, 1:6]
##                    mpg cyl disp  hp drat    wt
## Mazda RX4         21.0   6  160 110 3.90 2.620
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875
## Datsun 710        22.8   4  108  93 3.85 2.320
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215
## Hornet Sportabout 18.7   8  360 175 3.15 3.440

If you prefer that data be displayed with additional formatting you can use the knitr::kable function. The code below

```{r kable-example}
knitr::kable(mtcars[1:5, 1:6], 
caption = 'A knitr  table example')
```

generates

TABLE 14.1: A knitr table example

                     mpg  cyl  disp   hp  drat     wt
Mazda RX4           21.0    6   160  110  3.90  2.620
Mazda RX4 Wag       21.0    6   160  110  3.90  2.875
Datsun 710          22.8    4   108   93  3.85  2.320
Hornet 4 Drive      21.4    6   258  110  3.08  3.215
Hornet Sportabout   18.7    8   360  175  3.15  3.440

Read the documentation for ?knitr::kable to see the other ways in which you can customise the table. For even deeper customisation, consider the xtable, stargazer, pander, tables, and ascii packages. Each provides a set of tools for returning formatted tables from R code.
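As one example of the alternatives, here is a hedged sketch using xtable on a regression object. It assumes the xtable package is installed. In an R Markdown chunk you would typically also set the chunk option results = 'asis' so that the generated markup is passed through unchanged.

```r
reg_table <- NULL
if (requireNamespace("xtable", quietly = TRUE)) {
  fit <- lm(weight ~ height, data = women)
  # xtable() has a method for lm objects; the caption text is made up.
  reg_table <- xtable::xtable(fit, caption = "Regression of weight on height")
  # type = "html" suits HTML output; use type = "latex" for PDF output.
  print(reg_table, type = "html")
}
```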

If you want to put multiple tables in a single table environment, wrap the data objects (usually data frames in R) into a list. See Table 14.2 for an example.

knitr::kable(
  list(
    head(iris[, 1:2], 3),
    head(mtcars[, 1:3], 5)
  ),
  caption = 'A Tale of Two Tables.', booktabs = TRUE
)

TABLE 14.2: A Tale of Two Tables.

Sepal.Length  Sepal.Width
         5.1          3.5
         4.9          3.0
         4.7          3.2

                     mpg  cyl  disp
Mazda RX4           21.0    6   160
Mazda RX4 Wag       21.0    6   160
Datsun 710          22.8    4   108
Hornet 4 Drive      21.4    6   258
Hornet Sportabout   18.7    8   360

This feature is only available in HTML and PDF output.

You can use the kable() function to format the output of various R objects. The following example shows how kable() formats the output of the lm() function.

Regular R Markdown print-out:

coefficients(summary(lm(height ~ weight, data=women)))
##               Estimate  Std. Error  t value     Pr(>|t|)
## (Intercept) 25.7234557 1.043746325 24.64531 2.684784e-12
## weight       0.2872492 0.007588083 37.85531 1.090973e-14

And this is the output formatted by kable

library(knitr)
kable(coefficients(summary(lm(height ~ weight, data=women))))

               Estimate  Std. Error   t value  Pr(>|t|)
(Intercept)  25.7234557   1.0437463  24.64531         0
weight        0.2872492   0.0075881  37.85531         0
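Note that the p-values in the kable() output are displayed as 0, because very small numbers are rounded away. One possible workaround, sketched here with base R's `format.pval()` and the `digits` argument of `kable()`, is to convert the p-values to text before building the table. The name `coef_table` is only illustrative.

```r
# Coefficient matrix of the regression, converted to a data frame.
coef_table <- as.data.frame(
  coefficients(summary(lm(height ~ weight, data = women)))
)

# Turn tiny p-values into text so that they are not rounded to 0.
coef_table[["Pr(>|t|)"]] <- format.pval(coef_table[["Pr(>|t|)"]], digits = 3)

if (requireNamespace("knitr", quietly = TRUE)) {
  # digits = 3 rounds the remaining numeric columns.
  print(knitr::kable(coef_table, digits = 3))
}
```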

There is also a rich set of options for controlling how figures are embedded that we will learn about next.

14.4.4 Figures Formatting

The chunk option fig.asp can be used to set the aspect ratio of plots, i.e., the ratio of figure height/width. If the figure width is 6 inches (fig.width = 6) and fig.asp = 0.7, the figure height will be automatically calculated from fig.width * fig.asp = 6 * 0.7 = 4.2. Figure 14.1 is an example using the chunk options fig.asp = 0.7, fig.width = 6, and fig.align = 'center', generated from the code below:

```{r pressure-plot, fig.asp=.7, fig.width=6, fig.cap='A figure example with the specified aspect ratio, width, and alignment.', fig.align='center', out.width='90%'}
par(mar = c(4, 4, .1, .1))
plot(pressure, pch = 19, type = 'b')
```

produces

FIGURE 14.1: A figure example with the specified aspect ratio, width, and alignment.

The actual size of a plot is determined by the chunk options fig.width and fig.height (the size of the plot generated from a graphical device), and we can specify the output size of plots via the chunk options out.width and out.height. The possible values of these two options depend on the output format of the document. For example, out.width = '30%' is a valid value for HTML output, but not for LaTeX/PDF output. However, knitr will automatically convert a percentage value for out.width of the form x% to (x / 100) \linewidth, e.g., out.width = '70%' will be treated as .7\linewidth when the output format is LaTeX. This makes it possible to specify a relative width of a plot in a consistent manner. Figure 14.2 is an example of out.width = '70%'.

```{r cars-plot, out.width='70%', fig.cap='A figure example with a relative width 70\\%.'}
par(mar = c(4, 4, .1, .1))
plot(cars, pch = 19)
```

produces

FIGURE 14.2: A figure example with a relative width 70%.

If you want to put multiple plots in one figure environment, you must use the chunk option fig.show = 'hold' to hold multiple plots from a code chunk and include them in one environment. You can also place plots side by side if the sum of the width of all plots is smaller than or equal to the current line width. For example, if two plots have the same width 50%, they will be placed side by side. Similarly, you can specify out.width = '33%' to arrange three plots on one line. Figure 14.3 is an example of two plots, each with a width of 50%.

```{r multi-plots, out.width='50%', fig.show='hold', fig.align='center', fig.cap='Two plots placed side by side.'}
par(mar = c(4, 4, .1, .1))
plot(pressure, pch = 19, type = 'b')
plot(cars, pch = 19)
```

FIGURE 14.3: Two plots placed side by side.

Sometimes you may have certain images that are not generated from R code, and you can include them in R Markdown via the function knitr::include_graphics(). The following is an example of three R logos included in a figure environment. You may pass one or multiple image paths to the include_graphics() function, and all chunk options that apply to normal R plots also apply to these images, e.g., you can use out.width = '33%' to set the widths of these images in the output document.

```{r knitr-logo, out.width='32.8%', fig.show='hold', fig.cap='Three R logos included in the document from an external PNG image file.'}
knitr::include_graphics(rep('images/Rlogo.png', 3))
```

FIGURE 14.4: Three R logos included in the document from an external PNG image file.

There are a few advantages of using include_graphics():

  1. You do not need to worry about the document output format, e.g., when the output format is LaTeX, you may have to use the LaTeX command \includegraphics{} to include an image, and when the output format is Markdown, you have to use ![](). The function include_graphics() in knitr takes care of these details automatically.
  2. The syntax for controlling the image attributes is the same as when images are generated from R code, e.g., chunk options fig.cap, out.width, and fig.show still have the same meanings.
  3. include_graphics() can be smart enough to use PDF graphics automatically when the output format is LaTeX and the PDF graphics files exist, e.g., an image path foo/bar.png can be automatically replaced with foo/bar.pdf if the latter exists. PDF images often have better quality than raster images in LaTeX/PDF output. To make use of this feature, set the argument auto_pdf = TRUE, or set the global option options(knitr.graphics.auto_pdf = TRUE) to enable this feature globally in an R session.
  4. You can easily scale these images proportionally using the same ratio. This can be done via the dpi argument (dots per inch), which takes the value from the chunk option dpi by default. If it is a numeric value and the chunk option out.width is not set, the output width of an image will be its actual width (in pixels) divided by dpi, and the unit will be inches. For example, for an image with the size 672 x 480, its output width will be 7 inches (7in) when dpi = 96. This feature requires the package png and/or jpeg to be installed. You can always override the automatic calculation of width in inches by providing a non-NULL value to the chunk option out.width, or use include_graphics(dpi = NA).
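The two size calculations mentioned in this section are simple enough to check by hand. The helpers below are hypothetical and only mirror the arithmetic described in the text. They are not knitr functions.

```r
# Hypothetical helpers that mirror the arithmetic in the text.
fig_height_in   <- function(fig_width, fig_asp) fig_width * fig_asp
output_width_in <- function(width_px, dpi) width_px / dpi

# fig.width = 6 and fig.asp = 0.7 give a figure height of about 4.2 inches.
fig_height_in(fig_width = 6, fig_asp = 0.7)

# An image 672 pixels wide at dpi = 96 is shown 7 inches wide.
output_width_in(width_px = 672, dpi = 96)
```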

14.4.5 Global options

You may be inclined to use largely the same set of chunk options throughout a document. But it would be a pain to retype those options in every chunk. Thus, you want to set some global chunk options at the top of your document. You can do this by calling knitr::opts_chunk$set() in a code chunk. For example, when writing books and tutorials I set:

knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE
)

This uses my preferred comment formatting, and ensures that the code and output are kept closely entwined. On the other hand, if you were preparing a report, you might set:

knitr::opts_chunk$set(
  echo = FALSE
)

That will hide the code by default, showing only the chunks you deliberately choose to show (with echo = TRUE). You might consider setting message = FALSE and warning = FALSE, but that would make it harder to debug problems because you wouldn’t see any messages in the final document.

For example, I might use include=FALSE or at least echo=FALSE globally for a report to a scientific collaborator who wouldn’t want to see all of the code. And I might want something like fig.width=12 and fig.height=8 if I generally want those sizes for my figures.

I’d set such options by having an initial code chunk like this:

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
echo=FALSE, warning=FALSE, message=FALSE)
```

I snuck a few additional options in there: warning=FALSE and message=FALSE suppress any R warnings or messages from being included in the final document, and fig.path='Figs/' makes it so the figure files get placed in the Figs subdirectory. (By default, they are not saved at all.)

Note: the ending slash in Figs/ is important. If you used fig.path='Figs' then the figures would go in the main directory but with Figs as the initial part of their names.
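The effect of the trailing slash follows from how the file name is put together: the fig.path value is used as a plain prefix in front of the chunk label. The helper below is a hypothetical imitation of that naming scheme, written only to make the difference visible. It is not knitr's own code.

```r
# figure_file() is a hypothetical helper imitating how a figure file name
# is built from fig.path, the chunk label, and the plot number.
figure_file <- function(fig_path, label, number = 1, ext = "png") {
  paste0(fig_path, label, "-", number, ".", ext)
}

figure_file("Figs/", "a_taller_figure")  # "Figs/a_taller_figure-1.png"
figure_file("Figs",  "a_taller_figure")  # "Figsa_taller_figure-1.png"
```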

The global chunk options become the defaults for the rest of the document. Then if you want a particular chunk to have a different behavior, for example, to have a different figure height, you’d specify a different option within that chunk. For example:

```{r a_taller_figure, fig.height=32}
hist(myData)
```

In a report to a collaborator, I might use include=FALSE, echo=FALSE as a global option, and then use include=TRUE for the chunks that produce figures. Then the code would be suppressed throughout, and any output would be suppressed except in the figure chunks (where I used include=TRUE), which would produce just the figures.

Technical aside: In setting the global chunk options with opts_chunk$set(), you’ll need to use knitr:: (or to have first loaded the knitr package with library(knitr)).
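The interplay between global defaults and per-chunk overrides can be tried out on a throwaway document. The following sketch assumes knitr is installed. The chunk labels are made up, and `opts_chunk$restore()` is called at the end so that the example does not leave changed defaults behind in the session.

```r
global_out <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  # Global default: hide the code of every chunk.
  knitr::opts_chunk$set(echo = FALSE)

  global_out <- knitr::knit(text = c(
    "```{r uses-global}",
    "1 + 1",
    "```",
    "",
    "```{r overrides-global, echo = TRUE}",
    "2 + 2",
    "```"
  ), quiet = TRUE)
  cat(global_out, sep = "\n")

  # Reset the chunk options to knitr's defaults.
  knitr::opts_chunk$restore()
}
```

Only the second chunk shows its code, because its own echo = TRUE takes precedence over the global setting.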

14.5 YAML header

You can control many other “whole document” settings by tweaking the parameters of the YAML header. YAML, originally short for “yet another markup language” and now read as “YAML Ain’t Markup Language”, is designed for representing hierarchical data in a way that’s easy for humans to read and write. R Markdown uses it to control many details of the output. Here we’ll discuss one of them: bibliographies.

Bibliographies and Citations

Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the bibliography field in your file’s header. The field should contain a path from the directory that contains your .Rmd file to the bibliography file:

bibliography: rmarkdown.bib

You can use many common bibliography formats, including BibLaTeX, BibTeX, EndNote, and MEDLINE.

To create a citation within your .Rmd file, use a key composed of @ + the citation identifier from the bibliography file. Then place the citation in square brackets. Here are some examples:

@Xie15 produces Xie (2015).

[@Xie15] produces (Xie 2015).

[@Williams10] produces (Williams and Bizup 2010).

[@Xie15; @Williams10] produces (Xie 2015; Williams and Bizup 2010).

[see @Xie15, pp. 100-1; also @Williams10, ch. 1] produces (see Xie 2015, 100–101; also Williams and Bizup 2010, ch. 1)

Remove the square brackets to create an in-text citation: Xie (2015) says blah, or Xie (2015, 33) says blah.

When R Markdown renders your file, it will build and append a bibliography to the end of your document. The bibliography will contain each of the cited references from your bibliography file, but it will not contain a section heading. As a result it is common practice to end your file with a section header for the bibliography, such as # References or # Bibliography.
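To make the pieces concrete, the sketch below writes a one-entry bibliography file and a matching .Rmd file to a temporary directory. The entry reuses the Xie (2015) reference cited in this chapter. The .Rmd file name and its text are made up for the example, and the render call is left commented out because it needs rmarkdown and Pandoc.

```r
# Work in a temporary directory so nothing in your project is overwritten.
bib_file     <- file.path(tempdir(), "rmarkdown.bib")
cite_rmd     <- file.path(tempdir(), "citation-demo.Rmd")

# One BibTeX entry; its key (Xie15) is what @Xie15 refers to.
writeLines(c(
  "@book{Xie15,",
  "  author    = {Yihui Xie},",
  "  title     = {Dynamic Documents with {R} and knitr},",
  "  edition   = {2nd},",
  "  publisher = {Chapman and Hall/CRC},",
  "  address   = {Boca Raton, Florida},",
  "  year      = {2015}",
  "}"
), bib_file)

# The .Rmd file sits next to the .bib file, so a bare file name is enough.
writeLines(c(
  "---",
  "title: \"Citation demo\"",
  "output: html_document",
  "bibliography: rmarkdown.bib",
  "---",
  "",
  "Dynamic documents are described in [@Xie15].",
  "",
  "# References"
), cite_rmd)

# rmarkdown::render(cite_rmd)  # uncomment to build the document
```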

14.6 Troubleshooting

Troubleshooting R Markdown documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks. The first thing you should always try is to recreate the problem in an interactive session. Restart R, then “Run all chunks” (either from the Code menu, under Run region, or with the keyboard shortcut Ctrl + Alt + R). If you’re lucky, that will recreate the problem, and you can figure out what’s going on interactively.

If that doesn’t help, there must be something different between your interactive environment and the R Markdown environment. You’re going to need to systematically explore the options. The most common difference is the working directory: the working directory of an R Markdown document is the directory in which it lives. Check that the working directory is what you expect by including getwd() in a chunk.

Next, brainstorm all the things that might cause the bug. You’ll need to systematically check that they’re the same in your R session and your R Markdown session. The easiest way to do that is to set error = TRUE on the chunk causing the problem, then use print() and str() to check that settings are as you expect.
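A diagnostic chunk along these lines might look as follows. This is only a sketch: the data path is a made-up placeholder, and which settings are worth printing depends on your problem. Run the same code in the console and in the document, then compare the two outputs.

```r
# Working directory in effect when this code runs.
print(getwd())

# Does the file I expect to read exist from here? (hypothetical path)
data_file <- "data/survey.csv"
print(file.exists(data_file))

# A few settings that can differ between sessions.
session_info <- list(
  r_version = R.version.string,
  digits    = getOption("digits"),
  packages  = .packages()   # currently attached packages
)
str(session_info)
```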

Also try to avoid using these commands in an R Markdown document:

  • View
  • help
  • attach (you can use the with() function instead)
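As a short illustration of the last point, `with()` evaluates an expression inside a data frame without changing the search path, so there is nothing left to detach afterwards. The example uses the built-in women data set.

```r
# Instead of attach(women); mean(height); detach(women)
mean_height <- with(women, mean(height))
mean_height

# Several columns can be used in the same expression.
height_weight_cor <- with(women, cor(height, weight))
height_weight_cor
```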

14.7 Learning more

  • The best place to stay on top of innovations is the official R Markdown website: http://rmarkdown.rstudio.com.

  • R Markdown is based on the knitr package, developed by Yihui Xie for integrating R with LaTeX. For full documentation, see http://yihui.name/knitr/, and Xie’s books Dynamic Documents with R and knitr (Xie 2016) and R Markdown: The Definitive Guide (Xie, Allaire, and Grolemund 2018).

There are two important topics that we haven’t covered here: collaboration, and the details of accurately communicating your ideas to other humans. Collaboration is a vital part of modern data science, and you can make your life much easier by using version control tools, like Git and GitHub. We recommend two free resources that will teach you about Git:

  1. “Happy Git with R”: a user-friendly introduction to Git and GitHub for R users, by Jenny Bryan. The book is freely available online: http://happygitwithr.com

  2. The “Git and GitHub” chapter of R Packages, by Hadley Wickham. You can also read it for free online: http://r-pkgs.had.co.nz/git.html.

I have also not touched on what you should actually write in order to clearly communicate the results of your analysis. To improve your writing, I highly recommend reading either Style: Lessons in Clarity and Grace by Joseph M. Williams & Joseph Bizup, or The Sense of Structure: Writing from the Reader’s Perspective by George Gopen. Both books will help you understand the structure of sentences and paragraphs, and give you the tools to make your writing clearer.

References

Xie, Yihui. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.

Williams, Joseph M., and Joseph Bizup. 2010. Style: Lessons in Clarity and Grace. Vol. 565214475. Longman Boston.

Xie, Yihui. 2016. Dynamic Documents with R and knitr. Chapman and Hall/CRC.

Xie, Yihui, JJ Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.