1 Reproducible research within R

Reproducible research focuses on answering the following questions:

  • How did you do that?
  • What data did you use?
  • Where is code used?

Essentially one wants to capture the typical workflow of:

  • Preparing data
  • Analyzing the data
  • Report on the findings
  • Then realizing that you did something wrong
  • Do it all over again
The final product of research is not only the paper itself, but also the full 
computation environment used to produce the results in the paper such as the 
code and data necessary for reproduction of the results and building upon 
the research.                                                    (Xie, 2014).

researchpipline

2 Rmarkdown


A convenient tool to generate reproducible document. A R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code. This is a Rmarkdown document:

rmarkdown

Notice that the file contains three types of content:

  • An (optional) YAML header surrounded by ---s
  • R code chunks surrounded by ```s
  • text mixed with simple text formatting

To generate a report from the file, run the render command:

library(rmarkdown)
render("file.Rmd")

Better still, use the “Knit” button in the RStudio IDE to render the file and preview the output with a single click or keyboard shortcut ([ctrl/cmd]+[shift]+K).

Rmarkdown documents can out of the box be rendered as:

  • Html documents (webpages)
  • Word documents
  • PDF document

Exercise

In Rstudio create a new markdown file, play around with different rendering options

3 Working with Markdown


Markdown is designed to be easy to write, and, even more importantly, easy to read:

A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. – John Gruber

Markdown does therefore only provides light text formating commands leaving the page layout to the renderer.

3.1 Paragraph formating

When writing paragraphs you have the options of creating italicised text with encapsulating text with *. This means that is *some text* is rendered as some text. Simliarly bold text is indicated by encapsulating the text with **. Super and subsctripts are indicated with a^2^ (a2) and b~2~ (b2) respectively.

Horizontal rulers are created using either of the following:

***
___

which generates this


3.2 Lists

Bulleted and numbered lists are created using single * so this bit of code:

* Bla bla bla
* ble ble ble

becomes:

  • Bla bla bla
  • ble ble ble

You can create subitems in a list using:

* Bla bla bla
    * indented list item
* ble ble ble

which becomes

  • Bla bla bla
    • indented list item
  • ble ble ble

Numbered lists are simply generated with

1. bla bla bla
2. ble ble ble

which becomes

  1. bla bla bla
  2. ble ble ble

3.3 Headings

Section heading are created with # followed with the section title. Subsections and subsubsections (and subsubsub…) are created similarly with ## and ### etc. So the following bits of text:

# Heading 
## Subheading 
### Subsubheading 
#### Subsubsubheading

3.4 Tables

Tables in Markdown are fairly easily generated using the following code:

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Ce

which results in:

First Header Second Header
Content Cell Content Cell
Content Cell Content Cell

3.5 Images

Figures can be embedded into markdown using the the following command:

![](path/to/figure.png)

3.6 Mathematical

Mathematical formulas can also be rendered via the LaTeX equation environment, indicated by some commands enclosed by $-signs ($$ to present the formula centered in a new line). So as a simple example the following bit of text:

$x^2 + \sum_i a_i y_i^n$

becomes \[x^2 + \sum_i a_i y_i^n\]

which renders as word equation when exported to a word file.

Exercise

In your markdown file investigate how various formating settings will appear in a word file. As an example:

  • Create a header and sub-headings
  • Create a dummy table
  • Create a bulleted list
  • Embed a figure into the document

3.7 Embedding R

One of the key things with Rmarkdown is the ability to embed R code with the document writing

chunk

R code is weaved into the document in using specialised input environments, called chunks:

```{r}
 # your R code here
```
  • In these chunks you can place any type of R command which will be run whenever the document is rendered.
  • The code chunks rememeber the results from previous chunks in the document, so the Rmarkdown file behaves like a regular R-script.
  • Code chunks can be created either by:
    • selecting “Insert chunk” from the “Code” menu in Rstudio or
    • using the keyboard shortcut [ctrl] + [alt] + I.

Tables

Tables can can also be generated from code chunks. The kable function from the knitr package can take any data frame an turn it into a table. The following chunk will produce a table from the tab data frame:

```{r}
#| results: asis
tab <- tibble::tibble(x = 1:3, a = letters[x])
knitr::kable(tab, caption = 'A simple table')
```
A simple table
x a
1 a
2 b
3 c

Figures

Figures are rendered in place:

```{r}
plot(1:9)
```

and you can control the figure width and height using the fig-width and fig-height options:

```{r}
#| fig-height: 3
#| fig-width: 9
plot(1:9)
```

Showing code

By default chunks are rendered such that both the code in the chunk is visible and the output, which is either a figure or printout from the console. We can set the #| echo: false in our code chuck. e.g. this code will result in that the code behind the plot above would not be rendered:

{r #| echo: false #| fig-height: 3 #| fig-width: 9 plot(1:9)

The following is a list of some common options one can set in the chunks:

Options Description
#| eval: true Evaluate all or part of the current chunk
#| echo: true Show all or part of the source code
#| message: true Show all messages from the console
#| warning: true Show all warnings
#| results: 'asis' Writes raw output from R to the output document without markup. Helpful for creating tables with xtable. markup is the default.
#| include: true Code chunk will be included in output. If you don’t want a chunk in the output but still evaluated set this to false
#| fig-width: 8 controls the figure width (in inches)
#| fig-height: 4 controls the figure height (in inches)
#| fig-cap: 'Figure caption' is the figure caption, note captions are only rendered when captions are enabled.
#| fig-align: [ht] Sets the figure alignment in pdf output
#| dev: allows the user to specify the file type of the plot (png, pdf, bmp, etc..)
#| cache: FALSE Should the calculations in this chunk be cached, very useful for heavy calculations

Take note that these could also be specified in the YAML-header (see below).

Exercise

In your markdown file:

  • Create a code chunk that reads in the minke whale dataset
  • Create another chunk that plots the length weight relationship with the appropriate figure caption
  • Create table illustrating the number of males and females by year

4 The YAML header

The YAML (Yet Another Markup Language) header is the top part of the Rmarkdown document. R Markdown uses it to control many details of the output, such as the title and author. Many of the settings in YAML can be set using the options menu in Rstudio. The following table shows a couple of the most commonly set YAML options:

Option Description
author The document author
date Date of the document
title Title of the document
output Can be html_document, pdf_document, word_document
fig_width and fig_height Default figure width, can be overridden with chunk options

However there are two useful additional settings in the YAML header that cannot be set with menu options, bibliographic citations and word style templates.

4.1 Bibliography

Bibliographic references can be added to a markdown file quite easily. To use this feature you need to set the bibliography field in the YAML header and this setting should point to the location of the references. The references are read from a BibTeX-file, a special file format for bibligraphic citations. A typical BibTeX file looks something like this:

@book{darwin2009origin,
  title={The origin of species by means of natural selection: or, the preservation of favored races in the struggle for life},
  author={Darwin, Charles and Bynum, William F},
  year={2009},
  publisher={AL Burt}
}

@article{pope2009honey,
  title={Honey, I cooled the cods: Modelling the effect of temperature on the structure of Boreal/Arctic fish ecosystems},
  author={Pope, John G and Falk-Pedersen, Jannike and Jennings, Simon and Rice, Jake C and Gislason, Henrik and Daan, Niels},
  journal={Deep Sea Research Part II: Topical Studies in Oceanography},
  volume={56},
  number={21},
  pages={2097--2107},
  year={2009},
  publisher={Elsevier}
}

and the YAML header referencing the bibliography could look something like:

---
title: "Example"
output:
  html_document
bibliography: example.bib
---

You can the cite the relevant entries from BibTex file using the @shortname, which gives you an inline reference or [@shortname] to cite within parenthesis (or [@short1; @short2] to cite more than one). So if you want to cite Darwins origin origin of the species do @darwin2009origin which will result in Darwin and Bynum (2009), and within parenthesis [@darwin2009origin; @pope2009honey] or (Darwin and Bynum 2009; Pope et al. 2009).

4.2 Word style templates

If you want the appearance of your word document to mimic another word document you can use the reference_docx option in the YAML header:

---
title: "Test"
output:
  word_document: 
    reference_docx: tmp.docx
---
Exercise

A markdown report

If you have the “git”-program install:

  • Copy this path: https://github.com/einarhjorleifsson/minke.git
  • In RStudio do: File -> New Project … -> Version Control -> Git
  • Paste the path into: Repository URL
  • Open the minke.qmd document

Or do things via point-and-mouse-click taking the zip file from here:

  • https://github.com/einarhjorleifsson/minke
  • Do: Code -> Download Zip -> you know the rest
  • Open this new project

We will collectively go through this markdown document to gain some insight into some common setting used when generating a report.

4.3 Further reading

5 References

Darwin, Charles, and William F Bynum. 2009. The Origin of Species by Means of Natural Selection: Or, the Preservation of Favored Races in the Struggle for Life. AL Burt.
Pope, John G, Jannike Falk-Pedersen, Simon Jennings, Jake C Rice, Henrik Gislason, and Niels Daan. 2009. “Honey, i Cooled the Cods: Modelling the Effect of Temperature on the Structure of Boreal/Arctic Fish Ecosystems.” Deep Sea Research Part II: Topical Studies in Oceanography 56 (21): 2097–107.