Preamble


The announcement

THE CALL

The course material

The course material is open source, meaning that the source code is made freely available and may be redistributed and modified.

Typical science project

From: Grolemund and Wickham

  • Import
    • Import data stored in a file, database, or web API, and load it into R
  • Tidy
    • Each column is a variable, and each row is an observation
  • Transform
    • Narrowing in on observations of interest
    • Creating new variables that are functions of existing variables
    • Calculating a set of summary statistics (like counts or means)
  • Visualize
    • May show unexpected things
    • May raise new questions about the data
    • A powerful communication platform
  • Model
    • Once questions have been made sufficiently precise, a model can be used to answer them
    • A model cannot question its “own” assumptions!
  • Communicate
    • Presentation and documentation
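To make the steps above a little more concrete, here is a minimal base R sketch of import, tidy, transform, visualize and model, run on a tiny made-up data set. The station names and the depth and catch columns are invented for illustration. In a real project the import step would read from a file, database or web API rather than from inline text.

```r
# Import: read comma-separated text into a data frame
# (in practice something like read.csv("path/to/file.csv"); the path is hypothetical)
raw <- read.csv(text = c(
  "station,depth,catch",
  "A,50,120",
  "B,120,30",
  "C,80,75",
  "D,200,10"
))

# Tidy: already tidy here, each column is a variable and each row an observation
str(raw)

# Transform: narrow in on observations of interest ...
shallow <- subset(raw, depth < 150)
# ... create a new variable that is a function of existing ones ...
shallow$catch_per_m <- shallow$catch / shallow$depth
# ... and calculate a summary statistic
mean_catch <- mean(shallow$catch)

# Visualize: a quick scatterplot of catch against depth
plot(catch ~ depth, data = raw)

# Model: a simple linear model of catch as a function of depth
fit <- lm(catch ~ depth, data = raw)
coef(fit)
```

Communication is then a matter of wrapping such code and its output in a report, for example an R markdown document.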

This short course will primarily focus on transforming and visualizing spatial data. The other elements listed above will also be touched upon, albeit in less detail.

We are, where possible, going to use a set of recently developed tools that fall under the tidyverse and sf umbrellas. These are a basic set of generic tools designed to work seamlessly with one another.

Getting started


For this course you need to have R (and, on Windows, Rtools), RStudio and a set of packages.

R

What is R?

  • R is a command-line driven programming language
    • its biggest appeal is that one can reuse commands
    • this is also its biggest hurdle to widespread use
  • R is open-source:
    • Other statistical software packages can be extremely expensive
    • Large user base with almost all statistical methods implemented
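A minimal illustration of reusing commands: once a set of commands has been written down, it can be wrapped in a function and applied again to new data. The function name cv is hypothetical and chosen only for this example.

```r
# Store a set of commands once, as a function ...
cv <- function(x) {
  # coefficient of variation: standard deviation relative to the mean
  sd(x) / mean(x)
}

# ... and reuse them as often as needed, on different data
cv(c(2, 4, 6))
cv(mtcars$mpg)
```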

Why R?

R has become the lingua franca of statistical analysis and data wrangling

  • It's free! If you are a teacher, a student or a user, the benefits are obvious
  • It runs on a variety of platforms including Windows, Unix and macOS
  • It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner
  • It offers powerful tools for data exploration and presentation
  • It supports a reproducible workflow

Download R

RStudio

What is RStudio?

  • RStudio allows the user to run R in a user-friendly integrated development environment (IDE).

  • It is open-source (i.e. free) and available at www.rstudio.com

  • Built to help you write R code, run R code, and analyze data with R

  • Text editor, project handling, markdown support, keyboard shortcuts, debugging tools, version control, …

  • Within RStudio one can achieve almost all that is needed to complete a typical science project

Download RStudio

Running R

Our environment - RStudio

A typical RStudio window may look something like this:

  • Console: One can type R commands directly into the console to perform calculations
  • Script editor: An R script is basically a series of stored R commands that can be run in the console
    • To generate a new script do: File -> New File -> R Script (Ctrl+Shift+N)
  • Environment: Contains a list of all declared variables. If you have a dataset you can double click the variable to view the data.
  • History: Contains a list of previous commands entered into the console
  • Other items:
    • Files: List of files in a directory
    • Plots: Graphical output from R. The user can export the figures to file (as jpeg, png or pdf) or to the clipboard
    • Help: Gives a browsable interface to R’s in-line help pages. The user can search for a topic or a specific function
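As a small taste of the console, the following commands can be typed directly into it, or run line by line from a script with Ctrl+Enter. The variable name depths is arbitrary; once assigned, it also shows up in the Environment pane.

```r
2 + 3                              # a calculation, the result is printed straight away
depths <- c(50, 120, 80, 200)      # assign a vector to a variable
mean(depths)                       # use the variable in a further calculation
ls()                               # list the declared variables, as in the Environment pane
```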

RStudio project

  • RStudio allows us to make things a little bit easier by isolating various tasks within specific projects (read: directory/folder on your computer).
  • Projects save the state between sessions. This includes:
    • Working directories
    • Open files/scripts
    • Workspaces (.RData file - do not save this)
  • One can have multiple RStudio projects open at any one time
    • The modern call for multitasking :-)

We strongly urge you to get into the habit of splitting your various tasks into specific RStudio projects.

A typical directory structure of a project (here called dummy) may be something like:

einarhj/edu/spatialr/dummy                   # project directory
├── data                                     # tidy data as R binary files
│   └── is_smb_stations.rds
├── data-raw                                 # raw data, often untidy
│   └── is_smb_stations.csv
├── dummy.Rproj                              # RStudio file
├── R                                        # Directory containing scripts
│   └── addition.R
└── survey_explorations.Rmd                  # A markdown report template
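A minimal sketch of how the data-raw and data folders might be used: a script reads the raw file, tidies it, and saves the result as an R binary file with saveRDS(), which later scripts and reports read back with readRDS(). It assumes the working directory is the project directory, which RStudio sets when a project is opened. To keep the example self-contained it builds a throw-away copy of the dummy project in tempdir() and writes a made-up csv; the columns id, lon and lat are hypothetical and not the actual content of is_smb_stations.csv.

```r
# Build a throw-away project directory (only needed for this self-contained example)
proj <- file.path(tempdir(), "dummy")
dir.create(file.path(proj, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)
# setwd() is only used because this project is not opened in RStudio
old_wd <- setwd(proj)

# Stand-in for the raw file; columns are hypothetical
writeLines(c("id,lon,lat", "1,-22.1,64.2", "2,-23.5,65.0"),
           file.path("data-raw", "is_smb_stations.csv"))

# Import the raw data using a path relative to the project directory
stations <- read.csv(file.path("data-raw", "is_smb_stations.csv"))

# ... any tidying steps would go here ...

# Save the tidy version as an R binary file
saveRDS(stations, file.path("data", "is_smb_stations.rds"))

# Later scripts and reports read the binary file directly
stations2 <- readRDS(file.path("data", "is_smb_stations.rds"))

setwd(old_wd)
```

Relative paths like these keep working when the whole project folder is moved or shared, which is one of the reasons for organising work in projects.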
Hands on exercise - Projects and some basic R
  1. Open RStudio and create a new project: File -> New Project … -> New Directory -> Empty Project -> …
  2. Create a new R script: File -> New file -> R Script
  3. Copy the content of this code into the R script.
  4. Save the script, e.g. exercise1.R: File -> Save
  5. Run each line of code, observe, change, add, and most importantly learn while doing (this is just a refresher of some elementary base R operations).
  6. Once you have gone through the whole script, try: File -> Compile Report …, choosing the output according to your liking :-)

Packages


What are packages?

  • Packages are collections of functions and data, together with their documentation.
  • Numerous basic packages come with R, but the strength of the R environment comes from the huge number of packages provided by third parties.
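A quick way to see that a package bundles functions, data and documentation, using the stats and datasets packages that come with R (so nothing needs installing):

```r
# Functions: list what the attached stats package exports
stats_functions <- ls("package:stats")
head(stats_functions)

# Data: list the data sets shipped in the datasets package
datasets_items <- data(package = "datasets")$results[, "Item"]
head(datasets_items)
head(mtcars)             # one of those data sets, available straight away

# Documentation: typed at the console, ?median opens the help page of median()
# (in RStudio it appears in the Help pane)
```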

Installing packages

If you have not already done so, or if it is a long time since you last did, run the following:

install.packages("tidyverse")

This will install among other things the core tidyverse packages:

  • ggplot2, for data visualization.
  • dplyr, for data manipulation.
  • tidyr, for data tidying.
  • readr, for data import.
  • purrr, for functional programming.
  • tibble, for “tibbles”, a modern re-imagining of data frames.
  • stringr, for strings.
  • forcats, for factors.

library(tidyverse) will load the core tidyverse packages.
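A minimal sketch of what the core packages give us once loaded, assuming tidyverse has been installed as above. It uses the built-in mtcars data (not course data) to map the dplyr verbs filter(), mutate() and summarise() onto the “transform” steps of a typical science project, and makes a simple ggplot2 plot. The conversion from miles per US gallon to kilometres per litre (a factor of about 0.425) is only there to create a new variable.

```r
library(tidyverse)

mtcars_summary <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%              # narrow in on observations of interest
  mutate(kpl = mpg * 0.425) %>%             # new variable from an existing one
  group_by(cyl) %>%
  summarise(n = n(), mean_kpl = mean(kpl))  # summary statistics per group

mtcars_summary

# ggplot2: weight against fuel consumption
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p
```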

In addition, this also installs a selection of other tidyverse packages that you’re likely to use frequently, but probably not in every analysis. This includes packages for:

  • Working with specific types of vectors:

    • hms, for times.
    • lubridate, for date/times.
  • Importing other types of data:

    • DBI, for databases (…, this may create a problem)
    • haven, for SPSS, SAS and Stata files.
    • httr, for web APIs.
    • jsonlite, for JSON.
    • readxl, for .xls and .xlsx files.
    • rvest, for web scraping.
    • xml2, for XML.
  • Modelling

    • modelr, for modelling within a pipeline
    • broom, for turning models into tidy data

These packages you will need to load explicitly with library().
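A minimal sketch, assuming tidyverse is installed: broom, modelr and jsonlite are installed along with the tidyverse but are not attached by library(tidyverse), so each has to be loaded on its own. The tiny JSON string and the model of mpg against wt on mtcars are made up for illustration.

```r
library(tidyverse)   # attaches the core packages only
library(broom)       # loaded explicitly
library(modelr)
library(jsonlite)

# jsonlite: parse a small JSON array of records into a data frame
json_df <- fromJSON('[{"wt": 2.5, "mpg": 25}, {"wt": 3.5, "mpg": 18}]')

# A simple linear model
fit <- lm(mpg ~ wt, data = mtcars)

# broom: the model coefficients as a tidy data frame, one row per term
fit_tidy <- tidy(fit)
fit_tidy

# modelr: add the model predictions as a new column ("pred") within a pipeline
preds <- json_df %>% add_predictions(fit)
preds
```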

Additional packages (there will be plenty) will be installed as we progress through the course.