spatialr - introduction

Preamble

The announcement

The course material

The course material is open source, meaning that the source code is made freely available and may be redistributed and modified.

The source code for the course material is located at: https://github.com/fishvice/spatialr
The course product is rendered at: http://www.hafro.is/~einarhj/spatialr

Typical science project

From: Grolemund and Wickham

Import
- Import data stored in a file, database, or web API, and load it into R
Tidy
- Each column is a variable, and each row is an observation
Transform
- Narrowing in on observations of interest
- Creating new variables that are functions of existing variables
- Calculating a set of summary statistics (like counts or means)
- …
Visualize
- May show unexpected things
- May raise new questions about the data
- A powerful communication platform
- …
Model
- Once questions are made sufficiently precise, one can use a model to answer them
- Model cannot question its “own” assumptions!
Communicate
- Presentation and documentation

This short course will primarily focus on transforming and visualizing spatial data. The other elements mentioned above will be touched upon, albeit in less detail.

We are, where possible, going to use a set of recently developed tools that fall under the tidyverse and sf umbrella. These are a basic set of generic tools designed to work seamlessly with one another.
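As a rough illustration of the transform step listed above, the sketch below uses dplyr on a small made-up table. The column names (station, depth, catch) and all values are invented for illustration and are not course data.

```r
library(dplyr)

# Hypothetical survey records: names and values are made up for illustration
d <- tibble(
  station = c("A", "A", "B", "B", "C"),
  depth   = c(50, 120, 80, 300, 150),   # metres
  catch   = c(10, 25, 0, 40, 15)        # kilograms
)

res <- d %>%
  filter(depth <= 200) %>%              # narrow in on observations of interest
  mutate(catch_t = catch / 1000) %>%    # new variable from an existing variable
  group_by(station) %>%
  summarise(n = n(),                    # summary statistics per station
            mean_catch = mean(catch))
res
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe.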

Getting started

For this course you need R (and, on Windows, Rtools), RStudio and a suite of packages.

R

What is R?

R is a command-line-driven programming language
- this is its biggest appeal, since commands can be reused
- this is also its biggest hurdle to widespread use
R is open-source:
- Other statistical software packages can be extremely expensive
- Large user base with almost all statistical methods implemented

Why R?

R has become the lingua franca of statistical analysis and data wrangling

It's free! If you are a teacher, a student or a user, the benefits are obvious
It runs on a variety of platforms including Windows, Unix and macOS
It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner
It offers powerful tools for data exploration and presentation
It supports reproducible workflows

Download R

Latest version of R: see The Comprehensive R Archive Network
- If your platform is Windows, also install Rtools
- If your platform is Mac, install Xcode via the App Store.

RStudio

What is RStudio?

RStudio lets the user run R in a user-friendly integrated development environment (IDE).
It is open source, free to download, and available at www.rstudio.com
Built to help you write R code, run R code, and analyze data with R
Text editor, project handling, markdown support, keyboard shortcuts, debugging tools, version control, …
Within RStudio one can achieve almost all that is needed to complete a typical science project

Download RStudio

Latest version of RStudio: see RStudio Desktop
- If you are adventurous, install the latest development version

Running R

Our environment - RStudio

A typical RStudio window may look something like this:

Console: One can type R commands directly into the console to perform calculations
Script editor: An R script is basically a series of stored R commands that can be run in the console
- To generate a new script do: New file -> R Script (ctrl-shift-N)
Environment: Contains a list of all declared variables. If you have a dataset you can double click the variable to view the data.
History: Contains a list of previous commands entered into the console
Other items:
- Files: List of files in a directory
- Plots: Graphical output from R. The user can export the figures to file (as jpeg, png or pdf) or to the clipboard
- Help: Gives a browsable interface to R’s built-in help pages. The user can search for a topic or a specific function
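To connect the panes described above to actual commands, here is a minimal sketch of what one might type in the console or store in a script. The variable name x is arbitrary, and the help calls are commented out because they are meant for interactive use.

```r
# Console: commands are evaluated as you enter them
1 + 2
x <- c(2, 4, 6)      # x now shows up in the Environment pane
mean(x)

# Environment: the same list of declared variables, obtained from code
ls()

# Help: these open the Help pane when run interactively
# ?mean
# help.search("median")
```

Saving these lines in a script (e.g. via New file -> R Script) lets you rerun them later instead of retyping them.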

RStudio project

RStudio allows us to make things a little bit easier by isolating various tasks within specific projects (read: directory/folder on your computer).
Projects save the state between sessions. This includes:
- Working directories
- Open files/scripts
- Workspaces (.RData file - do not save this)
One can have multiple RStudio projects open at any one time
- The modern call for multitasking :-)

We strongly urge you to get into the habit of splitting your various tasks into specific RStudio projects.

A typical directory structure of a project (here called dummy) may be something like:

einarhj/edu/spatialr/dummy                   # project directory
├── data                                     # tidy data as R binary files
│   └── is_smb_stations.rds
├── data-raw                                 # raw data, often untidy
│   └── is_smb_stations.csv
├── dummy.Rproj                              # RStudio file
├── R                                        # Directory containing scripts
│   └── addition.R
└── survey_explorations.Rmd                  # A markdown report template
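The sketch below shows one way such a layout could be used from R: raw data are read from data-raw and a tidy copy is stored as an R binary file in data. It builds the structure in a temporary directory so nothing on your computer is touched, and the columns and values written to the csv file are invented stand-ins, not the real station data.

```r
# Build the layout in a temporary location
proj <- file.path(tempdir(), "dummy")
dir.create(file.path(proj, "data-raw"), recursive = TRUE)
dir.create(file.path(proj, "data"))
dir.create(file.path(proj, "R"))

# Stand-in for a raw file; the columns and values are invented
raw <- data.frame(id  = 1:3,
                  lon = c(-24.1, -22.5, -20.3),
                  lat = c(65.2, 66.0, 64.4))
write.csv(raw, file.path(proj, "data-raw", "is_smb_stations.csv"),
          row.names = FALSE)

# Read the raw data (tidying would happen here) and store it as an .rds file
stations <- read.csv(file.path(proj, "data-raw", "is_smb_stations.csv"))
saveRDS(stations, file.path(proj, "data", "is_smb_stations.rds"))

# Later scripts read the tidy copy rather than the raw file
stations2 <- readRDS(file.path(proj, "data", "is_smb_stations.rds"))
list.files(proj, recursive = TRUE)
```

Inside a real RStudio project the working directory is the project directory, so the paths shorten to e.g. "data-raw/is_smb_stations.csv".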

Hands on exercise - Projects and some basic R

Open RStudio and create a new project: File -> New project … -> New directory -> Empty project -> …
Create a new R script: File -> New file -> R Script
Copy the content of this code into the R script.
Save the script, e.g. exercise1.R: File -> Save
Run each line of code, observe, change, add, and most importantly learn while doing (this is just a refresher of some elementary base R operations).
Once you have gone through the whole script, try: File -> Compile Report …, choosing the output according to your liking :-)

Packages

What are packages?

A package is a collection of functions and data, together with documentation.
Numerous basic packages come with R, but the strength of the R environment comes from the huge number of packages provided by third parties.
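As a small illustration using only packages that ship with R (stats and datasets), the sketch below touches the three ingredients of a package: functions, data and documentation. The documentation call is commented out because it is meant for interactive use.

```r
# Functions: call one with an explicit package prefix
stats::median(c(1, 3, 5))

# Data: list the data sets that a package ships
ds <- data(package = "datasets")
head(ds$results[, "Item"])

# Documentation: opens the package index when run interactively
# help(package = "stats")

# Packages currently attached in this session
search()
```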

Installing packages

If you have not already done so, or it is a long time since you have done so, run the following:

install.packages("tidyverse")

This will install among other things the core tidyverse packages:

ggplot2, for data visualization.
dplyr, for data manipulation.
tidyr, for data tidying.
readr, for data import.
purrr, for functional programming.
tibble, for “tibbles”, a modern re-imagining of data frames.
stringr, for strings.
forcats, for factors.

library(tidyverse) will load the core tidyverse packages.
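A quick way to check this after installation is sketched below. It assumes the tidyverse package installed without errors; tidyverse_packages() is a helper from the tidyverse package itself.

```r
library(tidyverse)

# The core packages are now attached to the session
c("ggplot2", "dplyr", "tidyr", "readr") %in% .packages()

# All packages that belong to the tidyverse, core and non-core
tidyverse_packages()
```

The start-up message printed by library(tidyverse) also lists the attached packages and any function name conflicts.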

In addition, this also installs a selection of other tidyverse packages that you’re likely to use frequently, but probably not in every analysis. This includes packages for:

Working with specific types of vectors:
- lubridate, for date/times.
Importing other types of data:
- DBI, for databases (…, this may create a problem)
- haven, for SPSS, SAS and Stata files.
- httr, for web APIs.
- jsonlite, for JSON.
- readxl, for .xls and .xlsx files.
- rvest, for web scraping.
- xml2, for XML.
Modelling
- modelr, for modelling within a pipeline
- broom, for turning models into tidy data

You’ll load these packages explicitly with library().
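For instance, a minimal sketch with lubridate (the date used here is arbitrary):

```r
library(lubridate)   # loaded explicitly, for date/times

# Parse a date given as year-month-day and add one day
d <- ymd("2019-02-28") + days(1)
d
```

The same pattern applies to the other packages in the list above: call library() once near the top of the script, then use the functions.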

We will install additional packages (there will be plenty) as we progress through the course.