Preamble

Nice resources:

Needed libraries for this tutorial:

library(tidyverse)  # This loads among other things the ggplot2-package
library(ggmap)
library(gridExtra)
library(maps)
library(mapdata)

Note if you have not installed these package, check the Installing-documentation.

Getting example data into R

For this tutorial we are going to use the following data (information on them found here)

minke <- read.csv("http://www.hafro.is/~einarhj/data/minke.csv",
                  stringsAsFactors = FALSE)
sau <- read.csv("http://www.hafro.is/~einarhj/data/sau-crfm-country-catches.csv",
                stringsAsFactors = FALSE)
iceland <- read.csv("http://www.hafro.is/~einarhj/data/iceland.csv",
                    stringsAsFactors = FALSE)

Just get a quick overview of the data we use the glimpse-function:

glimpse(minke)
#> Observations: 190
#> Variables: 13
#> $ id             <int> 1, 690, 926, 1333, 1334, 1335, 1336, 1338, 1339...
#> $ date           <chr> "2004-06-10 22:00:00", "2004-06-15 17:00:00", "...
#> $ lon            <dbl> -21.4, -21.4, -19.8, -21.6, -15.6, -18.7, -21.5...
#> $ lat            <dbl> 65.7, 65.7, 66.5, 65.7, 66.3, 66.2, 65.7, 66.1,...
#> $ area           <chr> "North", "North", "North", "North", "North", "N...
#> $ length         <int> 780, 793, 858, 567, 774, 526, 809, 820, 697, 77...
#> $ weight         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
#> $ age            <dbl> 11.3, NA, 15.5, 7.2, 12.3, 9.6, 17.3, 13.8, 12....
#> $ sex            <chr> "Female", "Female", "Female", "Male", "Female",...
#> $ maturity       <chr> "pregnant", "pregnant", "pregnant", "immature",...
#> $ stomach.volume <dbl> 58, 90, 24, 25, 85, 18, 200, 111, 8, 25, 38, 6,...
#> $ stomach.weight <dbl> 31.900, 36.290, 9.420, 3.640, 5.510, 1.167, 99....
#> $ year           <int> 2004, 2004, 2004, 2003, 2003, 2003, 2003, 2003,...
glimpse(sau)
#> Observations: 64
#> Variables: 4
#> $ year       <int> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 195...
#> $ reported   <dbl> 14.2, 14.5, 16.8, 19.9, 19.9, 21.9, 23.2, 23.6, 27....
#> $ unreported <dbl> 75.8, 78.3, 79.4, 87.0, 88.0, 90.7, 92.3, 91.0, 93....
#> $ total      <dbl> 90.0, 92.8, 96.3, 106.9, 107.9, 112.6, 115.6, 114.6...
glimpse(iceland)
#> Observations: 1,323
#> Variables: 2
#> $ lat <dbl> 65.8, 65.8, 65.8, 65.7, 65.7, 65.7, 65.6, 65.6, 65.6, 65.6...
#> $ lon <dbl> -23.9, -24.0, -24.1, -24.1, -24.0, -23.9, -23.8, -23.8, -2...

ggplot: Key components

ggplot has three key components:

  1. data,

  2. A set of aesthetic mappings [aes] for the variables in the data, and

  3. At least one layer which describes how to render each observation. Layers are usually created with a geom function.

ggplot(data = minke, aes(x = age, y = length)) + 
  geom_point()

Here we have basically just created a point-plot where age is plotted (“mapped”) on the x-axis and length on the y-axis. One can use different syntax resulting in the same outcome:

ggplot(minke, aes(x = age, y = length)) + geom_point()
ggplot(minke, aes(age, length))         + geom_point()
ggplot()                                + geom_point(data = minke, aes(age, length))
ggplot(data = minke)                    + geom_point(aes(x = age, y = length))
ggplot(minke)                           + geom_point(aes(age, length))
minke %>% ggplot()                      + geom_point(aes(age, length))

One has quite a set of options were one can control colour, size, shape and alpha (the transparency of the points):

ggplot(minke) + 
  geom_point(aes(age, length), colour = "red", size = 2, shape = 3, alpha = 0.4)

One can also stored the plot in an object for later use:

p <- ggplot(minke) + geom_point(aes(age, length))

The above is useful if one wants to build a plot step-by-step or arrange many plots together for display (see below).

Using the value of varible to control aesthetics

In the above cases the colour, size, shape and alpha levels are fixed values set outside the aes-function. We can however use the value of a variable in the dataset to control the visualization. Below is just a list of some things that can be set:

aesthetic: colour

One can add more aesthetics to the plot, e.g. if one wants to distinguish between sex or the area where the whale was caught:

p <- ggplot(minke)
p + geom_point(aes(age, length, colour = sex))
p + geom_point(aes(age, length, colour = area))

Colours can be manually specified using scale_colour_manual:

p <- ggplot(minke)
p + geom_point(aes(age, length, colour = sex)) +
  scale_colour_manual(values = c("orange","brown"))
p + geom_point(aes(age, length, colour = area)) +
  scale_colour_manual(values = c("green","red"))

In the above case sex and area had only two values, hence only two colours were specified.

aesthetic: shape

p + geom_point(aes(age, length, shape = sex))

Exercise

Exercise

Create a code that results in these plots:

aesthetic: size

p + geom_point(aes(age, length, size = stomach.volume))

To reveal overlays:

p + geom_point(aes(age, length, size = stomach.volume), alpha = 0.6)
p + geom_point(aes(age, length, size = stomach.volume), alpha = 0.3, col = "red")

facetting

A plot can be subsetted by using the face_wrap-function. Here we e.g. split the plot up into the two survey areas (North and South):

ggplot(minke) + 
  geom_point(aes(age, length, colour = sex)) + 
  facet_wrap(~ area)

Exercise

Create a code that results in this plot:

Plotting plots side-by-side

If one wanted two different plots side-by-side one needs to store each plot as and object and then use the grid.arrange-function. E.g.:

p1 <- p + geom_point(aes(age, length), colour = "blue")
p2 <- p + geom_point(aes(age, length, shape = sex))
grid.arrange(p1, p2, ncol = 2)

layers

In the above cases we have only one type of a layer, the point layer. ggplot2 has of course myriads of layers. Below are some examples that give a brief overview of some other layers.

layer: line

ggplot(sau, aes(year, total)) + geom_line()

You may have noticed that by default the plot area shown cover only the range of the data. In the above case one may have wanted to have the y-plot have a starting point at zero. Here we would need to use the expand_limits-function:

ggplot(sau, aes(year, total)) + geom_line() + expand_limits(y = 0)

layer: bar charts

We can create histograms for discrete data using the geom_bar-function:

ggplot(minke, aes(maturity)) + geom_bar()

Exercise

Modify the above code to generate the following:

layer: histograms

We can create histograms for discrete data using the geom_bar-function. That function has also an argument for controlling the binwidth:

p <- ggplot(minke, aes(length))
p + geom_histogram()
p + geom_histogram(binwidth = 50)

One may want to get a histogram show the size distribution of e.g. the different sexes:

p + geom_histogram(aes(fill = sex))

It is quite hard to get a visualization of the length distribution of the Females here, so it may be better to split the plot into different panels:

p + geom_histogram(aes(fill = sex)) + facet_wrap(~ sex, ncol = 1)

layer: frequency polygons

Instead of a histogram it may also be better to show things as frequency lines:

p + geom_freqpoly()
p + geom_freqpoly(aes(colour = sex), binwidth = 50)

layer: points as jitters

Add a little random noise to the data to avoid over-plotting can be done using the geom_jitter-function:

p <- ggplot(minke, aes(sex, length))
p + geom_point()
p + geom_jitter()

layer: summarise via boxplots or violins

Instead of a jitter plot, a more condensed way to show the distribution of the data is to use box- or violinplots:

p + geom_boxplot()
p + geom_violin()

layer: overlay data and summary layers

In ggplot one can have more than one layer. E.g. in if we generate a summary distribution plot one may also want the get a “glimpse” at the raw data:

p + geom_boxplot() + geom_jitter(colour = "red", alpha = 0.3)

Exercise

Create the following 3 layer plot:

layer: add smoother to a plot

Another example where we add layers could be adding a smoother to a point-plot:

p <- ggplot(minke, aes(age, length))
p + geom_point() + geom_smooth()
p + geom_point() + geom_smooth(span = 0.1)

We even have some specific models we could try, here a linear model:

p + geom_point() + geom_smooth(method = "lm")

Some controls

If we want to put a plot into a report we may not like the default that is given but to refine the plot further. There are a number of functions in ggplot2 that allow us to take full control on the final outlook.

labels

So far, the labels in the plot have been just the variable names in the dataframe. To specify the labels one can use the labs-function:

p <- ggplot(minke, aes(age, length, colour = sex)) + geom_point()
p
p + labs(x = "Age [year]", y = "Length [cm]", 
         colour = "Sex", title = "My minke plot",
         subtitle = "Based on survey data from 2003-2007")

breaks

Controlling which values appear as tick marks one can use:

p <- ggplot(minke, aes(age, length)) + geom_point() + labs(x = NULL, y = NULL)
p
p +
  scale_x_continuous(breaks = c(5, 10, 15, 20, 25, 30, 35, 40, 45))
p +
  scale_x_continuous(breaks = seq(5, 45, by = 5)) +
  scale_y_continuous(breaks = seq(500, 950, by = 50))

Exercise

Create code that mimics the following plot:

limits

If you only want “zoom into” a data area one can use xlim or ylim:

p <- ggplot(minke, aes(maturity, length))
p + geom_jitter()
p + geom_jitter() + ylim(600, 800)
p + geom_jitter() + ylim(NA, 800) # setting only one limit

For discrete variables:

p + geom_jitter() + ylim(600,800) + xlim("immature","mature")

warning

But be careful when using with summary statistics, e.g.:

p + geom_boxplot()
p + geom_boxplot() + ylim(600, 800)

This is because when you specify the ylim the data outside that range are filtered out completely from the plot-data. The remedy is to wrap the function inside the coord_cartesian-function:

p + geom_boxplot() + coord_cartesian(ylim = c(600, 800))

A little gis

The object island contains the Latitude and Longitude of the Icelandic shoreline. We could try to do a point or a line plot:

p <- ggplot(iceland, aes(lon, lat)) + labs(x = NULL, y = NULL)
p + geom_point()
p + geom_line()

The point plot shows something that is close to recognizable. However in the line plot data are rearranged such the line is drawn from the smallest x-value, to the next-smallest x-value and so on. The island object actually has the data arranged in a specific order. If one want to retain that order one needs to use the geom_path-function. In addition, because the data are in coordinates we want to set the correct aspect ratio between the x- and the y-axis, hence here we also need to use the coord_map-function

p + geom_path()
p + geom_path() + coord_map()

So, gis-mapping is nothing more than having data in a structured order and one connect the dots with a line in addition to specifying the map projection.

On maps

  • Maps as background for r-plot can come from myriad of sources
  • In this course we will largely focus on objects available in the map- and mapdata-packages.
  • To get the data into ggplot2 friendly form (data.frame) we use the map_data function.
m <- map_data("world")
str(m)
#> 'data.frame':    99338 obs. of  6 variables:
#>  $ long     : num  -69.9 -69.9 -69.9 -70 -70.1 ...
#>  $ lat      : num  12.5 12.4 12.4 12.5 12.5 ...
#>  $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ region   : chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
#>  $ subregion: chr  NA NA NA NA ...
ggplot(m) +
  geom_polygon(aes(long, lat, group = group)) +
  coord_quickmap()

The resolution of “world” is not very high as can be seen if only want Barbados:

m <- map_data("world", region = "Barbados")
ggplot(m) +
  geom_polygon(aes(long, lat, group = group)) +
  coord_quickmap()

If we want higher resolution the object “worldHires” is often sufficient:

m <- map_data("worldHires")
m <- m[m$long > -85 & m$long < -60 & m$lat > 0 & m$lat < 30,]
ggplot(m) +
  geom_polygon(aes(long, lat, group = group)) +
  coord_quickmap(ylim = c(7, 28))

But it is still not really good, e.g. if we check out St. Vincent:

m <- m[m$region == "Saint Vincent",]
ggplot(m) +
  geom_polygon(aes(long, lat, group = group)) +
  coord_quickmap()

we only get the main island, not the Grenadines.

gis: Lets generate a base map:

m <- ggplot(iceland, aes(lon, lat)) +
  theme_bw() +
  geom_polygon(fill = "grey90") +
  coord_map() +
  labs(x = NULL, y = NULL)
m

gis: Add layers

m + geom_point(data = minke, aes(lon, lat))
m + geom_point(data = minke, aes(lon, lat, colour = area))

m + geom_point(data = minke, aes(lon, lat, colour = sex))
m + geom_point(data = minke, aes(lon, lat, colour = year))
m + geom_point(data = minke, aes(lon, lat, colour = factor(year)))
m + geom_point(data = minke, aes(lon, lat, size = length), alpha = 0.2)
# possible remedy of the above plot - is beyound the basic introduction
m + geom_point(data = minke, aes(lon, lat, colour = length)) +
  scale_colour_gradient(low = "yellow", high = "red")

gis: Another type of base map:

m2 <- get_map(location = c(-19,65), zoom= 6, maptype = "satellite", source = "google")
m2 <- ggmap(m2) +
  labs(x = NULL, y = NULL)
m2

Exercise

Repeat previous plots where we use “iceland” or do new ones, but using the Google base as a background. E.g.: