Vector data in R

In the R language there are two groups of classes to deal with vector data.

  1. The Spatial class, implemented in the sp package (released in 2005).
  2. The newer sf class, implemented in the sf package (released in 2016).

In this course we will focus mainly on the sf class, which have many advantages over the sp class. Still, it is useful to know about the sp class, in particular because may useful packages (still) use sp objects (and many packages may not be upgraded to accept sf objects).

Simple feature geometries

A feature refers to any object or observation in the real world. Features have: a) a geometry, indicating where in Earth the feature is located, b) attributes, describing it properties.

In simple features, all geometric attributes are described by points and straight lines (no curves).

Simple features access is an international open standard by the Open Geospatial Consortium (OGC) and ISO (ISO 19125) that specifies a common storage and access model for two-dimensional geometries (point, lines, polygons).

  • It specifies a unified way to represent spatial (vector) data.
  • It also specifices a number of topological metrics, predicates and operations.
  • Has well-known text (WKT) and binary (WKB) encoding.
  • WKB used by spatial databases (SQLite, PostGIS, etc.)
  • It supported by OSGEo libraries (GDAL, GEOS), GeoJSON and GeoSPARQL.

The seven most common simple features:

Type Description
POINT Zero-dimensional geometry with a single point
LINESTRING Sequence of points connected by straight lines
POLYGON Sequence of points forming a closed ring
MULTIPOINT Set of points
MULTILINESTRING Set of linestrings
MULTIPOLYGON Set of polygons
GEOMETRYCOLLECTION Set of geometries of any type

There are more simple features (CIRCULARSTRING, SURFACE, TRIANGLE, for a total of 18 features), but they are not supported by the sf package and we will not deal with them.

Simple features in R

In a nutshell, sf objects are extensions of data.frames or tibbles. Each row is one feature, this is, one spatial object that could have associated data.

An sf objects may have one or more columns with data for each of the features, and a “special” column usually named “geometry” or “geom” (but can have any name) with the geometry of the feature (its type and coordinates).

The geometry is a list column of class sfc. It has a bounding box and a coordinate reference system as attributes, and a class attribute pointing out the common type (or GEOMETRY in case of a mix).

Each element in the geometry column is a single simple feature geometry is of class sfg (single feature geometry), with further classes pointing out dimension and type.

This all sounds complicated, but it is not. Let’s build a simple sf object from scratch. First, load the packages we will use.

library(tidyverse)
library(sf)

Now we will create an sf object with two point features.

# Create the geometry for individual features. In this case, two points.1
p1 <- st_point(c(2, 3))
p2 <- st_point(c(1, 4))

class(p1) # Class "sfg" and "POINT"
## [1] "XY"    "POINT" "sfg"
# Now let's make a single feature geometry column
pts <- st_sfc(p1, p2)
class(pts)
## [1] "sfc_POINT" "sfc"
pts # Now it has a bounding box and a projection (CRS)
## Geometry set for 2 features 
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1 ymin: 3 xmax: 2 ymax: 4
## CRS:           NA
# Attach some data for the points and get an sf object
mydata <- tibble(a = c (10, 20), b = c ("A", "B"))

mysf <- st_as_sf(mydata, geometry = pts)
class(mysf)
## [1] "sf"         "data.frame"
mysf
## Simple feature collection with 2 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1 ymin: 3 xmax: 2 ymax: 4
## CRS:           NA
##    a b    geometry
## 1 10 A POINT (2 3)
## 2 20 B POINT (1 4)

Here we used the st_point() function to convert a vector into an sfg object. To build sfg objects for other geometries, you can use the other available builder functions:

Function Required input
st_point numeric vector (or one-row matrix)
st_multipoint numeric matrix with points in row
st_linestring numeric matrix with points in row
st_multilinestring list with numeric matrices with points in rows
st_polygon list with numeric matrices with points in rows
st_multipolygon list of lists with numeric matrices
st_geometrycollection list with (non-geometrycollection) simple feature objects

Rarelly you will need to build sf objects from scratch. More often you will have a data frame with coordinates and will want to convert that data into an sf object. We will do this, but first let’s learn a bit more about sf objects.

What is a list column anyway?

Let’s remember that data frame is also a list with vectors of the same length.

# This is a regular dataframe with three rows
my_df <- data.frame(a = 7:9, b = c("a", "b", "c"), c = c (10, 20, 30))

is.data.frame(my_df)
## [1] TRUE
is.list(my_df)
## [1] TRUE
my_df$a
## [1] 7 8 9
my_df[[1]]
## [1] 7 8 9

A list column is simply a column in a dataframe that contains a list, with one element per row.

# This is a list with three elements
my_list <- list(1:2, "Hello", c("A", "B", "C"))

# Now let's use the list as a column in the dataframe
my_df$d <- my_list
print(my_df)
##   a b  c       d
## 1 7 a 10    1, 2
## 2 8 b 20   Hello
## 3 9 c 30 A, B, C

List columns are used in sf objects to store its geometry.

A deeper look into sf objects

Let’s take a look at a “real” sf object, in this case the Bormicon areas used for the analysis of fisheries data in Icelandic waters.

bor <- read_sf("ftp://ftp.hafro.is/pub/data/shapes/bormicon.gpkg")

class(bor)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
glimpse (bor)
## Rows: 16
## Columns: 5
## $ nafn <chr> "Vestur", "Vestfirdir", "Mid.Nordur", "Grunnslod.nordur", "Nordau…
## $ area <dbl> 50436.738, 31952.912, 22803.168, 10500.918, 16024.065, 30154.976,…
## $ nr   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ name <chr> "W", "NW", "N_center", "N_shallow", "NE", "E", "E_ridge", "SE", "…
## $ geom <POLYGON [°]> POLYGON ((-24.41668 65.5, -..., POLYGON ((-20.97307 65.5, -..., P…

Note that the object bor is of class tibble (the “enhanced” data.frames from the tidyverse) and sf. In addition, sf objects can also be data frames.

When you examine the object, first you can see the metadata, including: - geometry type (in this case POLYGON) - dimension (XY for 2D data) - bounding box (minimum and maximum x and y values) - epsg and projstring: the coordinate reference system of the data (more on this later)

Next you can see the data, including columns with the subarea, division, and area in km2. This is what is know as attributes. There is also a an additional column usually named “geometry” or “geom” (although any name could be used) where the geometry (i.e. coordinates and topology) are stored.

We can extract the metadata elements like this:

st_geometry_type(bor)
##  [1] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [10] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

In this case the 16 features in this data set have the same geometry (POLYGON).

This gets the coordinate reference system:

st_crs(bor)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

This gets the bounding box:

st_bbox(bor)
##  xmin  ymin  xmax  ymax 
## -40.0  58.0  -4.0  72.5

We can also use the usual functions in sf objects:

nrow(bor)
## [1] 16
ncol(bor)
## [1] 5
names(bor)
## [1] "nafn" "area" "nr"   "name" "geom"

Exercise

Exercise
  1. Examine the metadata of the data sets in the following locations:

ftp://ftp.hafro.is/pub/data/shapes/helcom.gpkg ftp://ftp.hafro.is/pub/data/shapes/ospar.gpkg

  1. What is the difference between the st_read and the read_sf() functions?

For a quick look, we can use the plot() function. This uses R’s base graphs. Remember that R has at least three independent graphic systems: base graphs, ggplot, and lattice. In this course we will use some base graphs but we will focus on ggplot.

By default, plot() makes a map for each column.

plot(bor)

To only plot the geometry we can do the following:

plot(bor %>%
       st_geometry())

bor_geom <- st_geometry(bor)
class(bor_geom)
## [1] "sfc_POLYGON" "sfc"

Here, we used the st_geometry() function to extract the geometry of the sf object, dropping the attributes (i.e. the “non spatial” data).

Let’s try the ggplot2 package with geom_sf().

bor %>%
  ggplot() +
  theme_bw() +
  geom_sf(aes(fill = name)) +
  labs(fill = "Area")

geom_sf() is an unusual geom because it will draw different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons.

In the bor object, the geometry type is a POLYGON, as individual reporting areas are continuous. Often continous areas have “holes” (for example islands). In these cases the direction of points (clockwise or anticlockwise) indicating the topology.

Making our own sf objects

Very often the first step in our spatial analysis and mapping is to convert external objects into sf objects. This is done using the st_as_sf() function.

In the simplest case, our data is comprised of individual point observations, with associated data. In this case an sf object with geometry type POINT will suffice. Let’s get some data from the Icelandic bottom trawl survey:

smb <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")
class(smb)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
smb <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv") %>%
  st_as_sf(coords = c("lon1", "lat1"), crs = 4326)
smb
## Simple feature collection with 19846 features and 38 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -27.25017 ymin: 62.78067 xmax: -10.34 ymax: 67.33767
## Geodetic CRS:  WGS 84
## # A tibble: 19,846 × 39
##       id date        year   vid tow_id t1                  t2                 
##  * <dbl> <date>     <dbl> <dbl> <chr>  <dttm>              <dttm>             
##  1 44929 1990-03-16  1990  1307 670-15 1990-03-16 05:05:00 1990-03-16 06:05:00
##  2 44928 1990-03-16  1990  1307 670-13 1990-03-16 07:48:00 1990-03-16 08:51:00
##  3 44927 1990-03-16  1990  1307 670-12 1990-03-16 09:35:00 1990-03-16 10:49:00
##  4 44926 1990-03-16  1990  1307 670-1  1990-03-16 11:31:00 1990-03-16 12:31:00
##  5 44925 1990-03-16  1990  1307 670-2  1990-03-16 13:06:00 1990-03-16 14:05:00
##  6 44922 1990-03-16  1990  1307 720-1  1990-03-16 20:21:00 1990-03-16 21:20:00
##  7 44921 1990-03-16  1990  1307 720-11 1990-03-16 21:49:00 1990-03-16 22:50:00
##  8 44920 1990-03-17  1990  1307 720-13 1990-03-17 00:16:00 1990-03-17 01:20:00
##  9 44919 1990-03-17  1990  1307 720-12 1990-03-17 02:18:00 1990-03-17 03:20:00
## 10 44918 1990-03-17  1990  1307 670-14 1990-03-17 05:23:00 1990-03-17 06:25:00
## # … with 19,836 more rows, and 32 more variables: lon2 <dbl>, lat2 <dbl>,
## #   ir <chr>, ir_lon <dbl>, ir_lat <dbl>, z1 <dbl>, z2 <dbl>, temp_s <dbl>,
## #   temp_b <dbl>, speed <dbl>, duration <dbl>, towlength <dbl>,
## #   horizontal <dbl>, verical <dbl>, wind <dbl>, wind_direction <dbl>,
## #   bormicon <dbl>, oldstrata <dbl>, newstrata <dbl>, cod_kg <dbl>,
## #   cod_n <dbl>, haddock_kg <dbl>, haddock_n <dbl>, saithe_kg <dbl>,
## #   saithe_n <dbl>, wolffish_kg <dbl>, wolffish_n <dbl>, plaice_kg <dbl>, …

We created a POINT geometry with the starting and ending of each tow (lat1 and lon1) using the coords argument to pass the column names with the longitude (or x coordinate) first, and the latitude (or y coordinate) second (this is a common mistake). We also specified the coordinate reference system (4326 is the ESPG code for unprojected data using the WSG84 datum).

Note that the “lon1” and “lat1” columns have disappeared and instead we have the “geometry” column. If we want to keep a copy of the “lon1” and “lat1” columns we can use the argument remove=FALSE.

Another example. Here we have VMS data from some bottom trawls. The data contains a series of locations (latitude and longitude) during each trawl. If we do the same as in the smb dataset, we obtain an sf object with POINT geometry.

vms <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
vms
## Simple feature collection with 8918 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -27.25236 ymin: 63.01933 xmax: -10.55667 ymax: 67.32933
## Geodetic CRS:  WGS 84
## # A tibble: 8,918 × 6
##      vid vessel time                speed heading             geometry
##  * <dbl> <chr>  <dttm>              <dbl>   <dbl>          <POINT [°]>
##  1  2350 Árni   2019-02-27 08:07:46  1.40    69.1 (-21.86028 64.15597)
##  2  2350 Árni   2019-02-27 08:07:46  1.40    69.1 (-21.86028 64.15597)
##  3  2350 Árni   2019-02-27 08:18:05  8.1    252.  (-21.89164 64.16098)
##  4  2350 Árni   2019-02-27 08:28:25  3.70   240.  (-21.92865 64.15326)
##  5  2350 Árni   2019-02-27 08:30:25  3.10   266.  (-21.93287 64.15257)
##  6  2350 Árni   2019-02-28 13:12:35  7.50    43.4   (-21.92564 64.154)
##  7  2350 Árni   2019-02-28 13:12:35  7.50    43.4   (-21.92564 64.154)
##  8  2350 Árni   2019-02-28 13:22:36 10.5    334.  (-21.94404 64.17966)
##  9  2350 Árni   2019-02-28 13:32:56 10.7    316.  (-21.98714 64.20344)
## 10  2350 Árni   2019-02-28 13:45:06 10.7    315.  (-22.04541 64.22924)
## # … with 8,908 more rows
ggplot() +
  geom_sf(data=vms, alpha=0.25)

But this is not what we want. Rather than just points, we want to join the points from each haul to form a line. To do this we need to group the points by the ID column, and then use st_cast to convert the group of points into linestrings.

vms <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  mutate(fishing = if_else(speed > 2 & speed < 4, 1, 0)) %>%
  group_by(vid) %>%
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING")

vms
## Simple feature collection with 4 features and 1 field
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -27.25236 ymin: 63.01933 xmax: -10.55667 ymax: 67.32933
## Geodetic CRS:  WGS 84
## # A tibble: 4 × 2
##     vid                                                                 geometry
##   <dbl>                                                         <LINESTRING [°]>
## 1  1131 (-21.93494 64.15081, -21.93494 64.15081, -21.92787 64.15371, -21.92787 …
## 2  1277 (-13.91179 64.91615, -14.0081 64.92564, -13.99127 64.92077, -13.99127 6…
## 3  1281 (-18.88719 66.15852, -18.89815 66.15147, -18.88845 66.15773, -18.88845 …
## 4  2350 (-21.86028 64.15597, -21.86028 64.15597, -21.89164 64.16098, -21.92865 …
ggplot() +
  geom_sf(data=vms, aes(colour = as.factor(vid))) +
  theme(legend.position = "none")

Tracks with start and end locations

The sdm object contains the latitud and longitude at the start and end of each haul. Let’s make a it a LINESTRING, with straight lines for each haul.

smb <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")

tracks <- smb %>%
  filter(year == 2019) %>%
  dplyr::select(id, 
                lat_1 = lat1, 
                lon_1 = lon1,
                lat_2 = lat2,
                lon_2 = lon2) %>%
  pivot_longer(cols = -id) %>%
  separate(name, sep = "_", into = c("info", "num")) %>%
  pivot_wider(names_from = info) %>%
  st_as_sf(coords = c ("lon", "lat"), crs = 4326) %>%
  group_by(id) %>%
  summarize(do_union = FALSE) %>%
  st_cast("LINESTRING")

ggplot() +
  geom_sf(data = tracks)

## Operating with attributes in sf objects

Remember that sf objects are data.frames or tibbles. So we can use the usual functions from base R or from the tidyverse to manipulate them.

Let’s load a set of polygons with the ICES ecoregions, and select the small ones.

ices_er <- read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_ecoregions.gpkg")

ices_er
## Simple feature collection with 17 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -44 ymin: 30.26705 xmax: 68.5 ymax: 90
## Geodetic CRS:  WGS 84
## # A tibble: 17 × 3
##    ecoregion                                   area_…¹                      geom
##    <chr>                                         <dbl>        <MULTIPOLYGON [°]>
##  1 Greenland Sea                                 275.  (((-43.99177 60.29694, -…
##  2 Bay of Biscay and the Iberian Coast            83.3 (((-3.586876 43.49365, -…
##  3 Azores                                         82.7 (((-21.05878 36, -23.494…
##  4 Western Mediterranean Sea                      88.5 (((-2.92761 35.27433, -2…
##  5 Ionian Sea and the Central Mediterranean S…    76.5 (((18.37672 39.79959, 18…
##  6 Black Sea                                      52.8 (((28.58799 43.80846, 28…
##  7 Adriatic Sea                                   15.3 (((19.99367 39.79384, 19…
##  8 Aegean-Levantine Sea                           74.6 (((26.17555 40.04279, 26…
##  9 Celtic Seas                                   129.  (((1.371231 62, 1.506343…
## 10 Baltic Sea                                     62.5 (((11.0789 53.14186, 11.…
## 11 Greater North Sea                              97.2 (((-4.497232 48, -4.5583…
## 12 Arctic Ocean                                  797.  (((68.5 82.97005, 68.058…
## 13 Icelandic Waters                              142.  (((-21.71718 63.84904, -…
## 14 Barents Sea                                   716.  (((12.7688 67.75756, 12.…
## 15 Faroes                                         46.7 (((-0.489327 63.89084, -…
## 16 Norwegian Sea                                 300.  (((15.12989 68.3119, 15.…
## 17 Oceanic Northeast Atlantic                    599.  (((-13.9097 60.82639, -1…
## # … with abbreviated variable name ¹​area_km2
ggplot() +
  geom_sf(data = ices_er, aes(fill=ecoregion))

sm_ices_er <- ices_er %>%
  filter(area_km2 < 100)

ggplot() +
  geom_sf(data = sm_ices_er, aes(fill=ecoregion))

Notice that the new object sm_ices_er also has a geometry column even if we did not selected it explicitly. In sf objects the geometry is “sticky”. In other words, subsets of sf objects are also sf objects. If we want to remove the geometry column we can do this:

sm_ices_er_df <- st_drop_geometry(sm_ices_er)

Also, note how the metadata of the sf object gets updated. Look at the bounding box of each object:

st_bbox(ices_er)
##      xmin      ymin      xmax      ymax 
## -44.00000  30.26705  68.50000  90.00000
st_bbox(sm_ices_er)
##      xmin      ymin      xmax      ymax 
## -35.57930  30.26705  41.78158  67.08025

Class Spatial

The Spatial class includes several derived classes to represent different types of spatial data (with and without associated data). For vector data, there are classes for points, lines and polygons:

  • SpatialPoints and SpatialPointsDataFrame
  • SpatialLines and SpatialLinesDataFrame
  • SpatialPolygons and SpatialPolygonsDataFrame

sp also has (limited) support for raster data:

  • SpatialGrid and SpatialGridDataFrame
  • SpatialPixel and SpatialPixelDataFrame

We will not deal with the spatial class, except to say that we can convert to and from sf.

library(tidyverse)
library(sf)
library(sp)

data(meuse)

class(meuse)
## [1] "data.frame"
coordinates (meuse) = ~ x + y # Convert to class sp

meuse.sf <- meuse %>% st_as_sf() # From spatial to sf

meuse.sp <- meuse %>% as("Spatial") # From sf to spatial