Working with geometries in sf objects

These are the packages needed for this tutorial:

library(tidyverse)
library(sf)

Operating with single geometries in sf object

The package sf provides a large set of functions to query properties of sf objects and to combine sf objects to form new geometries.

Geometric operations (and the sf functions that implement them) can be classified as:

predicates: they return a logical value (TRUE or FALSE)
measures: they return a value
geometry generator operations: they return new geometries.

Operations can also be classified as unary, binary, or n-ary, depending if they need one, two or more sf objects.

Remember to use projected data

We want to get areas in m2…not square degrees!
For geometric operations:
Make sure that they are in a projected crs.
Make sure that all layers have the same crs.

Extracting basic information

sf has several functions to extract basic information from objects of class sf:

nfu <- 
  read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg") %>%
  st_transform(3395) # Transform to World Mercator

st_is_simple(nfu)

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

st_is_valid(nfu)

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

st_is_empty(nfu)

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE

st_is_longlat(nfu)

## [1] FALSE

st_dimension(nfu) # 0=points, 1=linear, 2=polygon

##  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

st_geometry_type(nfu)

##  [1] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [10] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [19] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [28] POLYGON POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

st_crs(nfu) # Query (or set) the coordinate reference system

## Coordinate Reference System:
##   User input: EPSG:3395 
##   wkt:
## PROJCRS["WGS 84 / World Mercator",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["World Mercator",
##         METHOD["Mercator (variant A)",
##             ID["EPSG",9804]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World - between 80°S and 84°N"],
##         BBOX[-80,-180,84,180]],
##     ID["EPSG",3395]]

st_bbox(nfu) # Bounding  box

##     xmin     ymin     xmax     ymax 
## -1669792  4344223  1447153  8821377

Generating new geometries

Sampling

The function st_sample() can be used to sample locations from polygons or linestrings. The result is an sfc object with the sampled POINT geometries.

nfu <- 
  read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg") %>%
  st_transform(3395) # Transform to World Mercator

set.seed(100)
pts1 <- st_sample(nfu, size = 100, type = "random")

nfu_lines <- st_cast(nfu, "LINESTRING")
pts2 <- st_sample(nfu_lines, size = 100, type = "random")

ggplot() +
  geom_sf(data = nfu) +
  geom_sf(data = pts1, color = "red") +
  geom_sf(data = pts2, color = "blue")

Centroids

The mean position of geometries (LINESTRING OR POLYGONS) can be obtained by the st_centroid() function.

centr <- st_centroid(nfu)
centr2 <- st_point_on_surface(nfu) # Not exactly the centroid, but always in the polygon

ggplot() +
  geom_sf(data = nfu) +
  geom_sf(data = centr, color = "red") +
  geom_sf(data = centr2, color = "blue")

Buffers

The function st_buffer() can be used for points, linestrings or polygons.

buf <- st_buffer(pts1, dist = 100000) # 100 km buffer
ggplot() +
  geom_sf(data = buf, linetype = "dashed") +
  geom_sf(data = pts1)

buf <- st_buffer(nfu_lines, dist = 50000)
ggplot() +
  geom_sf(data = buf, aes(fill = name))

buf <- st_buffer(nfu, dist = 20000)
ggplot() +
  geom_sf(data = buf, aes(fill = name))

# Negative buffers are allowed for polygons!
buf <- st_buffer(nfu, dist = -20000)
ggplot() +
  geom_sf(data = buf, aes(fill = name))

Convex hull

A convex hull is the smallest convex polygon that includes a group of points. We can use the st_convex_hull() function to get it, but first we need to join the POINT geometries into a single MULTIPOINT geometry. Otherwise we get a “convex hull” for each individual point.

pts1_u <- st_union(pts1)

chull <- st_convex_hull(pts1_u)

ggplot() +
  geom_sf(data = chull) +
  geom_sf(data = pts1)

st_union() can be used to simplify features. For polygons, it will combine multiple polygons into a single polygon. For points, it clusters individual points into a MULTIPOINT geometry.

Concave hulls

Concave hulls are similar to convex hull, but they are concave (duh!). To compute a concave hull we need the package concaveman Note that the concaveman takes an sf object, but it does not accept an sfc object. (i.e. a geometry)

library(concaveman)

conc1 <- concaveman(st_as_sf(pts1), concavity = 1)
conc2 <- concaveman(st_as_sf(pts1), concavity = 2.5)

ggplot() +
  geom_sf(data = conc1, color = "red", fill= NA) +
  geom_sf(data = conc2, color = "blue", fill = NA) +
  geom_sf(data = pts1)

With high values for the concavity parameter we get a convex hull.

Grids

The function st_grid provides rectangular or hexagonal grids. You can get polygons, nodes or centroids.

sq_grid <- st_make_grid(nfu, n = c(10, 15))

hex_grid <- st_make_grid(nfu, n = c(10, 15), square = FALSE)

ggplot() +
  geom_sf(data = sq_grid, fill = NA, color = "gray") +
  geom_sf(data = hex_grid, fill = NA, color = "red") +
  theme_bw()

Sometimes we want to make a polygon of the bounding box of a geometry. For this, we can make a grid with a single cell.

box <- st_make_grid(nfu, n = 1)

The st_graticule() can be used to create a set of lines with constant latitude or longitude, which can be added to maps as reference.

Simplification

Sometimes it is necessary to simplify vector data, reducing the memory, disk space and bandwidth they require, and speeding up geometrical operations. For this we use the st_simplify() function.

helcom <- 
  read_sf("ftp://ftp.hafro.is/pub/data/shapes/helcom.gpkg") %>%
  st_transform(3395)

helcom_simple <- st_simplify(helcom, dTolerance = 5000)

object.size(helcom)

## 1501112 bytes

object.size(helcom_simple)

## 90888 bytes

ggplot() + geom_sf(data = helcom)

ggplot() + geom_sf(data = helcom_simple)

Measurements

Areas and lengths

The sf package provides a series of unary measures that return a single values describing some properties of geometries.

st_area(nfu)

## Units: [m^2]
##  [1]  12097265284  70529205689 101515726451  43665187421  52789217861
##  [6]  83897170656 182225854980  41084273347 109930365725  77625018558
## [11]  39711206899  81904483984  53266972071  25500316209  33494096919
## [16]  89343995677  46822981830  15399272037 224243874610  30901019019
## [21]  34189728313 587794324640  65924335674  67677150967 101975539823
## [26] 104801209635  75669588860 239994539247  33245051334  58274175242

st_length(nfu) # Polygons have no length!

## Units: [m]
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

st_length(nfu_lines)

## Units: [m]
##  [1]  439982.1 1078852.6 1275871.1  837529.0 1201254.6 1423893.6 1797728.5
##  [8]  814344.3 1832675.5 1366190.1  802009.8 1158423.9 1144988.3  751134.1
## [15]  746160.6 1462641.2  865896.0  583612.0 2390085.9  722866.5 1044121.7
## [22] 3530082.2 1037486.3 1053232.2 1484089.6 1454082.0 1220079.3 2186042.3
## [29]  842611.4 1087293.5

Distances between objects

The function st_distance() returns a dense numeric matrix with distances between geometries.

Let’s get a few points, lines and polygons:

set.seed(100)
poly <- nfu[1:3, ]

pts_a <- st_sample(poly, 5) %>%
  st_as_sf()%>%
  mutate(lab = 1:5)

pts_b <- st_sample(poly, 2)

st_distance(pts_a, pts_b)

## Units: [m]
##           [,1]      [,2]
## [1,] 390038.79 215873.96
## [2,] 271325.80 132231.48
## [3,]  83469.62  91539.58
## [4,] 413750.94 253766.06
## [5,] 313926.48 141064.78

st_distance(pts_a, poly)

## Units: [m]
##          [,1]      [,2]     [,3]
## [1,] 514192.8  99476.91      0.0
## [2,] 357723.2  13305.68      0.0
## [3,] 227285.5      0.00 184031.4
## [4,] 503294.0 154725.62      0.0
## [5,] 430993.2  41763.01      0.0

ggplot() +
  geom_sf(data = poly, aes(fill = name)) +
  geom_sf(data = pts_a) +
  geom_sf_text(data = pts_a,
               nudge_x = 20000,
               aes(label = lab))

The function returns a distance matrix with all pairwise distances (but see the by_element argument). The matrix is of class units, for numerical data with their associated measurement unit.

Note that we get a distance of 0 when the point is inside the polygon.

Aggregation

Aggregation involves summarising a dataset by a grouping variables, usually one of the attribute columns. The sf package provides methods for stats::aggregate and dplyr::summarise. When used, they:

take a groping predicate (e.g. from group_by())
take one or more aggregation functions
aggregate the attributes per group
aggregate the geometries
if do_union==TRUE, union the aggregated geometries.

Let’s look again to our small vms dataset:

vms <- 
  read_csv("ftp://ftp.hafro.is/pub/data/csv/small_vms.csv") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

vms # It is geometry type POINT.

## Simple feature collection with 1070 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS:  WGS 84
## # A tibble: 1,070 × 2
##       id             geometry
##  * <dbl>          <POINT [°]>
##  1     1 (-11.77807 66.02346)
##  2     1  (-11.7601 66.01879)
##  3     1 (-11.74409 66.01425)
##  4     1 (-11.72632 66.00922)
##  5     1 (-11.70703 66.00563)
##  6     1  (-11.7107 66.00186)
##  7     1 (-11.74066 66.00265)
##  8     1 (-11.76391 66.00338)
##  9     1  (-11.78793 66.0038)
## 10     1 (-11.81134 66.00404)
## # … with 1,060 more rows

vms_gr <- vms %>%
  group_by(id) %>%
  summarise(n = n())

vms_gr # It is geometry MULTIPOINT!

## Simple feature collection with 48 features and 2 fields
## Geometry type: MULTIPOINT
## Dimension:     XY
## Bounding box:  xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS:  WGS 84
## # A tibble: 48 × 3
##       id     n                                                          geometry
##    <dbl> <int>                                                  <MULTIPOINT [°]>
##  1     1    18 ((-11.90825 65.99555), (-11.86019 66.00352), (-11.88329 66.0046)…
##  2     2    27 ((-11.8924 66.05173), (-11.69867 66.02163), (-11.7207 66.0165), …
##  3     3    28 ((-11.59924 65.86496), (-11.61083 65.87383), (-11.59346 65.8715)…
##  4     4    24 ((-11.65764 65.97255), (-11.62297 65.92655), (-11.6206 65.9168),…
##  5     5    30 ((-11.57953 65.90664), (-11.58784 65.91653), (-11.59357 65.92663…
##  6     6    27 ((-11.59597 65.98899), (-11.59343 65.97939), (-11.59295 65.96986…
##  7     7    16 ((-11.95675 66.15033), (-11.94637 66.14257), (-11.93504 66.12985…
##  8     8    32 ((-11.96925 66.15286), (-11.95733 66.14282), (-11.96078 66.13886…
##  9     9    21 ((-12.23553 66.13731), (-12.24042 66.14221), (-12.26674 66.14252…
## 10    10    26 ((-12.436 66.10825), (-12.42128 66.11192), (-12.43039 66.11488),…
## # … with 38 more rows

vms_tr <- vms %>%
  group_by(id) %>%
  summarise(n = n()) %>%
  st_cast("LINESTRING")

vms_tr # It is now a LINESTRING.

## Simple feature collection with 48 features and 2 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS:  WGS 84
## # A tibble: 48 × 3
##       id     n                                                          geometry
##    <dbl> <int>                                                  <LINESTRING [°]>
##  1     1    18 (-11.90825 65.99555, -11.86019 66.00352, -11.88329 66.0046, -11.…
##  2     2    27 (-11.8924 66.05173, -11.69867 66.02163, -11.7207 66.0165, -11.74…
##  3     3    28 (-11.59924 65.86496, -11.61083 65.87383, -11.59346 65.8715, -11.…
##  4     4    24 (-11.65764 65.97255, -11.62297 65.92655, -11.6206 65.9168, -11.6…
##  5     5    30 (-11.57953 65.90664, -11.58784 65.91653, -11.59357 65.92663, -11…
##  6     6    27 (-11.59597 65.98899, -11.59343 65.97939, -11.59295 65.96986, -11…
##  7     7    16 (-11.95675 66.15033, -11.94637 66.14257, -11.93504 66.12985, -11…
##  8     8    32 (-11.96925 66.15286, -11.95733 66.14282, -11.96078 66.13886, -11…
##  9     9    21 (-12.23553 66.13731, -12.24042 66.14221, -12.26674 66.14252, -12…
## 10    10    26 (-12.436 66.10825, -12.42128 66.11192, -12.43039 66.11488, -12.3…
## # … with 38 more rows

Note that the order of the points in the LINESTRING changed. This is because by default the geometries are combined using st_union() In this case though it is better to use st_combine(), by adding do_union = FALSE*.

plot(vms_tr %>% st_geometry())

vms_tr <- vms %>%
  group_by(id) %>%
  summarise(n = n(),
            do_union = FALSE) %>%
  st_cast("LINESTRING")

plot(vms_tr %>% st_geometry())

Let’s aggregate the ICES ecoregions into our three groups:

ices_er <- 
  read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_ecoregions.gpkg") %>%
  mutate(zone = c(1, 2, 2, 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 2))  %>% # Add a grouping variable: 1=Arctic & subarctic, 2=North Atlantic, 3=Mediterranean & Black Sea
  # the original object is not valid:
  st_make_valid()

ices_er_zones <- 
  ices_er %>%
  group_by(zone) %>%
  summarize(area = sum(area_km2))

ggplot() +
  geom_sf(data = ices_er_zones,
          aes(fill = area))

When aggregating polygons, it is better to allow the default do_union=TRUE to dissolve internal boundaries. Otherwise we get invalid polygons (and internal divisions in our plots).

We can use rbind to join spatial objects, but the column names and coordinate reference system need to be the same.

sm_ices_er <- ices_er %>%
  filter(area_km2 < 100)

lg_ices_er <- ices_er %>%
  filter(area_km2 >= 100)

all_ices_er <- rbind(sm_ices_er, lg_ices_er)

Relationship between attributes and geometry

Most sf objects contain attributes and geometry. Because geometric operations do not change existing attributes, we need to be careful about if the attributes make sense in terms of the new geometries.

Example:

We take some trawls within a polgyon representing our study area.
We calculate the total abundance of some species (say cod) in that polygon. This is an attribute of the polygon (POP_COD)
We split the polygon into three areas based on bottom depth.
The attribute POP_COD is assigned to the three depthbased polygons… but it is not “right” anymore.

We need to distinguish between three attributegeometry relationships (AGR)

Constant attributes. They are valid everywhere within a geometry. Examples: bottom type, depth class.
Aggregate attributes. They are a summary value over the geometry. Examples: total abundance, mean abundance.
Identity attributes. They identify uniquely the entire geometry. Example: EEZs, ICES ecoregions. A sample of this geometry has not the identity property anymore. It becomes a constant attribute.

When we operate with geometries, the sf package gives a warning that attributes assumed to be constant over a geometry (which may be true or not).

st_centroid(ices_er) %>%
  head(1)

## Simple feature collection with 1 feature and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -28.04236 ymin: 69.68349 xmax: -28.04236 ymax: 69.68349
## Geodetic CRS:  WGS 84
## # A tibble: 1 × 4
##   ecoregion     area_km2                 geom  zone
##   <chr>            <dbl>          <POINT [°]> <dbl>
## 1 Greenland Sea     275. (-28.04236 69.68349)     1

We can specify the AGR like this:

st_agr(ices_er) # None of the variables have a set AGR.

## ecoregion  area_km2      zone 
##      <NA>      <NA>      <NA> 
## Levels: constant aggregate identity

st_agr(ices_er) <- c(ecoregion = "identity",
                     area_km2 = "aggregate",
                     zone = "constant")

st_agr(ices_er) # Now they do.

## ecoregion  area_km2      zone 
##  identity aggregate  constant 
## Levels: constant aggregate identity

ices_ct <- st_centroid(ices_er)

st_agr(ices_er)

## ecoregion  area_km2      zone 
##  identity aggregate  constant 
## Levels: constant aggregate identity

st_agr(ices_ct)

## ecoregion  area_km2      zone 
##  constant aggregate  constant 
## Levels: constant aggregate identity