library(tidyverse)
library(sf)
The package sf provides a large set of functions to query properties of sf objects and to combine sf objects to form new geometries.
Geometric operations (and the sf functions that implement them) can be classified as:
Operations can also be classified as unary, binary, or n-ary, depending if they need one, two or more sf objects.
sf has several functions to extract basic information from objects of class sf:
<-
nfu read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg") %>%
st_transform(3395) # Transform to World Mercator
st_is_simple(nfu)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
st_is_valid(nfu)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
st_is_empty(nfu)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE
st_is_longlat(nfu)
## [1] FALSE
st_dimension(nfu) # 0=points, 1=linear, 2=polygon
## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
st_geometry_type(nfu)
## [1] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [10] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [19] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [28] POLYGON POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
st_crs(nfu) # Query (or set) the coordinate reference system
## Coordinate Reference System:
## User input: EPSG:3395
## wkt:
## PROJCRS["WGS 84 / World Mercator",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["World Mercator",
## METHOD["Mercator (variant A)",
## ID["EPSG",9804]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["World - between 80°S and 84°N"],
## BBOX[-80,-180,84,180]],
## ID["EPSG",3395]]
st_bbox(nfu) # Bounding box
## xmin ymin xmax ymax
## -1669792 4344223 1447153 8821377
The function st_sample() can be used to sample locations from polygons or linestrings. The result is an sfc object with the sampled POINT geometries.
<-
nfu read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg") %>%
st_transform(3395) # Transform to World Mercator
set.seed(100)
<- st_sample(nfu, size = 100, type = "random")
pts1
<- st_cast(nfu, "LINESTRING")
nfu_lines <- st_sample(nfu_lines, size = 100, type = "random")
pts2
ggplot() +
geom_sf(data = nfu) +
geom_sf(data = pts1, color = "red") +
geom_sf(data = pts2, color = "blue")
The mean position of geometries (LINESTRING OR POLYGONS) can be obtained by the st_centroid() function.
<- st_centroid(nfu)
centr <- st_point_on_surface(nfu) # Not exactly the centroid, but always in the polygon
centr2
ggplot() +
geom_sf(data = nfu) +
geom_sf(data = centr, color = "red") +
geom_sf(data = centr2, color = "blue")
The function st_buffer() can be used for points, linestrings or polygons.
<- st_buffer(pts1, dist = 100000) # 100 km buffer
buf ggplot() +
geom_sf(data = buf, linetype = "dashed") +
geom_sf(data = pts1)
<- st_buffer(nfu_lines, dist = 50000)
buf ggplot() +
geom_sf(data = buf, aes(fill = name))
<- st_buffer(nfu, dist = 20000)
buf ggplot() +
geom_sf(data = buf, aes(fill = name))
# Negative buffers are allowed for polygons!
<- st_buffer(nfu, dist = -20000)
buf ggplot() +
geom_sf(data = buf, aes(fill = name))
A convex hull is the smallest convex polygon that includes a group of points. We can use the st_convex_hull() function to get it, but first we need to join the POINT geometries into a single MULTIPOINT geometry. Otherwise we get a “convex hull” for each individual point.
<- st_union(pts1)
pts1_u
<- st_convex_hull(pts1_u)
chull
ggplot() +
geom_sf(data = chull) +
geom_sf(data = pts1)
st_union() can be used to simplify features. For polygons, it will combine multiple polygons into a single polygon. For points, it clusters individual points into a MULTIPOINT geometry.
Concave hulls are similar to convex hull, but they are concave (duh!). To compute a concave hull we need the package concaveman Note that the concaveman takes an sf object, but it does not accept an sfc object. (i.e. a geometry)
library(concaveman)
<- concaveman(st_as_sf(pts1), concavity = 1)
conc1 <- concaveman(st_as_sf(pts1), concavity = 2.5)
conc2
ggplot() +
geom_sf(data = conc1, color = "red", fill= NA) +
geom_sf(data = conc2, color = "blue", fill = NA) +
geom_sf(data = pts1)
With high values for the concavity parameter we get a convex
hull.
The function st_grid provides rectangular or hexagonal grids. You can get polygons, nodes or centroids.
<- st_make_grid(nfu, n = c(10, 15))
sq_grid
<- st_make_grid(nfu, n = c(10, 15), square = FALSE)
hex_grid
ggplot() +
geom_sf(data = sq_grid, fill = NA, color = "gray") +
geom_sf(data = hex_grid, fill = NA, color = "red") +
theme_bw()
Sometimes we want to make a polygon of the bounding box of a geometry.
For this, we can make a grid with a single cell.
<- st_make_grid(nfu, n = 1) box
The st_graticule() can be used to create a set of lines with constant latitude or longitude, which can be added to maps as reference.
Sometimes it is necessary to simplify vector data, reducing the memory, disk space and bandwidth they require, and speeding up geometrical operations. For this we use the st_simplify() function.
<-
helcom read_sf("ftp://ftp.hafro.is/pub/data/shapes/helcom.gpkg") %>%
st_transform(3395)
<- st_simplify(helcom, dTolerance = 5000)
helcom_simple
object.size(helcom)
## 1501112 bytes
object.size(helcom_simple)
## 90888 bytes
ggplot() + geom_sf(data = helcom)
ggplot() + geom_sf(data = helcom_simple)
The sf package provides a series of unary measures that return a single values describing some properties of geometries.
st_area(nfu)
## Units: [m^2]
## [1] 12097265284 70529205689 101515726451 43665187421 52789217861
## [6] 83897170656 182225854980 41084273347 109930365725 77625018558
## [11] 39711206899 81904483984 53266972071 25500316209 33494096919
## [16] 89343995677 46822981830 15399272037 224243874610 30901019019
## [21] 34189728313 587794324640 65924335674 67677150967 101975539823
## [26] 104801209635 75669588860 239994539247 33245051334 58274175242
st_length(nfu) # Polygons have no length!
## Units: [m]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
st_length(nfu_lines)
## Units: [m]
## [1] 439982.1 1078852.6 1275871.1 837529.0 1201254.6 1423893.6 1797728.5
## [8] 814344.3 1832675.5 1366190.1 802009.8 1158423.9 1144988.3 751134.1
## [15] 746160.6 1462641.2 865896.0 583612.0 2390085.9 722866.5 1044121.7
## [22] 3530082.2 1037486.3 1053232.2 1484089.6 1454082.0 1220079.3 2186042.3
## [29] 842611.4 1087293.5
The function st_distance() returns a dense numeric matrix with distances between geometries.
Let’s get a few points, lines and polygons:
set.seed(100)
<- nfu[1:3, ]
poly
<- st_sample(poly, 5) %>%
pts_a st_as_sf()%>%
mutate(lab = 1:5)
<- st_sample(poly, 2)
pts_b
st_distance(pts_a, pts_b)
## Units: [m]
## [,1] [,2]
## [1,] 390038.79 215873.96
## [2,] 271325.80 132231.48
## [3,] 83469.62 91539.58
## [4,] 413750.94 253766.06
## [5,] 313926.48 141064.78
st_distance(pts_a, poly)
## Units: [m]
## [,1] [,2] [,3]
## [1,] 514192.8 99476.91 0.0
## [2,] 357723.2 13305.68 0.0
## [3,] 227285.5 0.00 184031.4
## [4,] 503294.0 154725.62 0.0
## [5,] 430993.2 41763.01 0.0
ggplot() +
geom_sf(data = poly, aes(fill = name)) +
geom_sf(data = pts_a) +
geom_sf_text(data = pts_a,
nudge_x = 20000,
aes(label = lab))
The function returns a distance matrix with all pairwise distances (but
see the by_element argument). The matrix is of class
units, for numerical data with their associated measurement
unit.
Note that we get a distance of 0 when the point is inside the polygon.
Aggregation involves summarising a dataset by a grouping variables, usually one of the attribute columns. The sf package provides methods for stats::aggregate and dplyr::summarise. When used, they:
Let’s look again to our small vms dataset:
<-
vms read_csv("ftp://ftp.hafro.is/pub/data/csv/small_vms.csv") %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)
# It is geometry type POINT. vms
## Simple feature collection with 1070 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS: WGS 84
## # A tibble: 1,070 × 2
## id geometry
## * <dbl> <POINT [°]>
## 1 1 (-11.77807 66.02346)
## 2 1 (-11.7601 66.01879)
## 3 1 (-11.74409 66.01425)
## 4 1 (-11.72632 66.00922)
## 5 1 (-11.70703 66.00563)
## 6 1 (-11.7107 66.00186)
## 7 1 (-11.74066 66.00265)
## 8 1 (-11.76391 66.00338)
## 9 1 (-11.78793 66.0038)
## 10 1 (-11.81134 66.00404)
## # … with 1,060 more rows
<- vms %>%
vms_gr group_by(id) %>%
summarise(n = n())
# It is geometry MULTIPOINT! vms_gr
## Simple feature collection with 48 features and 2 fields
## Geometry type: MULTIPOINT
## Dimension: XY
## Bounding box: xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS: WGS 84
## # A tibble: 48 × 3
## id n geometry
## <dbl> <int> <MULTIPOINT [°]>
## 1 1 18 ((-11.90825 65.99555), (-11.86019 66.00352), (-11.88329 66.0046)…
## 2 2 27 ((-11.8924 66.05173), (-11.69867 66.02163), (-11.7207 66.0165), …
## 3 3 28 ((-11.59924 65.86496), (-11.61083 65.87383), (-11.59346 65.8715)…
## 4 4 24 ((-11.65764 65.97255), (-11.62297 65.92655), (-11.6206 65.9168),…
## 5 5 30 ((-11.57953 65.90664), (-11.58784 65.91653), (-11.59357 65.92663…
## 6 6 27 ((-11.59597 65.98899), (-11.59343 65.97939), (-11.59295 65.96986…
## 7 7 16 ((-11.95675 66.15033), (-11.94637 66.14257), (-11.93504 66.12985…
## 8 8 32 ((-11.96925 66.15286), (-11.95733 66.14282), (-11.96078 66.13886…
## 9 9 21 ((-12.23553 66.13731), (-12.24042 66.14221), (-12.26674 66.14252…
## 10 10 26 ((-12.436 66.10825), (-12.42128 66.11192), (-12.43039 66.11488),…
## # … with 38 more rows
<- vms %>%
vms_tr group_by(id) %>%
summarise(n = n()) %>%
st_cast("LINESTRING")
# It is now a LINESTRING. vms_tr
## Simple feature collection with 48 features and 2 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -22.38448 ymin: 65.83757 xmax: -11.57953 ymax: 67.18746
## Geodetic CRS: WGS 84
## # A tibble: 48 × 3
## id n geometry
## <dbl> <int> <LINESTRING [°]>
## 1 1 18 (-11.90825 65.99555, -11.86019 66.00352, -11.88329 66.0046, -11.…
## 2 2 27 (-11.8924 66.05173, -11.69867 66.02163, -11.7207 66.0165, -11.74…
## 3 3 28 (-11.59924 65.86496, -11.61083 65.87383, -11.59346 65.8715, -11.…
## 4 4 24 (-11.65764 65.97255, -11.62297 65.92655, -11.6206 65.9168, -11.6…
## 5 5 30 (-11.57953 65.90664, -11.58784 65.91653, -11.59357 65.92663, -11…
## 6 6 27 (-11.59597 65.98899, -11.59343 65.97939, -11.59295 65.96986, -11…
## 7 7 16 (-11.95675 66.15033, -11.94637 66.14257, -11.93504 66.12985, -11…
## 8 8 32 (-11.96925 66.15286, -11.95733 66.14282, -11.96078 66.13886, -11…
## 9 9 21 (-12.23553 66.13731, -12.24042 66.14221, -12.26674 66.14252, -12…
## 10 10 26 (-12.436 66.10825, -12.42128 66.11192, -12.43039 66.11488, -12.3…
## # … with 38 more rows
Note that the order of the points in the LINESTRING changed. This is because by default the geometries are combined using st_union() In this case though it is better to use st_combine(), by adding do_union = FALSE*.
plot(vms_tr %>% st_geometry())
<- vms %>%
vms_tr group_by(id) %>%
summarise(n = n(),
do_union = FALSE) %>%
st_cast("LINESTRING")
plot(vms_tr %>% st_geometry())
Let’s aggregate the ICES ecoregions into our three groups:
<-
ices_er read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_ecoregions.gpkg") %>%
mutate(zone = c(1, 2, 2, 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 2)) %>% # Add a grouping variable: 1=Arctic & subarctic, 2=North Atlantic, 3=Mediterranean & Black Sea
# the original object is not valid:
st_make_valid()
<-
ices_er_zones %>%
ices_er group_by(zone) %>%
summarize(area = sum(area_km2))
ggplot() +
geom_sf(data = ices_er_zones,
aes(fill = area))
When aggregating polygons, it is better to allow the default do_union=TRUE to dissolve internal boundaries. Otherwise we get invalid polygons (and internal divisions in our plots).
We can use rbind to join spatial objects, but the column names and coordinate reference system need to be the same.
<- ices_er %>%
sm_ices_er filter(area_km2 < 100)
<- ices_er %>%
lg_ices_er filter(area_km2 >= 100)
<- rbind(sm_ices_er, lg_ices_er) all_ices_er
Most sf objects contain attributes and geometry. Because geometric operations do not change existing attributes, we need to be careful about if the attributes make sense in terms of the new geometries.
Example:
We need to distinguish between three attributegeometry relationships (AGR)
When we operate with geometries, the sf package gives a warning that attributes assumed to be constant over a geometry (which may be true or not).
st_centroid(ices_er) %>%
head(1)
## Simple feature collection with 1 feature and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -28.04236 ymin: 69.68349 xmax: -28.04236 ymax: 69.68349
## Geodetic CRS: WGS 84
## # A tibble: 1 × 4
## ecoregion area_km2 geom zone
## <chr> <dbl> <POINT [°]> <dbl>
## 1 Greenland Sea 275. (-28.04236 69.68349) 1
We can specify the AGR like this:
st_agr(ices_er) # None of the variables have a set AGR.
## ecoregion area_km2 zone
## <NA> <NA> <NA>
## Levels: constant aggregate identity
st_agr(ices_er) <- c(ecoregion = "identity",
area_km2 = "aggregate",
zone = "constant")
st_agr(ices_er) # Now they do.
## ecoregion area_km2 zone
## identity aggregate constant
## Levels: constant aggregate identity
<- st_centroid(ices_er)
ices_ct
st_agr(ices_er)
## ecoregion area_km2 zone
## identity aggregate constant
## Levels: constant aggregate identity
st_agr(ices_ct)
## ecoregion area_km2 zone
## constant aggregate constant
## Levels: constant aggregate identity