In the R language there are two groups of classes to deal with vector data.
In this course we will focus mainly on the sf class, which have many advantages over the sp class. Still, it is useful to know about the sp class, in particular because may useful packages (still) use sp objects (and many packages may not be upgraded to accept sf objects).
A feature refers to any object or observation in the real world. Features have: a) a geometry, indicating where in Earth the feature is located, b) attributes, describing it properties.
In simple features, all geometric attributes are described by points and straight lines (no curves).
Simple features access is an international open standard by the Open Geospatial Consortium (OGC) and ISO (ISO 19125) that specifies a common storage and access model for two-dimensional geometries (point, lines, polygons).
The seven most common simple features:
Type | Description |
---|---|
POINT | Zero-dimensional geometry with a single point |
LINESTRING | Sequence of points connected by straight lines |
POLYGON | Sequence of points forming a closed ring |
MULTIPOINT | Set of points |
MULTILINESTRING | Set of linestrings |
MULTIPOLYGON | Set of polygons |
GEOMETRYCOLLECTION | Set of geometries of any type |
There are more simple features (CIRCULARSTRING, SURFACE, TRIANGLE, for a total of 18 features), but they are not supported by the sf package and we will not deal with them.
In a nutshell, sf objects are extensions of data.frames or tibbles. Each row is one feature, this is, one spatial object that could have associated data.
An sf objects may have one or more columns with data for each of the features, and a “special” column usually named “geometry” or “geom” (but can have any name) with the geometry of the feature (its type and coordinates).
The geometry is a list column of class sfc. It has a bounding box and a coordinate reference system as attributes, and a class attribute pointing out the common type (or GEOMETRY in case of a mix).
Each element in the geometry column is a single simple feature geometry is of class sfg (single feature geometry), with further classes pointing out dimension and type.
This all sounds complicated, but it is not. Let’s build a simple sf object from scratch. First, load the packages we will use.
library(tidyverse)
library(sf)
Now we will create an sf object with two point features.
# Create the geometry for individual features. In this case, two points.1
<- st_point(c(2, 3))
p1 <- st_point(c(1, 4))
p2
class(p1) # Class "sfg" and "POINT"
## [1] "XY" "POINT" "sfg"
# Now let's make a single feature geometry column
<- st_sfc(p1, p2)
pts class(pts)
## [1] "sfc_POINT" "sfc"
# Now it has a bounding box and a projection (CRS) pts
## Geometry set for 2 features
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1 ymin: 3 xmax: 2 ymax: 4
## CRS: NA
# Attach some data for the points and get an sf object
<- tibble(a = c (10, 20), b = c ("A", "B"))
mydata
<- st_as_sf(mydata, geometry = pts)
mysf class(mysf)
## [1] "sf" "data.frame"
mysf
## Simple feature collection with 2 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1 ymin: 3 xmax: 2 ymax: 4
## CRS: NA
## a b geometry
## 1 10 A POINT (2 3)
## 2 20 B POINT (1 4)
Here we used the st_point() function to convert a vector into an sfg object. To build sfg objects for other geometries, you can use the other available builder functions:
Function | Required input |
---|---|
st_point | numeric vector (or one-row matrix) |
st_multipoint | numeric matrix with points in row |
st_linestring | numeric matrix with points in row |
st_multilinestring | list with numeric matrices with points in rows |
st_polygon | list with numeric matrices with points in rows |
st_multipolygon | list of lists with numeric matrices |
st_geometrycollection | list with (non-geometrycollection) simple feature objects |
Rarelly you will need to build sf objects from scratch. More often you will have a data frame with coordinates and will want to convert that data into an sf object. We will do this, but first let’s learn a bit more about sf objects.
Let’s remember that data frame is also a list with vectors of the same length.
# This is a regular dataframe with three rows
<- data.frame(a = 7:9, b = c("a", "b", "c"), c = c (10, 20, 30))
my_df
is.data.frame(my_df)
## [1] TRUE
is.list(my_df)
## [1] TRUE
$a my_df
## [1] 7 8 9
1]] my_df[[
## [1] 7 8 9
A list column is simply a column in a dataframe that contains a list, with one element per row.
# This is a list with three elements
<- list(1:2, "Hello", c("A", "B", "C"))
my_list
# Now let's use the list as a column in the dataframe
$d <- my_list
my_dfprint(my_df)
## a b c d
## 1 7 a 10 1, 2
## 2 8 b 20 Hello
## 3 9 c 30 A, B, C
List columns are used in sf objects to store its geometry.
Let’s take a look at a “real” sf object, in this case the Bormicon areas used for the analysis of fisheries data in Icelandic waters.
<- read_sf("ftp://ftp.hafro.is/pub/data/shapes/bormicon.gpkg")
bor
class(bor)
## [1] "sf" "tbl_df" "tbl" "data.frame"
glimpse (bor)
## Rows: 16
## Columns: 5
## $ nafn <chr> "Vestur", "Vestfirdir", "Mid.Nordur", "Grunnslod.nordur", "Nordau…
## $ area <dbl> 50436.738, 31952.912, 22803.168, 10500.918, 16024.065, 30154.976,…
## $ nr <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ name <chr> "W", "NW", "N_center", "N_shallow", "NE", "E", "E_ridge", "SE", "…
## $ geom <POLYGON [°]> POLYGON ((-24.41668 65.5, -..., POLYGON ((-20.97307 65.5, -..., P…
Note that the object bor is of class tibble (the “enhanced” data.frames from the tidyverse) and sf. In addition, sf objects can also be data frames.
When you examine the object, first you can see the metadata, including: - geometry type (in this case POLYGON) - dimension (XY for 2D data) - bounding box (minimum and maximum x and y values) - epsg and projstring: the coordinate reference system of the data (more on this later)
Next you can see the data, including columns with the subarea, division, and area in km2. This is what is know as attributes. There is also a an additional column usually named “geometry” or “geom” (although any name could be used) where the geometry (i.e. coordinates and topology) are stored.
We can extract the metadata elements like this:
st_geometry_type(bor)
## [1] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## [10] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
In this case the 16 features in this data set have the same geometry (POLYGON).
This gets the coordinate reference system:
st_crs(bor)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["World"],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
This gets the bounding box:
st_bbox(bor)
## xmin ymin xmax ymax
## -40.0 58.0 -4.0 72.5
We can also use the usual functions in sf objects:
nrow(bor)
## [1] 16
ncol(bor)
## [1] 5
names(bor)
## [1] "nafn" "area" "nr" "name" "geom"
ftp://ftp.hafro.is/pub/data/shapes/helcom.gpkg ftp://ftp.hafro.is/pub/data/shapes/ospar.gpkg
For a quick look, we can use the plot() function. This uses R’s base graphs. Remember that R has at least three independent graphic systems: base graphs, ggplot, and lattice. In this course we will use some base graphs but we will focus on ggplot.
By default, plot() makes a map for each column.
plot(bor)
To only plot the geometry we can do the following:
plot(bor %>%
st_geometry())
<- st_geometry(bor)
bor_geom class(bor_geom)
## [1] "sfc_POLYGON" "sfc"
Here, we used the st_geometry() function to extract the geometry of the sf object, dropping the attributes (i.e. the “non spatial” data).
Let’s try the ggplot2 package with geom_sf().
%>%
bor ggplot() +
theme_bw() +
geom_sf(aes(fill = name)) +
labs(fill = "Area")
geom_sf() is an unusual geom because it will draw different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons.
In the bor object, the geometry type is a POLYGON, as individual reporting areas are continuous. Often continous areas have “holes” (for example islands). In these cases the direction of points (clockwise or anticlockwise) indicating the topology.
Very often the first step in our spatial analysis and mapping is to convert external objects into sf objects. This is done using the st_as_sf() function.
In the simplest case, our data is comprised of individual point observations, with associated data. In this case an sf object with geometry type POINT will suffice. Let’s get some data from the Icelandic bottom trawl survey:
<- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")
smb class(smb)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
<- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv") %>%
smb st_as_sf(coords = c("lon1", "lat1"), crs = 4326)
smb
## Simple feature collection with 19846 features and 38 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -27.25017 ymin: 62.78067 xmax: -10.34 ymax: 67.33767
## Geodetic CRS: WGS 84
## # A tibble: 19,846 × 39
## id date year vid tow_id t1 t2
## * <dbl> <date> <dbl> <dbl> <chr> <dttm> <dttm>
## 1 44929 1990-03-16 1990 1307 670-15 1990-03-16 05:05:00 1990-03-16 06:05:00
## 2 44928 1990-03-16 1990 1307 670-13 1990-03-16 07:48:00 1990-03-16 08:51:00
## 3 44927 1990-03-16 1990 1307 670-12 1990-03-16 09:35:00 1990-03-16 10:49:00
## 4 44926 1990-03-16 1990 1307 670-1 1990-03-16 11:31:00 1990-03-16 12:31:00
## 5 44925 1990-03-16 1990 1307 670-2 1990-03-16 13:06:00 1990-03-16 14:05:00
## 6 44922 1990-03-16 1990 1307 720-1 1990-03-16 20:21:00 1990-03-16 21:20:00
## 7 44921 1990-03-16 1990 1307 720-11 1990-03-16 21:49:00 1990-03-16 22:50:00
## 8 44920 1990-03-17 1990 1307 720-13 1990-03-17 00:16:00 1990-03-17 01:20:00
## 9 44919 1990-03-17 1990 1307 720-12 1990-03-17 02:18:00 1990-03-17 03:20:00
## 10 44918 1990-03-17 1990 1307 670-14 1990-03-17 05:23:00 1990-03-17 06:25:00
## # … with 19,836 more rows, and 32 more variables: lon2 <dbl>, lat2 <dbl>,
## # ir <chr>, ir_lon <dbl>, ir_lat <dbl>, z1 <dbl>, z2 <dbl>, temp_s <dbl>,
## # temp_b <dbl>, speed <dbl>, duration <dbl>, towlength <dbl>,
## # horizontal <dbl>, verical <dbl>, wind <dbl>, wind_direction <dbl>,
## # bormicon <dbl>, oldstrata <dbl>, newstrata <dbl>, cod_kg <dbl>,
## # cod_n <dbl>, haddock_kg <dbl>, haddock_n <dbl>, saithe_kg <dbl>,
## # saithe_n <dbl>, wolffish_kg <dbl>, wolffish_n <dbl>, plaice_kg <dbl>, …
We created a POINT geometry with the starting and ending of each tow (lat1 and lon1) using the coords argument to pass the column names with the longitude (or x coordinate) first, and the latitude (or y coordinate) second (this is a common mistake). We also specified the coordinate reference system (4326 is the ESPG code for unprojected data using the WSG84 datum).
Note that the “lon1” and “lat1” columns have disappeared and instead we have the “geometry” column. If we want to keep a copy of the “lon1” and “lat1” columns we can use the argument remove=FALSE.
Another example. Here we have VMS data from some bottom trawls. The data contains a series of locations (latitude and longitude) during each trawl. If we do the same as in the smb dataset, we obtain an sf object with POINT geometry.
<- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv") %>%
vms st_as_sf(coords = c("lon", "lat"), crs = 4326)
vms
## Simple feature collection with 8918 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -27.25236 ymin: 63.01933 xmax: -10.55667 ymax: 67.32933
## Geodetic CRS: WGS 84
## # A tibble: 8,918 × 6
## vid vessel time speed heading geometry
## * <dbl> <chr> <dttm> <dbl> <dbl> <POINT [°]>
## 1 2350 Árni 2019-02-27 08:07:46 1.40 69.1 (-21.86028 64.15597)
## 2 2350 Árni 2019-02-27 08:07:46 1.40 69.1 (-21.86028 64.15597)
## 3 2350 Árni 2019-02-27 08:18:05 8.1 252. (-21.89164 64.16098)
## 4 2350 Árni 2019-02-27 08:28:25 3.70 240. (-21.92865 64.15326)
## 5 2350 Árni 2019-02-27 08:30:25 3.10 266. (-21.93287 64.15257)
## 6 2350 Árni 2019-02-28 13:12:35 7.50 43.4 (-21.92564 64.154)
## 7 2350 Árni 2019-02-28 13:12:35 7.50 43.4 (-21.92564 64.154)
## 8 2350 Árni 2019-02-28 13:22:36 10.5 334. (-21.94404 64.17966)
## 9 2350 Árni 2019-02-28 13:32:56 10.7 316. (-21.98714 64.20344)
## 10 2350 Árni 2019-02-28 13:45:06 10.7 315. (-22.04541 64.22924)
## # … with 8,908 more rows
ggplot() +
geom_sf(data=vms, alpha=0.25)
But this is not what we want. Rather than just points, we want to join the points from each haul to form a line. To do this we need to group the points by the ID column, and then use st_cast to convert the group of points into linestrings.
<- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv") %>%
vms st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
mutate(fishing = if_else(speed > 2 & speed < 4, 1, 0)) %>%
group_by(vid) %>%
summarise(do_union = FALSE) %>%
st_cast("LINESTRING")
vms
## Simple feature collection with 4 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -27.25236 ymin: 63.01933 xmax: -10.55667 ymax: 67.32933
## Geodetic CRS: WGS 84
## # A tibble: 4 × 2
## vid geometry
## <dbl> <LINESTRING [°]>
## 1 1131 (-21.93494 64.15081, -21.93494 64.15081, -21.92787 64.15371, -21.92787 …
## 2 1277 (-13.91179 64.91615, -14.0081 64.92564, -13.99127 64.92077, -13.99127 6…
## 3 1281 (-18.88719 66.15852, -18.89815 66.15147, -18.88845 66.15773, -18.88845 …
## 4 2350 (-21.86028 64.15597, -21.86028 64.15597, -21.89164 64.16098, -21.92865 …
ggplot() +
geom_sf(data=vms, aes(colour = as.factor(vid))) +
theme(legend.position = "none")
The sdm object contains the latitud and longitude at the start and end of each haul. Let’s make a it a LINESTRING, with straight lines for each haul.
<- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")
smb
<- smb %>%
tracks filter(year == 2019) %>%
::select(id,
dplyrlat_1 = lat1,
lon_1 = lon1,
lat_2 = lat2,
lon_2 = lon2) %>%
pivot_longer(cols = -id) %>%
separate(name, sep = "_", into = c("info", "num")) %>%
pivot_wider(names_from = info) %>%
st_as_sf(coords = c ("lon", "lat"), crs = 4326) %>%
group_by(id) %>%
summarize(do_union = FALSE) %>%
st_cast("LINESTRING")
ggplot() +
geom_sf(data = tracks)
## Operating with attributes in sf objects
Remember that sf objects are data.frames or tibbles. So we can use the usual functions from base R or from the tidyverse to manipulate them.
Let’s load a set of polygons with the ICES ecoregions, and select the small ones.
<- read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_ecoregions.gpkg")
ices_er
ices_er
## Simple feature collection with 17 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -44 ymin: 30.26705 xmax: 68.5 ymax: 90
## Geodetic CRS: WGS 84
## # A tibble: 17 × 3
## ecoregion area_…¹ geom
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Greenland Sea 275. (((-43.99177 60.29694, -…
## 2 Bay of Biscay and the Iberian Coast 83.3 (((-3.586876 43.49365, -…
## 3 Azores 82.7 (((-21.05878 36, -23.494…
## 4 Western Mediterranean Sea 88.5 (((-2.92761 35.27433, -2…
## 5 Ionian Sea and the Central Mediterranean S… 76.5 (((18.37672 39.79959, 18…
## 6 Black Sea 52.8 (((28.58799 43.80846, 28…
## 7 Adriatic Sea 15.3 (((19.99367 39.79384, 19…
## 8 Aegean-Levantine Sea 74.6 (((26.17555 40.04279, 26…
## 9 Celtic Seas 129. (((1.371231 62, 1.506343…
## 10 Baltic Sea 62.5 (((11.0789 53.14186, 11.…
## 11 Greater North Sea 97.2 (((-4.497232 48, -4.5583…
## 12 Arctic Ocean 797. (((68.5 82.97005, 68.058…
## 13 Icelandic Waters 142. (((-21.71718 63.84904, -…
## 14 Barents Sea 716. (((12.7688 67.75756, 12.…
## 15 Faroes 46.7 (((-0.489327 63.89084, -…
## 16 Norwegian Sea 300. (((15.12989 68.3119, 15.…
## 17 Oceanic Northeast Atlantic 599. (((-13.9097 60.82639, -1…
## # … with abbreviated variable name ¹area_km2
ggplot() +
geom_sf(data = ices_er, aes(fill=ecoregion))
<- ices_er %>%
sm_ices_er filter(area_km2 < 100)
ggplot() +
geom_sf(data = sm_ices_er, aes(fill=ecoregion))
Notice that the new object sm_ices_er also has a geometry column even if we did not selected it explicitly. In sf objects the geometry is “sticky”. In other words, subsets of sf objects are also sf objects. If we want to remove the geometry column we can do this:
<- st_drop_geometry(sm_ices_er) sm_ices_er_df
Also, note how the metadata of the sf object gets updated. Look at the bounding box of each object:
st_bbox(ices_er)
## xmin ymin xmax ymax
## -44.00000 30.26705 68.50000 90.00000
st_bbox(sm_ices_er)
## xmin ymin xmax ymax
## -35.57930 30.26705 41.78158 67.08025
The Spatial class includes several derived classes to represent different types of spatial data (with and without associated data). For vector data, there are classes for points, lines and polygons:
sp also has (limited) support for raster data:
We will not deal with the spatial class, except to say that we can convert to and from sf.
library(tidyverse)
library(sf)
library(sp)
data(meuse)
class(meuse)
## [1] "data.frame"
coordinates (meuse) = ~ x + y # Convert to class sp
<- meuse %>% st_as_sf() # From spatial to sf
meuse.sf
<- meuse %>% as("Spatial") # From sf to spatial meuse.sp