Spatial data is stored in files and in spatial databases. In this course we will focus on reading files.
There is a large array of file formats for spatial data, both vector and raster data. To get an idea, we can do this:
library(tidyverse)
library(sf)
<- st_drivers("vector") %>% select(long_name)
drivers1 nrow(drivers1)
## [1] 89
<- st_drivers("raster") %>% select(long_name)
drivers2 nrow(drivers2)
## [1] 146
What we just saw are the file formats that can be read with GDAL library.
The Geospatial Data Abstraction Library (GDAL) is an external library for reading and writing raster and vector geospatial data in many (>220) formats and databases.
A full list of vector data drivers: https://gdal.org/drivers/vector/index.html
and of raster data drivers: https://gdal.org/drivers/raster/index.html
GDAL is the workhorse for reading and writing spatial data, used by many programs including ArcGis and QGIS.
In R, the rgdal package provides the functions readOGR() and writeOGRL(), and readGDAL() and writeGDAL() to read and write vector and raster data, respectively. But rgdal uses objects of class sp.
To reading and writing to and from sf objects, we use the functions st_read() and st_write() from the sf package. These functions link directly to the GDAL library, allowing us to read any of the formats listed using the st_drivers() function.
GDAL also includes a large array of programs to manipulate and process vector and raster data.
https://gdal.org/programs/index.html
You can use GDAL programs from R using the gdalUtils package. Several of the functions of the sf packages (and others) use GDAL functions “under the hood”.
One of the most commonly used is the shapefile, developed by ESRI (the makers of ArcGis). A lot of the vector data you will find online is in this format. But it has limitations (see http://switchfromshapefile.org/).
They can hold either points, lines or polygons.
A shapefile is not a single file, but several files with the same name, different extensions, in the same folder. There are three mandatory files:
.shp shape format (geometric entities) .shx index of the geometric entities .dbf attributes (data)
Other files usually include a .prj file, with the coordinate system.
Shapefile is not an open format.
Let’s take a look at some shapefiles. This bit of code creates the “data_raw” folder (in case it does not exist), downloads a zip file with two shapefiles, extracts the shapefiles from the zip file, and finally deletes the zip file.
dir.create("./data_raw")
<- "./data_raw/Iceland_shapefiles.zip"
target.file
download.file(url = "ftp://ftp.hafro.is/pub/data/shapes/Iceland_shapefiles.zip",
destfile = target.file)
unzip(target.file, overwrite = TRUE,
exdir = "./data_raw/")
file.remove(target.file) # Delete the zip file
## [1] TRUE
Examine the “./data_raw/Iceland_shapefiles” folder. You will see two set of files with the same name and different extensions. These are two shapefiles with the coastline and depth contours for Iceland.
To read shapefiles, we use either the read_sf() (which has nicer defaults) or st_read() functions.
# We can point directly to the *.shp file.
<- read_sf("./data_raw/Iceland_shapes/Iceland_coast.shp")
coast
# Or we can use the dsn/layer format (note that we do not use the .shp ending)
<- read_sf(dsn = "./data_raw/Iceland_shapes/", "Iceland_coast")
coast
glimpse(coast)
## Rows: 1
## Columns: 3
## $ ID_0 <chr> "104"
## $ NAME_0 <chr> "Iceland"
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((204607 -452...
A geopackage is a new format for vector data that uses a single file (usually .gpkg). It was developed as an alternative to shapefiles by the Open Geospatial Consortium. It is an open format, supported by most GIS software.
They can store both vector and raster data, with attributes.
We have been reading geopackage files already. :)
To write spatial data into a file, we can use the write_sf() or the st_write() functions. These functions can guess the driver needed based on the extension of the target file.
write_sf(coast, "./data_raw/my_shapefile.shp")
write_sf(coast, "./data_raw/my_geopackage.gpkg")
If you use st_write() to save an sf objects with points, by default the coordinates will not be included.
You can include the coordinates (which is usually what we want) by doing
<- data.frame(lon = c(-11.77, -11.76, -11.744),
mypts lat = c(66.02, 66.018, 66.014)) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)
dir.create("./data")
write_sf(mypts, "./data/pts.csv", layer_options = "GEOMETRY=AS_XY", update=TRUE)
This passes an argument to the GDAL function doing the writting.