What would be the geometry and attributes of data collected during a
bottom trawl?
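One way to think about it: a haul has a start and an end position, so its geometry could be a line, and whatever was recorded during the haul (species, catch weight, depth) would be its attributes. A minimal sketch with the sf package (introduced later in this section); the station name, positions and catch values are invented for illustration:

```r
library(sf)

# A hypothetical trawl haul: start and end positions (longitude, latitude)
haul_track <- st_linestring(matrix(c(-5.20, 54.10,
                                     -5.15, 54.18), ncol = 2, byrow = TRUE))

# Wrap the geometry in a simple feature geometry column with a geographic CRS
haul_geom <- st_sfc(haul_track, crs = 4326)

# Attach the attributes recorded during the haul (invented values)
haul <- st_sf(station  = "T01",
              species  = "Gadus morhua",
              catch_kg = 35.2,
              depth_m  = 80,
              geometry = haul_geom)

haul
```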
Typically when we think about analysing spatial data we think about Geographic Information Systems (GIS).
GIS are specialised software for the management, analysis and display of spatial data. The big player here is ArcGIS, which is widely used, but is proprietary (and very costly): http://www.esri.com/arcgis/about-arcgis.
Now, thanks to free and open source software for geospatial (FOSS4G) analysis we can have easy access to powerful tools. For example, there are excellent open source GIS, including QGIS (http://www.qgis.org/en/site/) and GRASS (https://grass.osgeo.org/).
In general, GIS are used through graphical user interfaces (GUIs), which discourages reproducibility (although GIS functions can be accessed from the command line).
In recent years, R has been gaining strength as a full-fledged tool for the analysis of spatial data:
- Development of classes for spatial data.
- Interaction with GIS (Geographic Information Systems).
- Interaction with geographical databases.
Now R has much of the functionality that was previously available only in GIS software:
- Read, manipulate and save spatial data.
- Perform operations with spatial data like a GIS: selections, overlays, clipping and making buffers.
- Build sophisticated statistical methods (e.g. geographically weighted regression, spatial interaction models) around your spatial data.
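As a quick preview of what such GIS-like operations look like in R, here is a minimal sketch using the sf package (introduced later in this section); the file names, the attribute query and the 5 km buffer distance are hypothetical:

```r
library(sf)

# Read a vector layer (hypothetical file name)
harbours <- st_read("harbours.shp")

# Select features by attribute, like a GIS attribute query
large <- harbours[harbours$vessels > 50, ]

# Make a 5 km buffer around each selected harbour
# (assumes the layer uses a projected CRS with metres as units)
buf <- st_buffer(large, dist = 5000)

# Save the result to a new file
st_write(buf, "harbour_buffers.gpkg")
```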
Spatial phenomena can be thought of in two basic ways:
As discrete objects or events with clear boundaries. Examples: locations of cities, rivers, national boundaries.
As continuous properties without clear boundaries. Examples: bottom depth, temperature.
These two types take us to two distinct data models:
Raster data is used to represent parameters that vary continuously in space.
In the most common case, a raster represents some area as a regular grid of equally sized rectangles, known as cells or pixels (cell widths in the x and y direction can be different). Each cell can hold one or more data values (i.e. single-layer and multi-layer rasters). Data can be continuous (e.g. depth, temperature) or discrete/categorical (e.g. land types).
Each cell has an individual ID. The raster does not store the coordinates (corners) of individual cells. Instead, the position of each cell is implicitly defined by the spatial extent of the raster (i.e. its size), the resolution (cell size), and the origin (the coordinates of the first corner).
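A minimal sketch with the raster package showing how the extent, resolution and origin define the cell positions; the numbers are arbitrary:

```r
library(raster)

# A 10 x 10 single-layer raster covering a 100 x 100 unit area
r <- raster(nrows = 10, ncols = 10,
            xmn = 0, xmx = 100, ymn = 0, ymx = 100)

# Fill the cells with some continuous values
values(r) <- runif(ncell(r))

# Cell coordinates are not stored; they are derived from these properties
extent(r)  # spatial extent
res(r)     # resolution (cell size): 10 x 10
origin(r)  # origin of the grid
```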
Although here we will focus on rasters as regular grids, the raster model is more flexible. We can have rasters with rotated, sheared, irregular and curvilinear grids.
Continuous data can be stored in other ways, for example in triangulated irregular networks (TINs). We will not deal with these in this course.
As we know, the capabilities of R are extended through the use of packages. In R there are two main ecosystems of packages for dealing with spatial data:
In the old days (early 2000s) R had several packages for the analysis of spatial data, including akima for interpolation and spatstat for point pattern analysis. R did not have a way to tell coordinates apart from other numbers, and these packages did not have a uniform way to represent spatial data.
For example the geoR package defined the class geodata, which is a list with objects named coords and data, while the gstat package defined the class gstat, a list with objects named data, model and set. Things were messy.
In 2005, the sp package was created to unify the treatment of spatial data in R, providing classes and methods for spatial data (points, lines, polygons and grids).
In addition to sp, the rgdal package (released in 2003) was used to read and write spatial data (using the powerful GDAL library, https://gdal.org/), and the rgeos package (released in 2010) was used to perform spatial operations (using the GEOS library, https://trac.osgeo.org/geos). Transformations between geographic projections are done using the external PROJ library (https://proj.org/).
In parallel, the package raster was released in 2010 to provide better support for raster data.
The sp/rgdal/rgeos/raster ecosystem made R a powerful tool for spatial analysis. Today more than 500 R packages rely on sp.
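A minimal sketch of how these packages fit together; the positions are invented and the file name "coast.shp" is hypothetical:

```r
library(sp)
library(rgdal)
library(rgeos)

# Build a SpatialPointsDataFrame from a plain data frame (invented positions)
pts <- data.frame(lon = c(-5.2, -5.1), lat = c(54.1, 54.2),
                  station = c("A", "B"))
coordinates(pts) <- ~ lon + lat
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")

# Read a vector layer with rgdal (hypothetical file name)
coast <- readOGR("coast.shp")

# Reproject with PROJ (via spTransform) and buffer with GEOS (via rgeos)
pts_utm <- spTransform(pts, CRS("+proj=utm +zone=29 +datum=WGS84"))
buf <- gBuffer(pts_utm, byid = TRUE, width = 5000)
```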
When sp was written there was no standard for simple features. Shapefiles were the dominant format for exchanging vector data, but they were not a formal standard, which brought all kinds of problems.
The simple feature access standard is now widely accepted, but sp has to convert such data to its own classes to work with it.
The external libraries used by R to read/write spatial data (GDAL) and for geometrical operations (GEOS) have developed strong support for the simple feature standard.
The tidyverse collection of packages (which includes dplyr and ggplot2) does not work well with the spatial classes of sp.
Recently a new package ecosystem was developed for the analysis of spatial data in R. It is centred around the sf package (sf = “simple features”), first released in 2016.
The sf package provides a new class for vector spatial data, essentially replacing sp.
It also has functions to read/write data, and to do operations on vector data. So it also replaces the rgdal and rgeos packages.
The sf package has a more modern data structure (more on this later), and also works well with packages of the “tidyverse”.
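A minimal sketch of the sf data structure and its tidyverse integration; the data frame, column names and EPSG codes are chosen for illustration:

```r
library(sf)
library(dplyr)

# An sf object is a data frame with a geometry list-column
stations <- data.frame(name  = c("A", "B", "C"),
                       depth = c(35, 80, 120),
                       lon   = c(-5.2, -5.1, -5.0),
                       lat   = c(54.1, 54.2, 54.3)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# dplyr verbs work directly on sf objects; the geometry is kept
deep <- stations %>%
  filter(depth > 50) %>%
  st_transform(32629)   # reproject to UTM zone 29N

# Geometric operations replace rgeos; st_read()/st_write() replace rgdal
buf <- st_buffer(deep, dist = 1000)
```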
The very new stars package (released in 2018) provides datacubes. Datacubes are dense arrays used to store spatio-temporal data. Space (2D or 3D) and time are the array dimensions. stars is not a direct replacement for the raster package, but it provides much of its functionality.
https://r-spatial.github.io/stars/
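A minimal sketch of reading a raster as a stars datacube, using the example Landsat GeoTIFF that ships with the stars package (assuming the package and its example data are installed):

```r
library(stars)

# Example Landsat GeoTIFF shipped with the stars package
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

# Read it as a datacube with dimensions x, y and band
x <- read_stars(tif)

st_dimensions(x)  # the dimensions of the cube (space, bands, possibly time)
plot(x)
```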
The terra package, still in early development, is intended to be a replacement for raster: https://github.com/rspatial/terra
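A minimal sketch with terra, which mirrors much of the raster API; the numbers are arbitrary:

```r
library(terra)

# A SpatRaster is terra's equivalent of a raster layer
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 100, ymin = 0, ymax = 100)
values(r) <- runif(ncell(r))

res(r)  # cell size, as in the raster package
```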
There are many more. For an overview, look at the CRAN Task View on the analysis of spatial data: https://cran.r-project.org/web/views/Spatial.html