Datasets - overview


The data used as examples in the course are all stored at ftp://ftp.hafro.is/pub/data. To obtain the path for reading a file into R, it is simplest to:

  • Go to the site using a browser
  • Find the file of interest
  • Right-click on the filename and copy the link location
  • Paste the path into your R script

The content of the directory is as follows:

fs::dir_tree("/home/ftp/pub/data/csv")
## /home/ftp/pub/data/csv
## ├── burdarthol.csv
## ├── datras_2018_haul.csv
## ├── datras_2018_length.csv
## ├── flotinnosigrandi.csv
## ├── flotinnosigrandi.rds
## ├── grunnpunktar.csv
## ├── ice_rgl_grunnpunktar.csv
## ├── iessns2019_strata.csv
## ├── iessns2019_tows.csv
## ├── iessns2019_trail.csv
## ├── is_smb.csv
## ├── is_smb_biological.csv
## ├── is_smb_cod_rbya.csv
## ├── is_smb_stations.csv
## ├── is_smb_vms2019.csv
## ├── is_survey-stations.csv
## ├── is_survey-tracks.csv
## ├── midlinur_eez.csv
## ├── minke.csv
## ├── small_vms.csv
## ├── smb_by_reit.csv
## └── vidmidunarpunktar.csv

Data


Minke

The minke whale dataset contains biological measurements from 192 scientific catches of minke whales between the years 2003 and 2007. The variables are as follows:

  • whale.id: Unique identifier for the whale
  • date.caught: the date when the whale was caught
  • lat: latitude
  • lon: longitude
  • area: Derived from location (North/South)
  • length: length of the whale
  • weight: weight of the whale
  • age: age of the whale
  • sex: Male or Female
  • maturity: maturity status of the whale
  • stomach.volume: volume (in liters) of the stomach content
  • stomach.weight: weight of the stomach content
  • year: the year when the whale was caught

Importing the data into R:

library(tidyverse)

minke <- 
  read.csv("ftp://ftp.hafro.is/pub/data/csv/minke.csv",
           stringsAsFactors = FALSE) %>% 
  as_tibble()
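
To verify the import one can glance at the structure and tabulate a couple of the categorical variables; a minimal sketch using the variables listed above:

glimpse(minke)                  # column types and the first few values
minke %>% count(sex, maturity)  # number of whales by sex and maturity status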

Icelandic bottom trawl spring survey (SMB)

The smb dataset contains 19846 tows from the annual Icelandic bottom trawl spring survey, covering the years 1985 to 2019. Each row represents a single tow, with the catch of each species in separate columns. The variables are as follows:

  • id: Unique station identification code
  • date: Date of the tow
  • vid: Vessel identification number
  • tow_id: Tow identification number
  • t1: Time of tow start (shoot time)
  • t2: Time of tow end (haul time)
  • lon1: Longitude of tow start, decimal degrees
  • lat1: Latitude of tow start, decimal degrees
  • lon2: Longitude of tow end, decimal degrees
  • lat2: Latitude of tow end, decimal degrees
  • ir: ICES statistical rectangle
  • ir_lon: ICES statistical rectangle longitude midpoint
  • ir_lat: ICES statistical rectangle latitude midpoint
  • z1: Bottom depth at tow start
  • z2: Bottom depth at tow end
  • speed: Average towing speed in knots
  • duration: Duration of tow in minutes
  • towlength: Length of tow in nautical miles
  • horizontal: Horizontal net opening in meters
  • vertical: Vertical net opening in meters
  • wind: Wind strength, Beaufort scale
  • wind_direction: Wind direction in degrees
  • bormicon: Bormicon identification number (see Bormicon shapefile below)
  • oldstrata: Old stratification identification number
  • newstrata: New stratification identification number
  • *_kg: Species biomass in kilograms, standardized to 4 nautical miles
  • *_n: Species abundance, standardized to 4 nautical miles

Importing the data into R:

d <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")
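
As a quick check one can summarise the standardized cod catch by year; a minimal sketch, deriving the year from the date column (cod_kg follows the *_kg pattern above):

d %>% 
  mutate(year = lubridate::year(date)) %>%        # year of the tow
  group_by(year) %>% 
  summarise(cod_kg = sum(cod_kg, na.rm = TRUE))   # total standardized cod biomass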

SMB 2019 vessel tracks

The dataset contains 8918 AIS/VMS records of four vessels that participated in the Icelandic spring survey in 2019. The variables are as follows:

  • vid: Vessel identification number
  • vessel: Vessel name
  • time: Time in UTC
  • lon: Longitude in decimal degrees
  • lat: Latitude in decimal degrees
  • speed: Vessel speed in knots
  • heading: Vessel heading in degrees

Importing the data into R:

d <- 
  read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv")

Shapes


Bormicon

check_last_point <- function(df) {
  # Ensure that every group forms a closed ring: if the first and last
  # coordinates differ, append the first coordinate to the end.
  
  groups <- df %>% pull(group) %>% unique()
  
  res <- list()
  
  for(i in seq_along(groups)) {
    
    x <- df %>% filter(group == groups[i])
    n <- nrow(x)
    
    if(x$lat[1] != x$lat[n] | x$lon[1] != x$lon[n]) {
      res[[i]] <-
        tibble(lat = c(x$lat, x$lat[1]),
               lon = c(x$lon, x$lon[1]),
               group = c(x$group, x$group[1]))
    } else {
      res[[i]] <- x
    }
  }
  bind_rows(res)
}
d <- 
  tibble(nafn = attributes(fjolst::reg.bc)$names,
         area = attributes(fjolst::reg.bc)$area) %>% 
  mutate(nr = 1:n(),
         name = c("W", "NW", "N_center", "N_shallow",
                  "NE", "E", "E_ridge", "SE",
                  "S_SE", "S_SW", "NW_deep", "NE_deep",
                  "S_deep", "W_ridge", "W_deep", "N_deep"))

bormicon <-
  fjolst::reg.bc %>%
  bind_rows(.id = "group") %>%
  check_last_point() %>%        # close any unclosed rings (see function above)
  sf::st_as_sf(coords = c("lon", "lat")) %>%
  sf::st_set_crs(4326) %>%
  group_by(group) %>%
  summarise(do_union = FALSE) %>%
  sf::st_cast("POLYGON") %>%    # turn MULTIPOINT into POLYGON
  rename(nafn = group) %>% 
  left_join(d, by = "nafn") %>% 
  arrange(nr)
sf::st_write(bormicon, "/net/www/export/home/ftp/pub/data/shapes/bormicon.gpkg")
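
The file can then be read back from the ftp server and plotted; a minimal sketch:

bor <- sf::read_sf("ftp://ftp.hafro.is/pub/data/shapes/bormicon.gpkg")
bor %>% 
  ggplot() +
  geom_sf(aes(fill = name), show.legend = FALSE)   # one colour per subarea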

ICES shapes

… create an interactive map showing all shapes

ICES shapes code

The following shapes were obtained from gis.ices.dk. The primary purposes for storing them here were:

  • Storing each as a single file in "gpkg" format rather than as the numerous ESRI shapefiles bundled within each zip file.
  • Checking shape validity.
  • Creating consistent variable names.
  • Simplifying some very dense files in order to reduce memory usage when read into R.

The code below shows how the data were obtained and, where applicable, how they were simplified.

A little helper function:

read_zipped_shapes <- function(url, simplify = FALSE, make_valid = TRUE) {
  # download a zip of ESRI shapefiles and return a single sf object
  td <- tempdir()
  tf <- tempfile()
  download.file(url, tf)
  fil <- unzip(tf, exdir = td)
  fil <- fil[grep("\\.shp$", fil)]
  if(length(fil) == 1) {
    sp <- 
      sf::read_sf(fil)
  } else {
    # more than one shapefile in the zip: read all and stack them
    res <- purrr::map(fil, sf::read_sf)
    names(res) <- 
      basename(fil) %>% 
      stringr::str_remove("\\.shp$")
    sp <- 
      data.table::rbindlist(res,
                            use.names = TRUE,
                            fill = TRUE,
                            idcol = "name") %>%
      sf::st_as_sf()
  }
  
  sp <-
    sp %>% 
    dplyr::rename_all(tolower)
  file.remove(fil)
  
  if(!all(sf::st_is_valid(sp)) & make_valid) {
    sp <- sp %>% lwgeom::st_make_valid()
  }
  
  if(simplify) sp <- rmapshaper::ms_simplify(sp)
  
  if(is.na(sf::st_crs(sp)$epsg)) {
    # no CRS recorded - assume it is 4326
    sp <- sp %>% sf::st_set_crs(value = 4326)
  } else {
    if(sf::st_crs(sp)$epsg != 4326) sp <- sp %>% sf::st_transform(crs = 4326)
  }
  
  return(sp)
  
}

Nephrops functional units

reading:

url <- "http://gis.ices.dk/shapefiles/Nephrops_FU.zip"
p <- 
  read_zipped_shapes(url) %>%
  select(name = fu_descrip, fu)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/nephrops_fu.gpkg")

view: (interactive map not rendered here)
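
In place of the interactive map, a static view can be made with geom_sf; a minimal sketch:

fu <- sf::read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg")
fu %>% 
  ggplot() +
  geom_sf(aes(fill = name), show.legend = FALSE)   # one colour per functional unit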

ICES areas

url <- "http://gis.ices.dk/shapefiles/ICES_areas.zip"
p <-
  read_zipped_shapes(url, simplify = TRUE)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/ices_areas.gpkg")

view: (interactive map not rendered here)

ICES ecoregions

url <- "http://gis.ices.dk/shapefiles/ICES_ecoregions.zip"
p <- 
  read_zipped_shapes(url, simplify = TRUE) %>% 
  select(ecoregion,
         area_km2 = shape_area)
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/ices_ecoregions.gpkg")

view: (interactive map not rendered here)

OSPAR (from ICES)

url <- "http://gis.ices.dk/shapefiles/OSPAR_Subregions.zip"
p <- read_zipped_shapes(url) 
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/ospar.gpkg")

view: (interactive map not rendered here)

HELCOM (from ICES)

url <- "http://gis.ices.dk/shapefiles/HELCOM_subbasins.zip"
p <- read_zipped_shapes(url, simplify = TRUE)
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/helcom.gpkg")

view: (interactive map not rendered here)

ICES statistical rectangles

url <- "http://gis.ices.dk/shapefiles/ICES_rectangles.zip" 
p <- read_zipped_shapes(url)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/ices_rectangles.gpkg")
rect <- read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_rectangles.gpkg")
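
The rectangles can e.g. be joined with tow counts from the smb data for mapping. A minimal sketch; it assumes the rectangle name column in the gpkg-file is icesname (the original ICESNAME field lower-cased by the helper function above):

tows <- 
  read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv") %>% 
  count(ir)                                        # number of tows per rectangle
rect %>% 
  inner_join(tows, by = c("icesname" = "ir")) %>%  # icesname column name assumed
  ggplot() +
  geom_sf(aes(fill = n))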

ICES statistical subrectangles

url <- "http://gis.ices.dk/shapefiles/ICES_SubStatrec.zip" 
p <- read_zipped_shapes(url)
p %>% 
  write_sf("/net/www/export/home/ftp/pub/data/shapes/ices_subrectangles.gpkg")

Appendix - data generation

What follows is just bookkeeping: the source of the data as well as documentation of any data manipulation done.

Icelandic bottom trawl spring survey (SMB)

attach("/u2/reikn/R/SurveyWork/SMB/catchperstation.rdata")
library(mar)
con <- connect_mar()
st <- 
  lesa_stodvar(con) %>% 
  filter(synaflokkur == 30) %>% 
  mutate(tow_id = paste0(reitur, "-", tognumer)) %>% 
  select(id = synis_id, t1 = togbyrjun, t2 = togendir, tow_id) %>% 
  collect(n = Inf)
smb <- 
  utbrteg %>% 
  rename(id = synis.id) %>% 
  left_join(st) %>% 
  mutate(ir = geo::d2ir(lat, lon),
         ir_lon = geo::ir2d(ir)$lon,
         ir_lat = geo::ir2d(ir)$lat,
         date = lubridate::ymd(paste0(ar, "-", man, "-", dags))) %>% 
  select(id,
         date,
         year = ar,
         vid = skip,
         tow_id,
         t1,
         t2,
         lon1 = kastad.v.lengd,
         lat1 = kastad.n.breidd,
         lon2 = hift.v.lengd,
         lat2 = hift.n.breidd,
         ir,
         ir_lon,
         ir_lat,
         z1 = dypi.kastad,
         z2 = dypi.hift,
         temp_s = yfirbordshiti,
         temp_b = botnhiti,
         speed = toghradi,
         duration = togtimi,
         towlength = toglengd,
         horizontal = larett.opnun,
         vertical = lodrett.opnun,
         wind = vindhradi,
         wind_direction = vindatt,
         bormicon = area,
         oldstrata,
         newstrata,
         cod_kg = torskur.kg,
         cod_n = torskur.stk,
         haddock_kg = ysa.kg,
         haddock_n = ysa.stk,
         saithe_kg = ufsi.kg,
         saithe_n = ufsi.stk,
         wolffish_kg = steinbitur.kg,
         wolffish_n = steinbitur.stk,
         plaice_kg = skarkoli.kg,
         plaice_n = skarkoli.stk,
         monkfish_kg = skotuselur.kg,
         monkfish_n = skotuselur.stk)
smb %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb.csv")

SMB 2019 AIS/VMS tracks

library(mar)
library(lubridate)
con <- connect_mar()
track <- 
  stk_trail(con) %>% 
  filter(time >= to_date("2019-02-26", "YYYY-MM-DD"),
         time <= to_date("2019-03-22", "YYYY-MM-DD")) %>% 
  collect(n = Inf) %>% 
  filter(mid %in% c(101109, 101143, 101070, 102571)) %>% 
  left_join(tibble(mid = c(101109, 101143, 101070, 102571),
                   vid = c(2350, 1131, 1277, 1281),
                   vessel = c("Árni", "Bjarni", "Ljósafell", "Múlaberg"),
                   dep = c(ymd_hms("2019-02-26 01:00:00"),
                           ymd_hms("2019-02-26 01:00:00"),
                           ymd_hms("2019-02-27 01:00:00"),
                           ymd_hms("2019-03-01 01:00:00")),
                   arr =  c(ymd_hms("2019-03-22 23:00:00"),
                            ymd_hms("2019-03-22 23:00:00"),
                            ymd_hms("2019-03-16 23:00:00"),
                            ymd_hms("2019-03-19 23:00:00")))) %>% 
  filter(time >= dep,
         time <= arr) %>% 
  arrange(vessel, time) %>% 
  group_by(vessel) %>% 
  mutate(dist = geo::arcdist(lead(lat), lead(lon), lat, lon),   # distance to next point
         time2 = as.numeric(lead(time) - time) / (60 * 60), # duration to next point
         speed2 = dist/time2) %>%                           # speed    on next "leg"
  filter(speed2 <= 20 | is.na(speed2)) %>% 
  select(vid, vessel, time, lon, lat, speed, heading)

track %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_vms2019.csv")

Split smb into two tidy tables:

smb %>% 
  select(id:newstrata) %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_stations.csv")
smb %>% 
  select(id, cod_kg:monkfish_n) %>% 
  # step 1: gather the species columns into a long table
  pivot_longer(-id) %>%
  # step 2: split names like "cod_kg" into species and variable
  separate(name, sep = "_", into = c("species", "variable")) %>% 
  # step 3: spread the kg and n values back into separate columns
  pivot_wider(names_from = variable) %>% 
  filter(n > 0) %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_biological.csv")