Datasets - overview


The data used as examples in the course are all stored at ftp://ftp.hafro.is/pub/data. To obtain the path for reading a file into R, it is simplest to:

  • Go to the site using a browser
  • Find the file of interest
  • Right-click on the filename and copy the link location
  • Paste the path into your R script

The content of the directory is as follows:

fs::dir_tree("/home/ftp/pub/data/csv")
## /home/ftp/pub/data/csv
## ├── burdarthol.csv
## ├── datras_2018_haul.csv
## ├── datras_2018_length.csv
## ├── flotinnosigrandi.csv
## ├── flotinnosigrandi.rds
## ├── grunnpunktar.csv
## ├── ice_rgl_grunnpunktar.csv
## ├── iessns2019_strata.csv
## ├── iessns2019_tows.csv
## ├── iessns2019_trail.csv
## ├── is_smb.csv
## ├── is_smb_biological.csv
## ├── is_smb_cod_rbya.csv
## ├── is_smb_stations.csv
## ├── is_smb_vms2019.csv
## ├── is_survey-stations.csv
## ├── is_survey-tracks.csv
## ├── midlinur_eez.csv
## ├── minke.csv
## ├── small_vms.csv
## ├── smb_by_reit.csv
## └── vidmidunarpunktar.csv

Data


Minke

The minke whale dataset contains biological measurements from 192 scientific catches of minke whales between the years 2003 and 2007. The variables are as follows:

  • whale.id: Unique identifier for the whale
  • date.caught: the date when the whale was caught
  • lat: latitude
  • lon: longitude
  • area: Derived from location (North/South)
  • length: length of the whale
  • weight: weight of the whale
  • age: age of the whale
  • sex: Male or Female
  • maturity: maturity status of the whale
  • stomach.volume: volume (in liters) of the stomach content
  • stomach.weight: weight of the stomach content
  • year: the year when the whale was caught

Importing the data into R:

library(tidyverse)

minke <- 
  read.csv("ftp://ftp.hafro.is/pub/data/csv/minke.csv",
           stringsAsFactors = FALSE) %>% 
  as_tibble()
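
To verify the import one can glance at the structure and tabulate a couple of the categorical variables; a minimal sketch using the variables listed above:

glimpse(minke)                  # column types and the first few values
minke %>% count(sex, maturity)  # number of whales by sex and maturity status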

Icelandic bottom trawl spring survey (SMB)

The smb dataset contains 19846 tows from the annual Icelandic bottom trawl spring survey, covering the years 1985 to 2019. Each row represents a single tow, with the catch of each species in separate columns. The variables are as follows:

  • id: Unique station identification code
  • date: Date of the tow
  • vid: Vessel identification number
  • tow_id: Tow identification number
  • t1: Time of tow start (shoot time)
  • t2: Time of tow end (haul time)
  • lon1: Longitude of tow start, decimal degrees
  • lat1: Latitude of tow start, decimal degrees
  • lon2: Longitude of tow end, decimal degrees
  • lat2: Latitude of tow end, decimal degrees
  • ir: ICES statistical rectangle
  • ir_lon: ICES statistical rectangle longitude midpoint
  • ir_lat: ICES statistical rectangle latitude midpoint
  • z1: Bottom depth at tow start
  • z2: Bottom depth at tow end
  • speed: Average towing speed in knots
  • duration: Duration of tow in minutes
  • towlength: Length of tow in nautical miles
  • horizontal: Horizontal net opening in meters
  • vertical: Vertical net opening in meters
  • wind: Wind strength, Beaufort scale
  • wind_direction: Wind direction in degrees
  • bormicon: Bormicon identification number (see Bormicon shapefile below)
  • oldstrata: Old stratification identification number
  • newstrata: New stratification identification number
  • *_kg: Species biomass in kilograms, standardized to 4 nautical miles
  • *_n: Species abundance, standardized to 4 nautical miles

Importing the data into R:

d <- read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv")
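
As a quick check one can summarise the standardized cod catch by year; a minimal sketch, deriving the year from the date column (cod_kg follows the *_kg pattern above):

d %>% 
  mutate(year = lubridate::year(date)) %>%        # year of the tow
  group_by(year) %>% 
  summarise(cod_kg = sum(cod_kg, na.rm = TRUE))   # total standardized cod biomass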

SMB 2019 vessel tracks

The dataset contains 8918 AIS/VMS records of four vessels that participated in the Icelandic spring survey in 2019. The variables are as follows:

  • vid: Vessel identification number
  • vessel: Vessel name
  • time: Time in UTC
  • lon: Longitude in decimal degrees
  • lat: Latitude in decimal degrees
  • speed: Vessel speed in knots
  • heading: Vessel heading in degrees

Importing the data into R:

d <- 
  read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb_vms2019.csv")

Shapes


Bormicon

check_last_point <- function(df) {
  # Ensure that every group forms a closed ring: if the first and last
  # coordinates differ, append the first coordinate to the end.
  
  groups <- df %>% pull(group) %>% unique()
  
  res <- list()
  
  for(i in seq_along(groups)) {
    
    x <- df %>% filter(group == groups[i])
    n <- nrow(x)
    
    if(x$lat[1] != x$lat[n] | x$lon[1] != x$lon[n]) {
      res[[i]] <-
        tibble(lat = c(x$lat, x$lat[1]),
               lon = c(x$lon, x$lon[1]),
               group = c(x$group, x$group[1]))
    } else {
      res[[i]] <- x
    }
  }
  bind_rows(res)
}
d <- 
  tibble(nafn = attributes(fjolst::reg.bc)$names,
         area = attributes(fjolst::reg.bc)$area) %>% 
  mutate(nr = 1:n(),
         name = c("W", "NW", "N_center", "N_shallow",
                  "NE", "E", "E_ridge", "SE",
                  "S_SE", "S_SW", "NW_deep", "NE_deep",
                  "S_deep", "W_ridge", "W_deep", "N_deep"))

bormicon <-
  fjolst::reg.bc %>%
  bind_rows(.id = "group") %>%
  check_last_point() %>%        # close any unclosed rings (see function above)
  sf::st_as_sf(coords = c("lon", "lat")) %>%
  sf::st_set_crs(4326) %>%
  group_by(group) %>%
  summarise(do_union = FALSE) %>%
  sf::st_cast("POLYGON") %>%    # turn MULTIPOINT into POLYGON
  rename(nafn = group) %>% 
  left_join(d, by = "nafn") %>% 
  arrange(nr)
sf::st_write(bormicon, "/net/www/export/home/ftp/pub/data/shapes/bormicon.gpkg")
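
The file can then be read back from the ftp server and plotted; a minimal sketch:

bor <- sf::read_sf("ftp://ftp.hafro.is/pub/data/shapes/bormicon.gpkg")
bor %>% 
  ggplot() +
  geom_sf(aes(fill = name), show.legend = FALSE)   # one colour per subarea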

ICES shapes

… create an interactive map showing all shapes

ICES shapes code

The following shapes were obtained from gis.ices.dk. The primary purposes for storing them here were:

  • Storing each as a single file in "gpkg" format rather than as the numerous ESRI shapefiles bundled within each zip file.
  • Checking shape validity.
  • Creating consistent variable names.
  • Simplifying some very dense files in order to reduce memory usage when read into R.

The code below shows how the data were obtained and, where applicable, how they were simplified.

A little helper function:

read_zipped_shapes <- function(url, simplify = FALSE, make_valid = TRUE) {
  # download a zip of ESRI shapefiles and return a single sf object
  td <- tempdir()
  tf <- tempfile()
  download.file(url, tf)
  fil <- unzip(tf, exdir = td)
  fil <- fil[grep("\\.shp$", fil)]
  if(length(fil) == 1) {
    sp <- 
      sf::read_sf(fil)
  } else {
    # more than one shapefile in the zip: read all and stack them
    res <- purrr::map(fil, sf::read_sf)
    names(res) <- 
      basename(fil) %>% 
      stringr::str_remove("\\.shp$")
    sp <- 
      data.table::rbindlist(res,
                            use.names = TRUE,
                            fill = TRUE,
                            idcol = "name") %>%
      sf::st_as_sf()
  }
  
  sp <-
    sp %>% 
    dplyr::rename_all(tolower)
  file.remove(fil)
  
  if(!all(sf::st_is_valid(sp)) & make_valid) {
    sp <- sp %>% lwgeom::st_make_valid()
  }
  
  if(simplify) sp <- rmapshaper::ms_simplify(sp)
  
  if(is.na(sf::st_crs(sp)$epsg)) {
    # no CRS recorded - assume it is 4326
    sp <- sp %>% sf::st_set_crs(value = 4326)
  } else {
    if(sf::st_crs(sp)$epsg != 4326) sp <- sp %>% sf::st_transform(crs = 4326)
  }
  
  return(sp)
  
}

Nephrops functional units

reading:

url <- "http://gis.ices.dk/shapefiles/Nephrops_FU.zip"
p <- 
  read_zipped_shapes(url) %>%
  select(name = fu_descrip, fu)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/nephrops_fu.gpkg")

view: (interactive map not rendered here)
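
In place of the interactive map, a static view can be made with geom_sf; a minimal sketch:

fu <- sf::read_sf("ftp://ftp.hafro.is/pub/data/shapes/nephrops_fu.gpkg")
fu %>% 
  ggplot() +
  geom_sf(aes(fill = name), show.legend = FALSE)   # one colour per functional unit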

ICES areas

url <- "http://gis.ices.dk/shapefiles/ICES_areas.zip"
p <-
  read_zipped_shapes(url, simplify = TRUE)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/ices_areas.gpkg")

view: (interactive map not rendered here)

ICES ecoregions

url <- "http://gis.ices.dk/shapefiles/ICES_ecoregions.zip"
p <- 
  read_zipped_shapes(url, simplify = TRUE) %>% 
  select(ecoregion,
         area_km2 = shape_area)
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/ices_ecoregions.gpkg")

view: (interactive map not rendered here)

OSPAR (from ICES)

url <- "http://gis.ices.dk/shapefiles/OSPAR_Subregions.zip"
p <- read_zipped_shapes(url) 
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/ospar.gpkg")

view: (interactive map not rendered here)

HELCOM (from ICES)

url <- "http://gis.ices.dk/shapefiles/HELCOM_subbasins.zip"
p <- read_zipped_shapes(url, simplify = TRUE)
p %>% write_sf("/net/www/export/home/ftp/pub/data/shapes/helcom.gpkg")

view: (interactive map not rendered here)

ICES statistical rectangles

url <- "http://gis.ices.dk/shapefiles/ICES_rectangles.zip" 
p <- read_zipped_shapes(url)
write_sf(p, "/net/www/export/home/ftp/pub/data/shapes/ices_rectangles.gpkg")
rect <- read_sf("ftp://ftp.hafro.is/pub/data/shapes/ices_rectangles.gpkg")
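
The rectangles can e.g. be joined with tow counts from the smb data for mapping. A minimal sketch; it assumes the rectangle name column in the gpkg-file is icesname (the original ICESNAME field lower-cased by the helper function above):

tows <- 
  read_csv("ftp://ftp.hafro.is/pub/data/csv/is_smb.csv") %>% 
  count(ir)                                        # number of tows per rectangle
rect %>% 
  inner_join(tows, by = c("icesname" = "ir")) %>%  # icesname column name assumed
  ggplot() +
  geom_sf(aes(fill = n))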

ICES statistical subrectangles

url <- "http://gis.ices.dk/shapefiles/ICES_SubStatrec.zip" 
p <- read_zipped_shapes(url)
p %>% 
  write_sf("/net/www/export/home/ftp/pub/data/shapes/ices_subrectangles.gpkg")

Appendix - data generation

What follows is just bookkeeping: the source of the data as well as documentation of any data manipulation done.

Icelandic bottom trawl spring survey (SMB)

attach("/u2/reikn/R/SurveyWork/SMB/catchperstation.rdata")
library(mar)
con <- connect_mar()
st <- 
  lesa_stodvar(con) %>% 
  filter(synaflokkur == 30) %>% 
  mutate(tow_id = paste0(reitur, "-", tognumer)) %>% 
  select(id = synis_id, t1 = togbyrjun, t2 = togendir, tow_id) %>% 
  collect(n = Inf)
smb <- 
  utbrteg %>% 
  rename(id = synis.id) %>% 
  left_join(st) %>% 
  mutate(ir = geo::d2ir(lat, lon),
         ir_lon = geo::ir2d(ir)$lon,
         ir_lat = geo::ir2d(ir)$lat,
         date = lubridate::ymd(paste0(ar, "-", man, "-", dags))) %>% 
  select(id,
         date,
         year = ar,
         vid = skip,
         tow_id,
         t1,
         t2,
         lon1 = kastad.v.lengd,
         lat1 = kastad.n.breidd,
         lon2 = hift.v.lengd,
         lat2 = hift.n.breidd,
         ir,
         ir_lon,
         ir_lat,
         z1 = dypi.kastad,
         z2 = dypi.hift,
         temp_s = yfirbordshiti,
         temp_b = botnhiti,
         speed = toghradi,
         duration = togtimi,
         towlength = toglengd,
         horizontal = larett.opnun,
         vertical = lodrett.opnun,
         wind = vindhradi,
         wind_direction = vindatt,
         bormicon = area,
         oldstrata,
         newstrata,
         cod_kg = torskur.kg,
         cod_n = torskur.stk,
         haddock_kg = ysa.kg,
         haddock_n = ysa.stk,
         saithe_kg = ufsi.kg,
         saithe_n = ufsi.stk,
         wolffish_kg = steinbitur.kg,
         wolffish_n = steinbitur.stk,
         plaice_kg = skarkoli.kg,
         plaice_n = skarkoli.stk,
         monkfish_kg = skotuselur.kg,
         monkfish_n = skotuselur.stk)
smb %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb.csv")

SMB 2019 AIS/VMS tracks

library(mar)
library(lubridate)
con <- connect_mar()
track <- 
  stk_trail(con) %>% 
  filter(time >= to_date("2019-02-26", "YYYY-MM-DD"),
         time <= to_date("2019-03-22", "YYYY-MM-DD")) %>% 
  collect(n = Inf) %>% 
  filter(mid %in% c(101109, 101143, 101070, 102571)) %>% 
  left_join(tibble(mid = c(101109, 101143, 101070, 102571),
                   vid = c(2350, 1131, 1277, 1281),
                   vessel = c("Árni", "Bjarni", "Ljósafell", "Múlaberg"),
                   dep = c(ymd_hms("2019-02-26 01:00:00"),
                           ymd_hms("2019-02-26 01:00:00"),
                           ymd_hms("2019-02-27 01:00:00"),
                           ymd_hms("2019-03-01 01:00:00")),
                   arr =  c(ymd_hms("2019-03-22 23:00:00"),
                            ymd_hms("2019-03-22 23:00:00"),
                            ymd_hms("2019-03-16 23:00:00"),
                            ymd_hms("2019-03-19 23:00:00")))) %>% 
  filter(time >= dep,
         time <= arr) %>% 
  arrange(vessel, time) %>% 
  group_by(vessel) %>% 
  mutate(dist = geo::arcdist(lead(lat), lead(lon), lat, lon),   # distance to next point
         time2 = as.numeric(lead(time) - time) / (60 * 60), # duration to next point
         speed2 = dist/time2) %>%                           # speed    on next "leg"
  filter(speed2 <= 20 | is.na(speed2)) %>% 
  select(vid, vessel, time, lon, lat, speed, heading)

track %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_vms2019.csv")

Split smb into two tidy tables:

smb %>% 
  select(id:newstrata) %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_stations.csv")
smb %>% 
  select(id, cod_kg:monkfish_n) %>% 
  # step 1: gather the species columns into a long table
  pivot_longer(-id) %>%
  # step 2: split names like "cod_kg" into species and variable
  separate(name, sep = "_", into = c("species", "variable")) %>% 
  # step 3: spread the kg and n values back into separate columns
  pivot_wider(names_from = variable) %>% 
  filter(n > 0) %>% 
  write_csv("/net/www/export/home/ftp/pub/data/csv/is_smb_biological.csv")