library(tidyverse)
library(readxl)
library(lubridate)
library(stringr)
library(knitr)

Preamble

The main objective of this exercise is to demonstrate how total catch and total effort can be estimated in R given sample based fishery-surveys data.

The principal equation is:

Catch = CPUE x Effort

In a sample based fishery effort is normally derived from:

Effort = (Boat Activity) x (Total boats) x (Active days)

or

Effort = BAC x F x A

Overall cpue is derived from:

Hence the generic formula to estimate catch is:

Catch = CPUE x Effort

or expressed fully:

Catch = CPUE x BAC x F x A

These variables are first estimated by each strata and then aggregated as needed.

The data

The following case example is based on data used to illustrate the operation of the ARTFISH software. The data has been stored an Excel workbook called Artfish_tidy.xlsx.

The basic strata system in this case example is:

When calculating catch and effort one normally does this first on the strata basis. Given that we only have one month of data we will estimate 1 month x 2 areas x 2 gear categories of catches and effort (I.e. a total of 4 sets of estimates). These statistics form the basis upon which statistics on a more aggregated level are then calculated.

Downloading and importing the data

If you have a “data-raw” directory in you current R working directory one can download the workbook by:

download.file(url = "http://www.hafro.is/~einarhj/crfmr/data-raw/artfish_tidy.xlsx",
              destfile = "data-raw/artfish_tidy.xlsx",
              mode = "wb")

Lets read in all the needed tables (what each contains is explained further below):

f <- "data-raw/artfish_tidy.xlsx"
site <- read_excel(f, "site")
active.days <- read_excel(f, "active_days")
frame.survey <- read_excel(f, "frame_survey")
boat.activity <-
  read_excel(f, "boat_activity") %>% 
  mutate(date = ymd(date))
trip <-
  read_excel(f, "trip") %>% 
  mutate(date = ymd(date))
catch <- 
  read_excel(f, "catch")

Calculations

Boat activity

The boat activity is stored in the boat.activity table. It has five columns:

  • site: The name of the landing site
  • gear: The name of the gear
  • date: Date of sampling
  • n.active: Number of boats that were active
  • n.sampled: Number of boats sampled

The strata area are is not stored in boat activity table but can be obtained from the site dataframe and merged to the table via left_join-function. Once done we can set the strata-grouping using the group_by-function and calculate various statistics. The statistics of primary interest is the boat activity coefficient (bac), which basically is the probability that a boat within a strata has gone fishing. The simplest way is to calculate the mean from the raw observations. And since we have the mean we can also calculate various additional statistic that represent the variability in the data.

bac <-
  boat.activity %>% 
  left_join(site) %>% 
  mutate(month = month(date)) %>% 
  group_by(month, area, gear) %>% 
  summarise(n_sites = n_distinct(site),
            n_days = n_distinct(date),
            active_s = sum(n.active),
            sampled_s = sum(n.sampled),
            n = n(),
            bac_m = mean(n.active/n.sampled),
            se = sd(n.active/n.sampled)/sqrt(n),
            cv = se/bac_m) %>% 
  ungroup() %>% 
  mutate(bac_lower = bac_m - qt(.975, df = n - 1) * se,
         bac_upper = bac_m + qt(.975, df = n - 1) * se) %>% 
  ungroup() 
bac %>% 
  knitr::kable(digits = 3)
month area gear n_sites n_days active_s sampled_s n bac_m se cv bac_lower bac_upper
1 nw hook and line 2 8 55 160 16 0.344 0.040 0.116 0.259 0.428
1 nw trap 2 8 84 160 16 0.525 0.034 0.064 0.454 0.596
1 se hook and line 3 8 108 240 24 0.450 0.036 0.079 0.376 0.524
1 se trap 3 8 155 240 24 0.646 0.031 0.048 0.582 0.709

Estimation of total effort

Besides BAC we need to know the total number of boats within each strata, such data normally coming from a frame survey. The frame survey data has four columns:

  • site: The name of the landing site
  • area: The name of the strata area for the site
  • gear: The name of the gear
  • n.boats: The number of boats

We obtain the sum of the boats for the strata by using the frame survey and obtaining the area strata from the site table:

fs <- 
  frame.survey %>% 
  left_join(site) %>% 
  group_by(area, gear) %>% 
  summarise(n.boats = sum(n.boats))

The number of active days by the strata is stored in the active.days-table that contains the following columns:

  • area: The area strata
  • gear: The gear name
  • year: The year
  • month: The month
  • days: The number of expected days fishing (trap fishery not taking place on Sundays)

We simply merge the three tables containing the bac, total number of boats and active days in the month, using the left_join-function and once done we can calculate the total effort given the equation above:

effort <-
  bac %>%
  select(month, area, gear, bac = bac_m) %>% 
  left_join(fs) %>% 
  left_join(active.days %>% select(area, gear, days)) %>% 
  mutate(effort = bac * n.boats * days)
kable(effort, digits = 2)
month area gear bac n.boats days effort
1 nw hook and line 0.34 750 31 7992.19
1 nw trap 0.52 300 27 4252.50
1 se hook and line 0.45 349 31 4868.55
1 se trap 0.65 523 27 9119.81

Calculation of CPUE

In order to calculate the catch per unit effort we need merge information from the trip and the catch tables. The trip table has the following columns:

  • tid: A unique trip identifier
  • date: The date the trip was n.sampled
  • site: The landing site name
  • gear: The name of the gear used
  • duration: The number of days fishing

The catch table has the following columns:

  • tid: A trip identifier (links to the trip table tid)
  • species: The name of the species n.sampled
  • wt: The weight of the sample (here in kg)
  • no: The number of fish n.sampled
  • price: The selling price

Here we calculate the total cpue per trip, ignoring the species. We hence need to sum the catch of all the species in each trip before we join the data with the trip information. Once done we calculate the cpue for each trip, join the data to the site table (to get the strata area) and then calculate the summary statistic for each strata:

cpue <-
  catch %>% 
  group_by(tid) %>%
  summarise(catch = sum(wt)) %>% 
  ungroup() %>% 
  left_join(trip) %>% 
  mutate(month = month(date),
         cpue = catch/duration) %>% 
  left_join(site) %>% 
  group_by(month, area, gear) %>% 
  summarise(n = n(),
            catch = sum(catch),
            cpue_m = mean(cpue),
            se = sd(cpue)/sqrt(n),
            cv = se/cpue_m) %>% 
  ungroup() %>% 
  mutate(cpue_lower = cpue_m - qt(.975, df = n - 1) * se,
         cpue_upper = cpue_m + qt(.975, df = n - 1) * se)
cpue %>% kable(digits = 3)
month area gear n catch cpue_m se cv cpue_lower cpue_upper
1 nw hook and line 32 839 26.219 1.649 0.063 22.856 29.582
1 nw trap 32 2611 47.359 2.147 0.045 42.980 51.739
1 se hook and line 48 1086 22.625 1.221 0.054 20.169 25.081
1 se trap 48 4325 55.198 1.753 0.032 51.672 58.724

Calculation of landings by strata

We have now have the catch and effort by strata in two tables (cpue and effort) which now can be merged and then landings by strata are simple to calculate:

landings <-
  cpue %>% 
  select(month, area, gear, cpue = cpue_m) %>% 
  left_join(effort %>% select(month, area, gear, effort)) %>% 
  mutate(landings = cpue * effort)
kable(landings, digits = 2)
month area gear cpue effort landings
1 nw hook and line 26.22 7992.19 209545.2
1 nw trap 47.36 4252.50 201395.7
1 se hook and line 22.62 4868.55 110150.9
1 se trap 55.20 9119.81 503394.7

Recapitulation

The bare minimum coding is given below and is less than 40 lines. If one had more months of data, more gears and more landings site one could still use the same code. Actually with only a minor changes in the code one could calculate effort and landing statistics for multiple years.

bac <-
  boat.activity %>% 
  left_join(site) %>% 
  mutate(month = month(date)) %>% 
  group_by(month, area, gear) %>% 
  summarise(bac = mean(n.active/n.sampled)) %>% 
  ungroup()
fs <- 
  frame.survey %>% 
  left_join(site) %>% 
  group_by(area, gear) %>% 
  summarise(n.boats = sum(n.boats)) %>% 
  ungroup()
effort <-
  bac %>%
  left_join(fs) %>% 
  left_join(active.days %>% select(area, gear, days)) %>% 
  mutate(effort = bac * n.boats * days)
cpue <-
  catch %>% 
  group_by(tid) %>%
  summarise(catch = sum(wt)) %>% 
  ungroup() %>% 
  left_join(trip) %>% 
  mutate(month = month(date),
         cpue = catch/duration) %>% 
  left_join(site) %>% 
  group_by(month, area, gear) %>% 
  summarise(cpue = mean(cpue)) %>% 
  ungroup()
landings <-
  cpue %>% 
  left_join(effort) %>% 
  mutate(landings = cpue * effort)

Aggregated statistics

Calculation of more aggregated statistics can now be done based on the above calculation.

By month and area

landings %>% 
  group_by(month, area) %>% 
  summarise(effort = sum(effort),
            landings = sum(landings)) %>% 
  kable(digits = 0)
month area effort landings
1 nw 12245 410941
1 se 13988 613546

By month and gear

landings %>% 
  group_by(month, gear) %>% 
  summarise(effort = sum(effort),
            landings = sum(landings)) %>% 
  kable(digits = 0)
month gear effort landings
1 hook and line 12861 319696
1 trap 13372 704790

By month

landings %>% 
  group_by(month) %>% 
  summarise(effort = sum(effort),
            landings = sum(landings)) %>% 
  kable(digits = 0)
month effort landings
1 26233 1024487

By year

Well if we had more months we would be doing:

landings %>% 
  summarise(effort = sum(effort),
            landings = sum(landings)) %>% 
  kable(digits = 0)
effort landings
26233 1024487

On accuracy level

… need to add text

ba <-
  boat.activity %>% 
  left_join(site) %>% 
  group_by(area, gear) %>% 
  summarise(n.bac = sum(n.sampled))
tr <-
  trip %>%
  left_join(site) %>% 
  group_by(area, gear) %>% 
  summarise(n.cat = n())
frame.survey %>% 
  left_join(site) %>% 
  group_by(area, gear) %>%
  summarise(boats = sum(n.boats)) %>% 
  left_join(active.days %>% select(area, gear, days)) %>% 
  mutate(N = boats * days) %>% 
  left_join(ba) %>% 
  left_join(tr) %>% 
  mutate(bac_accuracy_level = round((1 - 1.96 * 0.5/sqrt(n.bac) * sqrt(1 - n.bac/N)) * 100, 3),
         cat_accuracy_level = round((1 - 1.96 * sqrt((2 * N - 1)/(6 * (N - 1)) - 1/4) / sqrt(n.cat) * sqrt(1 - n.cat/N)) * 100, 3)) %>% 
  kable()
area gear boats days N n.bac n.cat bac_accuracy_level cat_accuracy_level
nw hook and line 750 31 23250 160 32 92.279 90.004
nw trap 300 27 8100 160 32 92.329 90.016
se hook and line 349 31 10819 240 48 93.745 91.851
se trap 523 27 14121 240 48 93.728 91.847