library(tidyverse)
library(readxl)
library(lubridate)
library(stringr)
library(knitr)
The main objective of this exercise is to demonstrate how total catch and total effort can be estimated in R given sample based fishery-surveys data.
The principal equation is:
Catch = CPUE x Effort
In a sample based fishery effort is normally derived from:
Effort = (Boat Activity) x (Total boats) x (Active days)
or
Effort = BAC x F x A
Overall cpue is derived from:
Hence the generic formula to estimate catch is:
Catch = CPUE x Effort
or expressed fully:
Catch = CPUE x BAC x F x A
These variables are first estimated by each strata and then aggregated as needed.
The following case example is based on data used to illustrate the operation of the ARTFISH software. The data has been stored an Excel workbook called Artfish_tidy.xlsx.
The basic strata system in this case example is:
When calculating catch and effort one normally does this first on the strata basis. Given that we only have one month of data we will estimate 1 month x 2 areas x 2 gear categories of catches and effort (I.e. a total of 4 sets of estimates). These statistics form the basis upon which statistics on a more aggregated level are then calculated.
If you have a “data-raw” directory in you current R working directory one can download the workbook by:
download.file(url = "http://www.hafro.is/~einarhj/crfmr/data-raw/artfish_tidy.xlsx",
destfile = "data-raw/artfish_tidy.xlsx",
mode = "wb")
Lets read in all the needed tables (what each contains is explained further below):
f <- "data-raw/artfish_tidy.xlsx"
site <- read_excel(f, "site")
active.days <- read_excel(f, "active_days")
frame.survey <- read_excel(f, "frame_survey")
boat.activity <-
read_excel(f, "boat_activity") %>%
mutate(date = ymd(date))
trip <-
read_excel(f, "trip") %>%
mutate(date = ymd(date))
catch <-
read_excel(f, "catch")
The boat activity is stored in the boat.activity table. It has five columns:
The strata area are is not stored in boat activity table but can be obtained from the site
dataframe and merged to the table via left_join
-function. Once done we can set the strata-grouping using the group_by
-function and calculate various statistics. The statistics of primary interest is the boat activity coefficient (bac), which basically is the probability that a boat within a strata has gone fishing. The simplest way is to calculate the mean from the raw observations. And since we have the mean we can also calculate various additional statistic that represent the variability in the data.
bac <-
boat.activity %>%
left_join(site) %>%
mutate(month = month(date)) %>%
group_by(month, area, gear) %>%
summarise(n_sites = n_distinct(site),
n_days = n_distinct(date),
active_s = sum(n.active),
sampled_s = sum(n.sampled),
n = n(),
bac_m = mean(n.active/n.sampled),
se = sd(n.active/n.sampled)/sqrt(n),
cv = se/bac_m) %>%
ungroup() %>%
mutate(bac_lower = bac_m - qt(.975, df = n - 1) * se,
bac_upper = bac_m + qt(.975, df = n - 1) * se) %>%
ungroup()
bac %>%
knitr::kable(digits = 3)
month | area | gear | n_sites | n_days | active_s | sampled_s | n | bac_m | se | cv | bac_lower | bac_upper |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | nw | hook and line | 2 | 8 | 55 | 160 | 16 | 0.344 | 0.040 | 0.116 | 0.259 | 0.428 |
1 | nw | trap | 2 | 8 | 84 | 160 | 16 | 0.525 | 0.034 | 0.064 | 0.454 | 0.596 |
1 | se | hook and line | 3 | 8 | 108 | 240 | 24 | 0.450 | 0.036 | 0.079 | 0.376 | 0.524 |
1 | se | trap | 3 | 8 | 155 | 240 | 24 | 0.646 | 0.031 | 0.048 | 0.582 | 0.709 |
Besides BAC we need to know the total number of boats within each strata, such data normally coming from a frame survey. The frame survey data has four columns:
We obtain the sum of the boats for the strata by using the frame survey and obtaining the area strata from the site table:
fs <-
frame.survey %>%
left_join(site) %>%
group_by(area, gear) %>%
summarise(n.boats = sum(n.boats))
The number of active days by the strata is stored in the active.days
-table that contains the following columns:
We simply merge the three tables containing the bac, total number of boats and active days in the month, using the left_join
-function and once done we can calculate the total effort given the equation above:
effort <-
bac %>%
select(month, area, gear, bac = bac_m) %>%
left_join(fs) %>%
left_join(active.days %>% select(area, gear, days)) %>%
mutate(effort = bac * n.boats * days)
kable(effort, digits = 2)
month | area | gear | bac | n.boats | days | effort |
---|---|---|---|---|---|---|
1 | nw | hook and line | 0.34 | 750 | 31 | 7992.19 |
1 | nw | trap | 0.52 | 300 | 27 | 4252.50 |
1 | se | hook and line | 0.45 | 349 | 31 | 4868.55 |
1 | se | trap | 0.65 | 523 | 27 | 9119.81 |
In order to calculate the catch per unit effort we need merge information from the trip and the catch tables. The trip table has the following columns:
The catch table has the following columns:
Here we calculate the total cpue per trip, ignoring the species. We hence need to sum the catch of all the species in each trip before we join the data with the trip information. Once done we calculate the cpue for each trip, join the data to the site table (to get the strata area) and then calculate the summary statistic for each strata:
cpue <-
catch %>%
group_by(tid) %>%
summarise(catch = sum(wt)) %>%
ungroup() %>%
left_join(trip) %>%
mutate(month = month(date),
cpue = catch/duration) %>%
left_join(site) %>%
group_by(month, area, gear) %>%
summarise(n = n(),
catch = sum(catch),
cpue_m = mean(cpue),
se = sd(cpue)/sqrt(n),
cv = se/cpue_m) %>%
ungroup() %>%
mutate(cpue_lower = cpue_m - qt(.975, df = n - 1) * se,
cpue_upper = cpue_m + qt(.975, df = n - 1) * se)
cpue %>% kable(digits = 3)
month | area | gear | n | catch | cpue_m | se | cv | cpue_lower | cpue_upper |
---|---|---|---|---|---|---|---|---|---|
1 | nw | hook and line | 32 | 839 | 26.219 | 1.649 | 0.063 | 22.856 | 29.582 |
1 | nw | trap | 32 | 2611 | 47.359 | 2.147 | 0.045 | 42.980 | 51.739 |
1 | se | hook and line | 48 | 1086 | 22.625 | 1.221 | 0.054 | 20.169 | 25.081 |
1 | se | trap | 48 | 4325 | 55.198 | 1.753 | 0.032 | 51.672 | 58.724 |
We have now have the catch and effort by strata in two tables (cpue and effort) which now can be merged and then landings by strata are simple to calculate:
landings <-
cpue %>%
select(month, area, gear, cpue = cpue_m) %>%
left_join(effort %>% select(month, area, gear, effort)) %>%
mutate(landings = cpue * effort)
kable(landings, digits = 2)
month | area | gear | cpue | effort | landings |
---|---|---|---|---|---|
1 | nw | hook and line | 26.22 | 7992.19 | 209545.2 |
1 | nw | trap | 47.36 | 4252.50 | 201395.7 |
1 | se | hook and line | 22.62 | 4868.55 | 110150.9 |
1 | se | trap | 55.20 | 9119.81 | 503394.7 |
The bare minimum coding is given below and is less than 40 lines. If one had more months of data, more gears and more landings site one could still use the same code. Actually with only a minor changes in the code one could calculate effort and landing statistics for multiple years.
bac <-
boat.activity %>%
left_join(site) %>%
mutate(month = month(date)) %>%
group_by(month, area, gear) %>%
summarise(bac = mean(n.active/n.sampled)) %>%
ungroup()
fs <-
frame.survey %>%
left_join(site) %>%
group_by(area, gear) %>%
summarise(n.boats = sum(n.boats)) %>%
ungroup()
effort <-
bac %>%
left_join(fs) %>%
left_join(active.days %>% select(area, gear, days)) %>%
mutate(effort = bac * n.boats * days)
cpue <-
catch %>%
group_by(tid) %>%
summarise(catch = sum(wt)) %>%
ungroup() %>%
left_join(trip) %>%
mutate(month = month(date),
cpue = catch/duration) %>%
left_join(site) %>%
group_by(month, area, gear) %>%
summarise(cpue = mean(cpue)) %>%
ungroup()
landings <-
cpue %>%
left_join(effort) %>%
mutate(landings = cpue * effort)
Calculation of more aggregated statistics can now be done based on the above calculation.
landings %>%
group_by(month, area) %>%
summarise(effort = sum(effort),
landings = sum(landings)) %>%
kable(digits = 0)
month | area | effort | landings |
---|---|---|---|
1 | nw | 12245 | 410941 |
1 | se | 13988 | 613546 |
landings %>%
group_by(month, gear) %>%
summarise(effort = sum(effort),
landings = sum(landings)) %>%
kable(digits = 0)
month | gear | effort | landings |
---|---|---|---|
1 | hook and line | 12861 | 319696 |
1 | trap | 13372 | 704790 |
landings %>%
group_by(month) %>%
summarise(effort = sum(effort),
landings = sum(landings)) %>%
kable(digits = 0)
month | effort | landings |
---|---|---|
1 | 26233 | 1024487 |
Well if we had more months we would be doing:
landings %>%
summarise(effort = sum(effort),
landings = sum(landings)) %>%
kable(digits = 0)
effort | landings |
---|---|
26233 | 1024487 |
… need to add text
ba <-
boat.activity %>%
left_join(site) %>%
group_by(area, gear) %>%
summarise(n.bac = sum(n.sampled))
tr <-
trip %>%
left_join(site) %>%
group_by(area, gear) %>%
summarise(n.cat = n())
frame.survey %>%
left_join(site) %>%
group_by(area, gear) %>%
summarise(boats = sum(n.boats)) %>%
left_join(active.days %>% select(area, gear, days)) %>%
mutate(N = boats * days) %>%
left_join(ba) %>%
left_join(tr) %>%
mutate(bac_accuracy_level = round((1 - 1.96 * 0.5/sqrt(n.bac) * sqrt(1 - n.bac/N)) * 100, 3),
cat_accuracy_level = round((1 - 1.96 * sqrt((2 * N - 1)/(6 * (N - 1)) - 1/4) / sqrt(n.cat) * sqrt(1 - n.cat/N)) * 100, 3)) %>%
kable()
area | gear | boats | days | N | n.bac | n.cat | bac_accuracy_level | cat_accuracy_level |
---|---|---|---|---|---|---|---|---|
nw | hook and line | 750 | 31 | 23250 | 160 | 32 | 92.279 | 90.004 |
nw | trap | 300 | 27 | 8100 | 160 | 32 | 92.329 | 90.016 |
se | hook and line | 349 | 31 | 10819 | 240 | 48 | 93.745 | 91.851 |
se | trap | 523 | 27 | 14121 | 240 | 48 | 93.728 | 91.847 |