MFDB is an R package that enables automated processing of fisheries data into forms suitable for running ecosystem models, such as Gadget, against it.
MFDB contains several distinct sets of functions: functions to connect to and populate a database (mfdb() and the mfdb_import_* functions), functions to query and aggregate the stored data (such as mfdb_sample_count()), and functions to turn query results into Gadget model files (such as gadget_directory()).
Using this, you can install PostgreSQL locally and have a script automate the process of importing data from your sources, aggregating it, and producing model files.
Also, MFDB can be used to connect to a remote database and generate model files from that data.
Before doing anything with mfdb, it is worth knowing a bit about how data is stored. Broadly, there are two basic types of table in mfdb: taxonomy tables and measurement tables.
The measurement tables store all forms of sample data supported, at the finest available detail. These are then aggregated when using any of the mfdb query functions. All measurement data is separated by case study, so multiple case studies can be loaded into a database without conflicts.
Taxonomy tables store all possible values for terms and their meanings, to ensure consistency in the data. For example, the species taxonomy stores the short names and full Latin names of all species known to MFDB, ensuring consistency in naming.
Most taxonomies have defaults which are populated when the database is created; their definitions are stored as data attached to this package (see mfdb-data for more information on these). Others, such as areacell and sampling_type, are case-study specific, and you will need to define your terms before you can import data.
Follow these instructions to install and configure PostgreSQL before continuing.
Unless you are working with a remote database, you will need to populate the database at least once before you are able to do any querying. The steps your script needs to do are:
First, use the mfdb() function to connect to the database. This will create tables / populate taxonomies if necessary.
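For example, a minimal connection might look like this (a sketch, assuming a local PostgreSQL install with the default 'mf' database, and the 'Baltic' case study used later in this document):

library(mfdb)

# Connect to the local database; this creates the (lower-cased) 'baltic'
# schema and populates default taxonomies if they don't already exist
mdb <- mfdb('Baltic')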
mfdb models space in the following way: every measurement is assigned to an areacell, the finest level of spatial detail stored in the database, and areacells are in turn grouped into divisions when the data is imported.
Finally, when querying, divisions are grouped together into named collections, for instance mfdb_group(north = 1:3, south = 4:6)
will put anything in divisions 1–3 under an area named “north”, and anything in divisions 4–6 under an area named “south”.
Before you can upload any measurements, you have to define the areacells that they will use. You do this using the mfdb_import_area()
function. This allows you to import tables of area/division information, such as:
mfdb_import_area(mdb, data.frame(
    area = c('101', '102', '103', '401', '402', '403'),
    division = c('1', '1', '1', '4', '4', '4')))
If you want areas to be part of multiple divisions, then you can use mfdb_import_division() to import extra divisions.
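As a sketch, assuming mfdb_import_division() accepts a named list mapping division names to vectors of areacells:

# Hypothetical extra division 'N', reusing areacells already in division '1'
mfdb_import_division(mdb, list(
    N = c('101', '102')))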
Any survey data can have a sampling type defined, which can then be used when querying data. If you want to use a sampling type, define it using mfdb_import_sampling_type().
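For example (a sketch, assuming a data.frame with name and description columns; 'SEA' and 'IGFS' are hypothetical sampling-type names):

mfdb_import_sampling_type(mdb, data.frame(
    name = c('SEA', 'IGFS'),
    description = c('Sea sampling', 'Groundfish survey')))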
At this point, you can start uploading actual measurements, the easiest of which is temperature. Upload a table of areacell/month/temperature data using mfdb_import_temperature().
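A minimal sketch, assuming year/month/areacell/temperature columns (the values here are invented):

# Invented monthly temperatures for areacell '101' in 1996
mfdb_import_temperature(mdb, data.frame(
    year = 1996,
    month = 1:12,
    areacell = '101',
    temperature = c(4, 4, 5, 7, 9, 11, 13, 14, 12, 10, 7, 5)))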
Finally, import any survey data using mfdb_import_survey(). Ideally, upload your data in separate chunks. For example, if you have length and age-length data, don’t combine them in R; upload them separately, and both will be used when querying for length data. This keeps the process simple, and allows you to swap out data as necessary.
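For instance, a length-distribution upload might look like this sketch (the data_source label and column choices are assumptions):

# Hypothetical cod length distribution for one month in areacell '101'
mfdb_import_survey(mdb,
    data_source = 'example_ldist',
    data.frame(
        year = 1996,
        month = 1,
        areacell = '101',
        species = 'COD',
        length = c(10, 20, 30, 40),
        count = c(5, 12, 9, 2)))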
Stomach surveys are imported in much the same way; however, there are two data.frames, one representing predators, the other prey. The stomach_name column links the two: it can contain any numeric or character value, as long as it is unique among predators, so that prey measurements are assigned to the correct stomach.
See mfdb_import_survey for more information, or the demo directory for concrete examples.
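For stomach data, a minimal sketch, assuming an mfdb_import_stomach() importer taking the two linked data.frames (column names beyond stomach_name are assumptions based on the survey columns above):

# stomach_name 'A1' links each prey row to its predator row
mfdb_import_stomach(mdb,
    predator_data = data.frame(
        stomach_name = 'A1',
        year = 1996, month = 1, areacell = '101',
        species = 'COD', length = 45),
    prey_data = data.frame(
        stomach_name = 'A1',
        species = 'CAP',
        length = 10, count = 3))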
You can also transfer data to another host using the PostgreSQL pg_dump and pg_restore commands. You can dump/restore individual schemas (i.e. the case study you give to the mfdb() command); to list all installed schemas, run SELECT DISTINCT(table_schema) FROM information_schema.tables from psql. Note that if you use mfdb('Baltic'), the Postgres schema name will be lower-cased.
Create a dump of your chosen schema with the following command:
pg_dump --schema=baltic -F tar mf > baltic.tar
This will make a dump of the “baltic” case study into “baltic.tar”. It can then be restored onto another computer with the following:
pg_restore --clean -d mf baltic.tar
If you already have a baltic schema you wish to preserve, you can rename it first by issuing ALTER SCHEMA baltic RENAME TO baltic_o in psql. Once the restore is done, you can rename the new schema and restore the old schema's name.
There are a selection of querying functions available, all of which work the same way: you give a set of parameters, each of which can be a vector of data you wish returned, for instance year = 1998:2000 or species = c('COD').
If the results are also grouped by this column (i.e. ‘year’, ‘timestep’, ‘area’, and any other columns given, e.g. ‘age’), then the parameter will control how this grouping works; e.g. maturity_stage = mfdb_group(imm = 1, mat = 2:5) will result in the maturity_stage column containing either ‘imm’ or ‘mat’. These groupings will also be used to generate Gadget aggregation files later.
For example, the following queries the temperature table:
defaults <- list(
    # Assumed grouping: an area named "101" covering division '1' from the import example above
    area = mfdb_group("101" = c('1')),
    timestep = mfdb_timestep_quarterly, # Group months into 4 quarterly timesteps for each year
    year = 1996:2005)
agg_data <- mfdb_temperature(mdb, defaults)
All functions return a list of data.frame result tables (generally only one, unless you requested bootstrapping), each suitable for feeding into a gadget function to output model files.
See mfdb_sample_count for more information, or the demo directory for concrete examples.
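For instance, a grouped sample query might look like the following sketch (the grouping choices and columns are illustrative; mfdb_interval() is used here to build numbered length bins):

# Count samples grouped by year/timestep/area plus age and length
agg_data <- mfdb_sample_count(mdb, c('age', 'length'), list(
    area = mfdb_group(north = 1:3, south = 4:6),
    timestep = mfdb_timestep_quarterly,
    year = 1998:2000,
    species = 'COD',
    age = mfdb_group(young = 1:3, old = 4:7),
    length = mfdb_interval('len', seq(0, 100, by = 10))))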
Finally, there are a set of functions that turn the output of queries into Gadget model files. These work on a gadget_directory object, which can either be an existing Gadget model to alter, or an empty / nonexistent directory.
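For example (a minimal sketch, assuming the gadget_dir_write() and gadget_likelihood_component() helpers; the 'understocking' component is purely illustrative):

# Open (or create) a model directory, then write a likelihood component into it
gd <- gadget_directory('./gadget-model')
gadget_dir_write(gd, gadget_likelihood_component('understocking'))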