Link biology data to six-monthly flow statistics for paired biology and flow sites.
The join_he function links biology data to six-monthly flow statistics (from calc_flowstats) to create a dataset for hydro-ecological modelling. It includes an option to lag selected flow statistics.
Usage
join_he_old(biol_data = biol_data, flow_stats = flow_stats, mapping = NULL, LS1 = FALSE, LS2 = FALSE, lag_vars = c("Q10z", "Q95z"))
Arguments
- biol_data
Data frame or tibble containing the processed biology data. Must contain the following columns: biol_site_id, Year and Season. Seasons must be named "Spring" and "Autumn". See Details below.
- flow_stats
Data frame (first element of list returned by the calc_flowstats function), containing the processed flow statistics. Must contain the following columns: flow_site_id, water_year and season.
- mapping
Data frame or tibble containing paired biology and flow site IDs. Must contain columns named biol_site_id and flow_site_id. These columns must not contain any NAs. Default = NULL, which is used when paired biology and flow sites have identical IDs, so no mapping is required.
- LS1
Logical value indicating whether or not to also link biology samples to flow statistics for the summer period of the previous year. Default = FALSE.
- LS2
Logical value indicating whether or not to also link biology samples to flow statistics for the summer period of the year before last. Default = FALSE.
- lag_vars
Character vector naming the flow variables in 'flow_stats' to be lagged if LS1 and/or LS2 = TRUE. Default = two commonly used flow statistics: Q10z and Q95z.
Value
join_he returns a tibble containing processed biology data linked to processed flow statistics.
Details
join_he is not intended to join biology data to residual flow ratio statistics (from calc_rfrstats) because these are on an annual time step, rather than six-monthly, and are therefore easy to join manually.
'biol_data' and 'flow_stats' may contain more sites than listed in 'mapping', but any sites not listed in 'mapping' will be filtered out.
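The site filtering described above can be sketched in base R as follows. This is only an illustration on toy data, not the function's internal code; the site IDs and the object name biol_kept are hypothetical.

```r
# Toy inputs (hypothetical site IDs)
biol_data <- data.frame(
  biol_site_id = c("XX", "YY", "ZZ"),
  Year = c(2015, 2015, 2015),
  Season = c("Spring", "Spring", "Autumn"),
  stringsAsFactors = FALSE
)
mapping <- data.frame(
  biol_site_id = c("XX", "YY"),
  flow_site_id = c("BB", "AA"),
  stringsAsFactors = FALSE
)

# The ID columns in mapping must not contain NAs
stopifnot(!anyNA(mapping$biol_site_id), !anyNA(mapping$flow_site_id))

# Keep only biology sites listed in mapping; site "ZZ" is filtered out
biol_kept <- biol_data[biol_data$biol_site_id %in% mapping$biol_site_id, ]
```

Checking which sites would be dropped before calling the function can help to catch typos in site IDs.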
'biol_data' must contain the following columns: biol_site_id, Year and Season. Seasons must be named "Spring" and "Autumn".
join_he joins spring (March to May) biology samples to flow statistics for the preceding winter (October to March) period, and autumn (September to November) samples to flow statistics for the preceding summer (April to September) period. Any winter (December to February) or summer (June to August) biology samples are dropped. All flow statistics in 'flow_stats' are joined; any superfluous variables must be dropped manually, either before or after executing the join_he function.
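As a rough illustration of the season rules above, the following base R sketch drops summer and winter samples and records which flow period each remaining sample would be linked to. The column name flow_period and the labels "Winter" and "Summer" are assumptions made for this example; the labels actually used in 'flow_stats' may differ.

```r
# Toy biology samples, one per season (hypothetical data)
biol_data <- data.frame(
  biol_site_id = "XX",
  Year = 2015,
  Season = c("Spring", "Summer", "Autumn", "Winter"),
  stringsAsFactors = FALSE
)

# Only spring and autumn samples are retained
biol_kept <- biol_data[biol_data$Season %in% c("Spring", "Autumn"), ]

# Spring samples pair with the preceding winter flow period,
# autumn samples with the preceding summer flow period
# (labels are assumed for illustration)
biol_kept$flow_period <- ifelse(biol_kept$Season == "Spring", "Winter", "Summer")
```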
If LS1 = TRUE, spring biology samples are also joined to flow statistics for the summer period of the previous year. These flow statistics are renamed with the suffix 'LS1'. Only the flow statistics listed in 'lag_vars' are joined in this way.
If LS2 = TRUE, spring biology samples are also joined to flow statistics for the summer period of the year before last. These flow statistics are renamed with the suffix 'LS2'. Only the flow statistics listed in 'lag_vars' are joined in this way.
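The lagging idea can be sketched in base R as below. This is a simplified illustration on made-up values, not the function's implementation: the data frames summer and samples, the year column, and the exact form of the renamed columns (here the variable name followed directly by LS1) are assumptions.

```r
lag_vars <- c("Q10z", "Q95z")

# Hypothetical summer flow statistics for one flow site, keyed by year
summer <- data.frame(
  flow_site_id = "AA",
  year = 2013:2015,
  Q10z = c(0.5, -0.2, 1.1),
  Q95z = c(-1.0, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical spring samples, already matched to their flow site
samples <- data.frame(
  flow_site_id = "AA",
  year = c(2014, 2015),
  stringsAsFactors = FALSE
)

# Lag by one year: summer statistics from year Y are attached to samples in Y + 1
ls1 <- summer[, c("flow_site_id", "year", lag_vars)]
ls1$year <- ls1$year + 1
names(ls1)[names(ls1) %in% lag_vars] <- paste0(lag_vars, "LS1")

# Left join so that samples without lagged statistics are kept (with NAs)
lagged <- merge(samples, ls1, by = c("flow_site_id", "year"), all.x = TRUE)
```

An LS2-style lag would follow the same pattern with the year shifted by two instead of one.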
To facilitate subsequent data visualisation and modelling, the function uses expand.grid to generate all combinations of biol_site_id, season (spring, autumn) and year (ranging from the earliest to the latest year in biol_data). Left joins are used to link the biology data and flow statistics to this expanded grid.
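A minimal base R sketch of this expand-then-join pattern is shown below, using toy data. The score column is a hypothetical biology metric, and merge with all.x = TRUE stands in for the left join; the function itself may use different tools internally.

```r
# Toy biology data with gaps (hypothetical values)
biol_data <- data.frame(
  biol_site_id = c("XX", "XX", "YY"),
  Year = c(2014, 2016, 2015),
  Season = c("Spring", "Autumn", "Spring"),
  score = c(5.1, 4.8, 6.0),
  stringsAsFactors = FALSE
)

# All combinations of site, season and year (earliest to latest year)
grid <- expand.grid(
  biol_site_id = unique(biol_data$biol_site_id),
  Season = c("Spring", "Autumn"),
  Year = seq(min(biol_data$Year), max(biol_data$Year)),
  stringsAsFactors = FALSE
)

# Left join: every grid row is kept; missing samples appear as NA
expanded <- merge(grid, biol_data,
                  by = c("biol_site_id", "Season", "Year"),
                  all.x = TRUE)
```

With two sites, two seasons and three years, the grid has 12 rows, of which only three carry a biology score in this toy example.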
It is recommended that any replicate or duplicate biology samples collected from a site in the same season and year are averaged out or eliminated before executing this function.
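One way to average replicate samples beforehand is sketched below in base R. The score column is a hypothetical biology metric; whether averaging or removing duplicates is appropriate depends on the data.

```r
# Toy data: site "XX" has two replicate spring samples in 2015
biol_data <- data.frame(
  biol_site_id = c("XX", "XX", "YY"),
  Year = c(2015, 2015, 2015),
  Season = c("Spring", "Spring", "Spring"),
  score = c(4.0, 6.0, 5.5),
  stringsAsFactors = FALSE
)

# Average the metric within each site, year and season
biol_mean <- aggregate(score ~ biol_site_id + Year + Season,
                       data = biol_data, FUN = mean)
```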
Examples
## Join processed biology data and processed flow statistics - no mapping specified because paired flow and biology sites have identical ids. Do not lag flow statistics.
# join_he(biol_data = biol_all,
# flow_stats = flowstats_1,
# mapping = NULL,
# LS1 = FALSE,
# LS2 = FALSE)
## Join processed biology data and processed flow statistics using mapping specified. Link biology data to summer flows in previous year - flow variables to lag specified.
# join_he(biol_data = biol_all,
# flow_stats = flowstats_1,
# mapping = SiteList,
# LS1 = TRUE,
# LS2 = FALSE,
# lag_vars = c("Q10z", "Q95z", "Q70z"))
## Join processed biology and processed flow statistics
### Includes an example of a user-created mapping data frame.
# flow_data <- import_hde(sites = c("AA", "BB"))
# flow_stats <- calc_flowstats(flow_data)
# biol_data <- import_biology(sites = c("XX", "YY"))
# mapping <- data.frame(biol_site_id = c("XX", "YY"), flow_site_id = c("BB", "AA"))
# join_he(biol_data = biol_data,
# flow_stats = flow_stats,
# mapping = mapping)