Importing macroinvertebrate sampling data from the EA Ecology and Fish Data Explorer
The import_inv function imports macroinvertebrate sampling data from the Environment Agency's Ecology and Fish Data Explorer (EDE). The data can either be downloaded automatically in .parquet or .csv format, or read in from a previously saved .csv or .rds file. The data can be optionally filtered by site ID and sample date, and the filtered data saved as a .rds file.
Usage
import_inv(source = "parquet", sites = NULL, start_date = NULL, end_date = NULL, save = FALSE, save_dwnld = FALSE, save_dir = getwd(), biol_dir = NULL)
Arguments
- source
Specify source of macroinvertebrate data: "parquet" or "csv" to automatically download data from EDE, or provide path to local .csv, .rds or .parquet file. (Alternatively set source = NULL and instead use the deprecated biol_dir argument to provide path to local file.) Default = "parquet".
- sites
Vector of site ids to filter by.
- start_date
Required start date (in yyyy-mm-dd format); older records are filtered out. Default = NULL to keep all available data.
- end_date
Required end date (in yyyy-mm-dd format); more recent records are filtered out. Default = NULL to keep all available data.
- save
Specifies whether (TRUE) or not (FALSE) the filtered data should be saved as an rds file (for future use, or audit trail). Default = FALSE.
- save_dwnld
Specifies whether (TRUE) or not (FALSE) the unfiltered parquet or csv file download should be saved, in .rds format. Default = FALSE.
- save_dir
Path to folder where downloaded and/or filtered data are to be saved. Default = Current working directory.
- biol_dir
Deprecated. Path to local .csv, .rds or .parquet file containing macroinvertebrate data. Default = NULL (download data from EDE).
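The effect of sites, start_date and end_date can be pictured with a small base R sketch on a toy data frame. It only illustrates the documented behaviour and is not the function's internal code. The toy values and the sample_date column name are assumptions (only biol_site_id is named in this documentation), as is the choice to treat both dates as inclusive.

```r
# Toy stand-in for imported data; sample_date is a hypothetical column name.
inv <- data.frame(
  biol_site_id = c("34310", "34310", "34343", "99999"),
  sample_date  = as.Date(c("1990-05-01", "2001-06-15", "2010-09-30", "2005-03-12")),
  stringsAsFactors = FALSE
)

sites      <- c("34310", "34343")
start_date <- as.Date("1995-01-01")
end_date   <- as.Date("2015-12-31")

# Keep only the requested sites, dropping records older than start_date
# or more recent than end_date (both bounds assumed inclusive).
keep <- inv$biol_site_id %in% sites &
  inv$sample_date >= start_date &
  inv$sample_date <= end_date
inv_filtered <- inv[keep, ]
```

Leaving any of the three filters at NULL corresponds to skipping the matching condition, so all available records pass through.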
Details
If automatically downloading data from EDE, the parquet file format is faster to download than csv, and has data types pre-formatted.
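To see why pre-formatted data types are convenient, consider what happens with csv input, which carries no type information. The base R sketch below reads a tiny inline csv; the column names and values are made up for illustration and are not taken from the EDE file.

```r
# Inline toy csv; SITE_ID and SAMPLE_DATE are illustrative column names.
csv_text <- "SITE_ID,SAMPLE_DATE\n34310,2001-06-15\n34343,2010-09-30"

# Default type guessing: ids are read as integers, dates stay as text.
guessed <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(guessed$SITE_ID)
class(guessed$SAMPLE_DATE)

# Declaring column classes up front gives character ids and Date dates.
typed <- read.csv(
  text = csv_text,
  colClasses = c(SITE_ID = "character", SAMPLE_DATE = "Date")
)
class(typed$SITE_ID)
class(typed$SAMPLE_DATE)
```

A parquet file stores its column types alongside the data, so this kind of manual coercion is not needed after reading it.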
If saving a copy of the downloaded data, the name of the rds file is hard-wired to INV_OPEN_DATA_METRICS_ALL.RDS. If saving after filtering on site and/or date, the name of the rds file is hard-wired to INV_OPEN_DATA_METRICS_F.RDS.
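A saved copy can be reloaded in a later session instead of downloading the data again. The sketch below uses base R's saveRDS() and readRDS() on a toy data frame, with a temporary folder standing in for save_dir. It illustrates the file naming described above and is not the function's own code.

```r
save_dir <- tempdir()  # stand-in for the save_dir argument
toy <- data.frame(biol_site_id = c("34310", "34343"), stringsAsFactors = FALSE)

# File name as given in the description above.
rds_path <- file.path(save_dir, "INV_OPEN_DATA_METRICS_F.RDS")
saveRDS(toy, rds_path)

# Reload the saved object, e.g. in a later session.
reloaded <- readRDS(rds_path)
all.equal(toy, reloaded)
```

Because the file names are fixed, saving twice to the same save_dir overwrites the earlier file; use a separate folder per run if an audit trail is needed.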
Downloaded raw data files (in .parquet and .csv format) will be automatically removed from the working directory following completed execution of the function.
The function automatically modifies the output from EDE, renaming "SITE_ID" to "biol_site_id" (hetoolkit's standardised column header for biology site ids).
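The renaming step amounts to something like the following base R sketch. The toy data frame is an assumption, and the function itself may perform the rename differently.

```r
# Toy stand-in for raw EDE output; only the SITE_ID name comes from this page.
raw <- data.frame(SITE_ID = c("34310", "34343"), stringsAsFactors = FALSE)

# Rename SITE_ID to hetoolkit's standardised header for biology site ids.
names(raw)[names(raw) == "SITE_ID"] <- "biol_site_id"
names(raw)
```

Downstream code should therefore refer to biol_site_id rather than SITE_ID.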
Examples
# Bulk download of EDE data for all sites in parquet format and save as .rds file for future use:
# import_inv(save_dwnld = TRUE, save_dir = getwd())
# Bulk download of EDE data for all sites in csv format:
# import_inv(source = "csv")
# Read in local .rds file and filter on selected sites and dates (up to the present day):
# import_inv(source = "data/INV_OPEN_DATA_METRICS_ALL.rds",
# sites = c("34310", "34343"),
# start_date = "1995-01-01",
# end_date = Sys.Date())
# Read in local .csv file, filter on selected sites, and save the result as a .rds file:
# import_inv(source = "data/INV_OPEN_DATA_METRICS.csv",
# sites = c("34310", "34343"),
# save = TRUE,
# save_dir = getwd())