
The import_inv function imports macroinvertebrate sampling data from the Environment Agency's Ecology and Fish Data Explorer (EDE). The data can either be downloaded automatically in .parquet or .csv format, or read in from a previously saved .csv, .rds or .parquet file. The data can optionally be filtered by site ID and sample date, and the filtered data saved as a .rds file.
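The source argument switches between two modes: downloading from EDE and reading a local file. The sketch below shows one way such a value could be classified. resolve_source() is a hypothetical helper, not part of hetoolkit. It assumes that "parquet" and "csv" are treated as download keywords and that anything else is a file path identified by its extension. Whether import_inv works this way internally is not documented here.

```r
# Hypothetical helper (not part of hetoolkit): classify a `source` value
resolve_source <- function(source) {
  if (is.null(source)) {
    return("biol_dir")  # fall back to the deprecated biol_dir argument
  }
  if (source %in% c("parquet", "csv")) {
    return(paste0("download_", source))  # fetch from EDE in that format
  }
  ext <- tolower(tools::file_ext(source))
  if (ext %in% c("csv", "rds", "parquet")) {
    return(paste0("local_", ext))  # read a previously saved local file
  }
  stop("Unrecognised source: ", source)
}

resolve_source("parquet")                             # "download_parquet"
resolve_source("data/INV_OPEN_DATA_METRICS_ALL.rds")  # "local_rds"
```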

Usage

import_inv(source = "parquet", sites = NULL, start_date = NULL, end_date = NULL, save = FALSE, save_dwnld = FALSE, save_dir = getwd(), biol_dir = NULL)

Arguments

source

Source of the macroinvertebrate data: "parquet" or "csv" to download the data automatically from EDE in that format, or the path to a local .csv, .rds or .parquet file. (Alternatively, set source = NULL and use the deprecated biol_dir argument to provide the path to a local file.) Default = "parquet".

sites

Vector of site IDs to filter by. Default = NULL to keep all sites.

start_date

Earliest sample date to keep (in yyyy-mm-dd format); older records are filtered out. Default = NULL to keep all available data.

end_date

Latest sample date to keep (in yyyy-mm-dd format); more recent records are filtered out. Default = NULL to keep all available data.

save

Specifies whether (TRUE) or not (FALSE) the filtered data should be saved as an .rds file (for future use or as an audit trail). Default = FALSE.

save_dwnld

Specifies whether (TRUE) or not (FALSE) the unfiltered data downloaded from EDE (in parquet or csv format) should be saved in .rds format. Default = FALSE.

save_dir

Path to the folder where the downloaded and/or filtered data are saved. Default = current working directory.

biol_dir

Deprecated. Path to a local .csv, .rds or .parquet file containing macroinvertebrate data. Default = NULL (download data from EDE).
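The sites, start_date and end_date arguments combine as successive filters. The sketch below illustrates that behaviour on a toy data frame. filter_inv() is a hypothetical helper, and the SAMPLE_DATE column name and the inclusive date boundaries are assumptions, not confirmed details of import_inv. A NULL argument leaves the corresponding filter switched off, matching the documented defaults.

```r
# Toy data standing in for EDE output; the SAMPLE_DATE column name is assumed
inv <- data.frame(
  biol_site_id = c("34310", "34343", "99999", "34310"),
  SAMPLE_DATE  = as.Date(c("1994-06-01", "2001-04-15", "2005-09-30", "2010-10-02")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: each non-NULL argument narrows the rows kept
filter_inv <- function(data, sites = NULL, start_date = NULL, end_date = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(sites)) keep <- keep & data$biol_site_id %in% sites
  if (!is.null(start_date)) keep <- keep & data$SAMPLE_DATE >= as.Date(start_date)
  if (!is.null(end_date)) keep <- keep & data$SAMPLE_DATE <= as.Date(end_date)
  data[keep, , drop = FALSE]
}

# Keeps the 2001 and 2010 records; the 1994 record predates start_date
res <- filter_inv(inv, sites = c("34310", "34343"),
                  start_date = "1995-01-01", end_date = Sys.Date())
```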

Value

Tibble containing imported macroinvertebrate data.

Details

When downloading data automatically from EDE, the parquet format is faster to download than csv and has its data types pre-formatted.
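Data types matter here because a CSV file stores only text, so column types are guessed when it is read. The base R example below uses a made-up two-row snippet whose column names are illustrative only. It shows site IDs arriving as integers and dates as character strings, both of which have to be converted by hand.

```r
# Made-up CSV snippet; column names are illustrative only
csv_text <- "SITE_ID,SAMPLE_DATE\n34310,1995-05-12\n34343,2003-10-01"
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
sapply(raw, class)  # SITE_ID: "integer", SAMPLE_DATE: "character"

# Restore the intended types manually
raw$SITE_ID <- as.character(raw$SITE_ID)
raw$SAMPLE_DATE <- as.Date(raw$SAMPLE_DATE, format = "%Y-%m-%d")
sapply(raw, class)  # SITE_ID: "character", SAMPLE_DATE: "Date"
```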

When saving a copy of the downloaded data, the .rds file name is hard-wired to INV_OPEN_DATA_METRICS_ALL.RDS. When saving data filtered on site and/or date, the file name is hard-wired to INV_OPEN_DATA_METRICS_F.RDS.
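Because the file names are fixed, a saved copy can be located and reloaded later from save_dir alone. The sketch below shows how the naming rule and an .rds round-trip might look. inv_save_path() is a hypothetical helper, and tempdir() stands in for a real save_dir. Note that saving again to the same folder overwrites the earlier file, since the name never changes.

```r
# Hypothetical helper: choose the hard-wired file name described above
inv_save_path <- function(save_dir, filtered) {
  fname <- if (filtered) "INV_OPEN_DATA_METRICS_F.RDS" else "INV_OPEN_DATA_METRICS_ALL.RDS"
  file.path(save_dir, fname)
}

save_dir <- tempdir()  # stand-in for the save_dir argument
inv <- data.frame(biol_site_id = c("34310", "34343"), stringsAsFactors = FALSE)

path <- inv_save_path(save_dir, filtered = FALSE)
saveRDS(inv, path)
identical(readRDS(path), inv)  # TRUE: the .rds round-trip preserves the object
```

The reloaded file can then be passed back to import_inv through its source argument, as in the examples below.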

Raw downloaded files (in .parquet or .csv format) are automatically removed from the working directory once the function has finished running.
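A common R idiom for this kind of clean-up is to register unlink() with on.exit(), so that the raw file is deleted when the function returns, even after an error. The sketch below shows the idiom only. download_and_read() is hypothetical, writes a placeholder file rather than contacting EDE, and uses a session temp file rather than the working directory. It is not taken from the import_inv source.

```r
# Hypothetical downloader: writes a placeholder file instead of contacting EDE
download_and_read <- function() {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # runs on exit, even if reading fails
  writeLines(c("SITE_ID", "34310"), tmp)  # stands in for the real download
  data <- read.csv(tmp, colClasses = "character")
  attr(data, "tmp_path") <- tmp  # kept only so the clean-up can be checked
  data
}

d <- download_and_read()
file.exists(attr(d, "tmp_path"))  # FALSE: the raw file has been removed
```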

The function automatically modifies the output from EDE, renaming "SITE_ID" to "biol_site_id" (hetoolkit's standardised column header for biology site ids).
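The renaming amounts to a single column-name substitution. The base R example below applies it to a toy extract. Only the SITE_ID and biol_site_id names come from the text above. The TAXON column is made up to show that other columns are left unchanged.

```r
# Toy EDE-style extract; TAXON is a made-up column for illustration
ede <- data.frame(SITE_ID = c("34310", "34343"), TAXON = c("a", "b"),
                  stringsAsFactors = FALSE)

# Rename SITE_ID to hetoolkit's standard biology site id header
names(ede)[names(ede) == "SITE_ID"] <- "biol_site_id"
names(ede)  # "biol_site_id" "TAXON"
```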

Examples


# Bulk download of EDE data for all sites in parquet format and save as .rds file for future use:
# import_inv(save_dwnld = TRUE, save_dir = getwd())

# Bulk download of EDE data for all sites in csv format:
# import_inv(source = "csv")

# Read in local .rds file and filter on selected sites and dates (up to the present day):
# import_inv(source = "data/INV_OPEN_DATA_METRICS_ALL.rds",
#            sites = c("34310", "34343"),
#            start_date = "1995-01-01",
#            end_date = Sys.Date())

# Read in local .csv file, filter on selected sites, and save the result as a .rds file:
# import_inv(source = "data/INV_OPEN_DATA_METRICS.csv",
#            sites = c("34310", "34343"),
#            save = TRUE,
#            save_dir = getwd())