Importing flow data from local files — import

This function imports flow data from one or more local files, one per site, named in the format 'siteID.fileextension' (e.g. 0130TH.csv). Supported file formats are csv, txt, all, xls and xlsx.

Usage

import_flowfiles(sites = NULL, dir, skip_num, col_order, date_format = "dmy", start_date = "1985-01-01", end_date = Sys.Date())

Arguments

sites: Vector of site IDs, which must match the file names without their extensions. Default = NULL, which imports all files of supported formats in dir.
dir: Path to folder containing flow files.
skip_num: Number of rows (including any column header row) to skip before reading the data. For example, if a file has three rows of metadata followed by one row of column headers, set skip_num = 4.
col_order: Numeric vector of length 3 giving the positions of the date, flow and quality columns, in that order. If there is no quality data, set the third element to NA.
date_format: The order of the year (y), month (m) and day (d) elements in the date column, optionally followed by an underscore and the hour (h), minute (m) and second (s) elements (e.g. "dmy_hms"). Default = "dmy". See Details for more information.
start_date: Start date for flow data extraction (YYYY-MM-DD format). Default = 1985-01-01.
end_date: End date for flow data extraction (YYYY-MM-DD format). Default = today's date.
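The sketch below illustrates how skip_num and col_order relate to a file's layout, using base R's read.csv() on made-up file contents. It is an assumption-based illustration, not the code import_flowfiles runs internally; the sample rows and column names are hypothetical.

```r
# Hypothetical file contents: three metadata rows, one header row, then data
lines <- c(
  "Station: 0130TH",
  "Parameter: flow",
  "Units: m3/s",
  "Timestamp,Value,Quality",
  "01/01/2010 09:00:00,12.5,Good",
  "02/01/2010 09:00:00,13.1,Good"
)

skip_num  <- 4            # 3 metadata rows + 1 column header row
col_order <- c(1, 2, 3)   # positions of the date, flow and quality columns

# Read everything after the skipped rows as character, with no header
raw <- read.csv(text = lines, skip = skip_num, header = FALSE,
                colClasses = "character")

# Pick columns by position; dates stay as character until parsed
flow_data <- data.frame(
  date = raw[[col_order[1]]],
  flow = as.numeric(raw[[col_order[2]]])
)

# Only add a quality column when its position is not NA
if (!is.na(col_order[3])) {
  flow_data$quality <- raw[[col_order[3]]]
}
flow_data
```

With col_order = c(1, 2, NA), the same code would return only the date and flow columns.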

Value

A tibble containing flow data for the specified sites/files, with the following columns: flow_site_id, date, flow and (if available) quality.

Details

All files must be stored in the same directory and must have the same structure, i.e. the same number of header rows and the date, flow and quality columns in the same positions. Except for xls and xlsx files (see below), they must also use the same date format. A mix of different file formats is allowed. Only the first worksheet is imported from xlsx files.

If 'sites' is not NULL, the function searches 'dir' for every file named after each site ID with a supported extension (e.g. 0130TH.csv, 0130TH.txt, 0130TH.all, 0130TH.xls and 0130TH.xlsx). If 'sites' is NULL, all csv, txt, all, xls and xlsx files in 'dir' are imported. Subfolders and files of any other type are ignored.
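A minimal base R sketch of the file search, assuming an exact, case-sensitive match on 'siteID.extension'. find_flow_files() is a hypothetical helper, not part of the package, and the real function may differ in details such as how extension case is handled.

```r
supported_ext <- c("csv", "txt", "all", "xls", "xlsx")

# Hypothetical helper: return the names of the flow files to import
find_flow_files <- function(dir, sites = NULL) {
  if (is.null(sites)) {
    # No sites given: every file whose extension is supported
    pattern <- paste0("\\.(", paste(supported_ext, collapse = "|"), ")$")
    candidates <- list.files(dir, pattern = pattern)
  } else {
    # One candidate name per site and extension, e.g. "0130TH.csv"
    candidates <- as.vector(outer(sites, supported_ext, paste, sep = "."))
  }
  # Keep regular files only, which drops missing names and subfolders
  sort(candidates[file_test("-f", file.path(dir, candidates))])
}

# find_flow_files("data/wiski", c("0130TH", "033006"))
```

Building candidate names from 'sites' (rather than pattern-matching them) avoids problems with site IDs that contain regular-expression characters.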

If a site has files in two different formats (e.g. 0130TH.csv and 0130TH.all), data are imported from both files. Be aware that this can result in duplicate records in the output.
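If duplicates do occur, one way to remove them after import is base R's duplicated() on the site and date columns. The data frame below is a hypothetical example that borrows the output column names; which copy to keep is a choice for the user.

```r
# Hypothetical output after importing both 0130TH.csv and 0130TH.all
flow <- data.frame(
  flow_site_id = c("0130TH", "0130TH", "0130TH", "0130TH"),
  date = as.Date(c("2010-01-01", "2010-01-02", "2010-01-01", "2010-01-02")),
  flow = c(12.5, 13.1, 12.5, 13.1)
)

# TRUE for every row whose site and date already appeared earlier
dup <- duplicated(flow[, c("flow_site_id", "date")])
flow_unique <- flow[!dup, ]
flow_unique
```

Checking on site and date only keeps the first occurrence even if the two files disagree on the flow value; adding "flow" to the column list would drop only exact copies.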

For csv, txt and all files, dates are first imported as character strings and then converted to dates. The date_format argument specifies the order of the year (y), month (m), day (d), hour (h), minute (m) and second (s) elements in the flow file. The date elements come first and any time elements follow an underscore, so an "m" after the underscore denotes minutes. import_flowfiles then uses the lubridate function of the same name to convert the data (e.g. if date_format = "dmy_hms", dates are parsed with lubridate::dmy_hms()). These lubridate functions recognize arbitrary non-digit separators as well as no separator, so as long as the order of the elements is correct, the dates will parse correctly. For example, dates formatted as 01/10/2000 or 1-Oct-20 will both be parsed correctly using date_format = "dmy".

Options for date_format are: dmy, dmy_h, dmy_hm, dmy_hms, mdy, mdy_h, mdy_hm, mdy_hms, ymd, ymd_h, ymd_hm, ymd_hms, ydm, ydm_h, ydm_hm and ydm_hms. Specifying the wrong date_format will result in the following error: "All formats failed to parse. No formats found."
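The mapping from date_format to a lubridate parser can be sketched as below. parse_flow_dates() is a hypothetical helper, not the package's code; it assumes the lubridate package is installed and looks up the exported lubridate function whose name matches date_format.

```r
# Assumes lubridate is installed; getExportedValue() loads its namespace
parse_flow_dates <- function(x, date_format = "dmy") {
  # The 16 permitted names, e.g. "dmy", "dmy_h", ..., "ydm_hms"
  allowed <- as.vector(outer(c("dmy", "mdy", "ymd", "ydm"),
                             c("", "_h", "_hm", "_hms"), paste0))
  if (!date_format %in% allowed) {
    stop("Unsupported date_format: ", date_format)
  }
  # Look up the lubridate function with the same name, e.g. lubridate::dmy_hms
  parser <- getExportedValue("lubridate", date_format)
  parser(x)
}

# Different separators, same element order
parse_flow_dates(c("01/10/2000", "1-Oct-20"), "dmy")

# Date followed by hour, minute and second
parse_flow_dates("01/01/2010 09:15:00", "dmy_hms")
```

Parsing month names such as "Oct" depends on the session's locale; purely numeric dates do not.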

Excel (xlsx and xls) files are imported using the read_excel function from the readxl package, which automatically formats dates, so the date_format argument is ignored for Excel files.

After parsing, any time component is discarded, leaving only dates.
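A minimal sketch of discarding the time component, assuming the timestamps are POSIXct values in UTC (the default time zone of the lubridate *_hms parsers). Base R's as.Date() keeps only the calendar date.

```r
# Hypothetical sub-daily timestamps, as a *_hms parser would return them
timestamps <- as.POSIXct(c("2010-01-01 09:15:00", "2010-01-01 23:45:00",
                           "2010-01-02 00:30:00"), tz = "UTC")

# Keep only the calendar date, computed in UTC
dates <- as.Date(timestamps, tz = "UTC")
dates
```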

The function initially imports flow data for all dates in the flow file(s) and then removes records dated before start_date or after end_date. If the data do not span the entire date range, additional records are created for the dates not covered, with the flow and quality values set to NA.
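The filtering and padding step can be sketched in base R for a single site. pad_to_range() is hypothetical, and it assumes every missing day in the window, including gaps inside the record, receives an NA row.

```r
# Hypothetical helper: trim one site's records to [start_date, end_date]
# and add NA rows for any day in that window without a record
pad_to_range <- function(flow, start_date, end_date) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  # Drop records outside the requested window
  flow <- flow[flow$date >= start_date & flow$date <= end_date, , drop = FALSE]
  # Complete daily sequence for the window
  all_dates <- data.frame(date = seq(start_date, end_date, by = "day"))
  # Left join: dates without a record get NA flow and quality;
  # merge() sorts the result by date by default
  merge(all_dates, flow, by = "date", all.x = TRUE)
}

flow <- data.frame(
  date = as.Date(c("2009-12-31", "2010-01-02", "2010-01-03")),
  flow = c(11.0, 13.1, 12.8),
  quality = c("Good", "Good", "Suspect")
)
padded <- pad_to_range(flow, "2010-01-01", "2010-01-05")
padded
```

For several sites, the same step would be applied to each site's records separately.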

If a site ID is duplicated in the 'sites' argument, that site is searched for only once and a warning message is produced.
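A sketch of the duplicate check, assuming it uses base R's duplicated() and unique(). dedupe_sites() and its warning text are hypothetical and do not reproduce the package's actual message.

```r
# Hypothetical helper: drop repeated site IDs, warning about them once
dedupe_sites <- function(sites) {
  if (anyDuplicated(sites) > 0) {
    dups <- unique(sites[duplicated(sites)])
    warning("Duplicated site IDs searched for only once: ",
            paste(dups, collapse = ", "))
  }
  unique(sites)
}

dedupe_sites(c("0130TH", "033006", "0130TH"))
```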

Examples

# Import data for selected sites and dates
# import_flowfiles(sites = c("0130TH", "033006"), dir = "data/wiski", col_order = c(1,2,3), skip_num = 21, date_format = "dmy_hms", start_date = "2010-01-01", end_date = "2010-01-05")

# Returns flow = NA if the site exists but no data are available for the specified date range
# import_flowfiles(sites = c("0130TH", "033006"), dir = "data/wiski", col_order = c(1,2,3), skip_num = 21, date_format = "dmy_hms", start_date = "1900-01-01", end_date = "1900-01-05")

# Error if no files found for the specified sites
# import_flowfiles(sites = c("hello"), dir = "data/wiski", col_order = c(1,2,3), skip_num = 21)