Summarising environmental characteristics of biological sampling sites
plot_sitepca.Rd
Performs a PCA to summarise the environmental characteristics of biological sampling sites.
Usage
plot_sitepca(data, vars, eigenvectors = FALSE, label_by, colour_by, plotly = FALSE, save = FALSE, save_dir = getwd(), ...)
Arguments
- data
A data frame of site-level environmental characteristics such as that produced by import_env.
- vars
A character vector naming at least three continuous environmental variables in 'data'.
- eigenvectors
Logical option to add eigenvectors to PCA plot. Default = FALSE.
- label_by
Optional variable to label points (e.g. by site ID). Default = NULL.
- colour_by
Optional variable to colour points (e.g. by catchment). Default = NULL.
- plotly
Logical value specifying whether or not to render the plot as an interactive plotly plot. Default = FALSE.
- save
Logical value specifying whether or not output plot should be saved as a png file (called PCA_plot.png). Default = FALSE.
- save_dir
Path to folder where plot should be saved. Default = Current working directory.
- ...
Additional ggplot plotting and saving arguments, for example theme, file type, width and size. See ?theme and ?ggsave for details.
Value
Depending on the 'plotly' argument, either a ggplot or plotly plot displaying the sites in 2D space to show site similarity and identify potential outliers.
Details
The environmental variables listed in 'vars' must be numeric. Sites with missing values for any of these variables are excluded from the analysis. All variables are automatically centred to zero and scaled to unit variance prior to analysis (see ?stats::prcomp for further details).
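The preprocessing described above can be sketched in base R. This is a minimal illustration and not the function's internal code: the data frame `env` and its values are made up, and using `complete.cases()` to drop incomplete sites is an assumption about how the exclusion might be done.

```r
# Hypothetical site-level data; names and values are illustrative only
env <- data.frame(
  site     = c("A", "B", "C", "D", "E"),
  ALTITUDE = c(10, 55, 120, NA, 80),
  SLOPE    = c(1.2, 3.4, 8.1, 2.0, 5.5),
  WIDTH    = c(12.0, 6.5, 2.1, 9.0, 4.2)
)
vars <- c("ALTITUDE", "SLOPE", "WIDTH")

# Keep only sites with no missing values in the chosen variables
# (assumed approach; site D is dropped here)
env_complete <- env[complete.cases(env[, vars]), ]

# Centre each variable to zero mean and scale to unit variance
pca <- prcomp(env_complete[, vars], center = TRUE, scale. = TRUE)

# Equivalent manual standardisation, for comparison
z <- scale(env_complete[, vars])
colMeans(z)       # numerically zero for every variable
apply(z, 2, sd)   # one for every variable
```

Standardising in this way stops variables measured on large scales (such as altitude in metres) from dominating the components simply because of their units.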
The plot_sitepca function performs a PCA using stats::prcomp() and plots the z-scores of the first two principal components. Setting label_by replaces each point with the value of an identifying variable, such as site ID, which can help in identifying outliers. Setting eigenvectors = TRUE adds the eigenvectors to the plot as arrows that indicate the direction and strength of correlation between each environmental variable and the principal components; longer arrows indicate stronger correlations. These eigenvectors allow the axes to be interpreted as environmental gradients.
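The steps described above can be approximated with stats::prcomp() and base graphics. This is a rough sketch under stated assumptions, not the function's own implementation: the data are synthetic, base plotting is swapped in for ggplot so the example has no package dependencies, and the arrow multiplier is an arbitrary display choice.

```r
set.seed(1)
# Synthetic stand-in for site-level environmental data (illustrative only)
env <- data.frame(
  site     = paste0("S", 1:20),
  ALTITUDE = runif(20, 5, 300),
  SLOPE    = runif(20, 0.1, 20),
  WIDTH    = runif(20, 1, 30),
  DEPTH    = runif(20, 5, 100)
)
vars <- c("ALTITUDE", "SLOPE", "WIDTH", "DEPTH")

pca <- prcomp(env[, vars], center = TRUE, scale. = TRUE)

scores   <- pca$x[, 1:2]         # site positions on PC1 and PC2
loadings <- pca$rotation[, 1:2]  # eigenvectors: one row per variable

# Plot sites as text labels (analogous to using label_by)
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = env$site)

# Overlay eigenvectors as arrows (analogous to eigenvectors = TRUE);
# the multiplier 2 is an arbitrary scaling for visibility
arrows(0, 0, loadings[, 1] * 2, loadings[, 2] * 2, length = 0.1, col = "red")
text(loadings * 2.2, labels = rownames(loadings), col = "red")
```

Sites that plot far from the main cluster are candidates for closer inspection as outliers, and the variables whose arrows point towards them suggest which characteristics set those sites apart.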
Examples
# env_data = read.csv("INV_OPEN_DATA_SITE.csv")
# Produce and save PCA plot
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), save = TRUE)
# PCA plot with eigenvectors
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), eigenvectors = TRUE)
# PCA plot with points labelled
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), label_by = "biol_site_id")