Performs PCA to summarise environmental characteristics of biology sampling sites.

Usage

plot_sitepca(data, vars, eigenvectors = FALSE, label_by, colour_by, plotly = FALSE, save = FALSE, save_dir = getwd(), ...)

Arguments

data

A data frame of site-level environmental characteristics such as that produced by import_env.

vars

A character vector naming at least three continuous environmental variables (columns) in 'data'.

eigenvectors

Logical option to add eigenvectors to PCA plot. Default = FALSE.

label_by

Optional variable to label points (e.g. by site ID). Default = NULL.

colour_by

Optional variable to colour points (e.g. by catchment). Default = NULL.

plotly

Logical value specifying whether or not to render the plot as an interactive plotly plot. Default = FALSE.

save

Logical value specifying whether or not output plot should be saved as a png file (called PCA_plot.png). Default = FALSE.

save_dir

Path to folder where plot should be saved. Default = Current working directory.

...

Additional ggplot plotting and saving arguments, for example theme, file type, width and size. See ?theme and ?ggsave for details.

Value

Depending on the 'plotly' argument, either a ggplot or plotly plot displaying the sites in 2D space to show site similarity and identify potential outliers.

Details

The environmental variables listed in 'vars' must be numeric. Sites with missing data for any of these variables are excluded from the analysis. All variables are automatically centered to zero and scaled to have unit variance prior to analysis (see ?stats::prcomp for further details).
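The preprocessing described above can be sketched in base R. This is a minimal illustration, not the function's actual source: the data frame, its values and the object names (env_data, env_complete, pca) are made up for the example, and the use of stats::complete.cases() to drop incomplete sites is an assumption about how the exclusion could be done.

```r
# Hypothetical site-level data; column names mirror the examples, values are invented.
env_data <- data.frame(
  biol_site_id     = c("S1", "S2", "S3", "S4", "S5", "S6"),
  ALTITUDE         = c(12, 85, 40, 150, 23, 60),
  SLOPE            = c(0.5, 3.2, 1.1, 6.0, NA, 2.4),  # S5 has a missing value
  DIST_FROM_SOURCE = c(45, 8, 22, 3, 30, 15)
)
vars <- c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE")

# Keep only sites with complete data for every variable in 'vars' (assumed approach).
complete <- stats::complete.cases(env_data[, vars])
env_complete <- env_data[complete, ]

# Center each variable to zero and scale to unit variance before the PCA.
pca <- stats::prcomp(env_complete[, vars], center = TRUE, scale. = TRUE)

nrow(env_complete)  # 5 sites remain; S5 is dropped
```

Because every variable is scaled to unit variance, the component variances (pca$sdev^2) sum to the number of variables, so no single variable dominates simply because of its units.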

The plot_sitepca function performs a PCA using stats::prcomp() and plots the site scores on the first two principal components. Setting label_by replaces each point with a label taken from the chosen variable, such as site ID, which can help in identifying outliers. Setting eigenvectors = TRUE adds the eigenvectors to the plot as arrows, indicating the direction and strength of correlation between the environmental variables and the principal components. Longer arrows indicate stronger correlations. These eigenvectors allow the axes to be interpreted as environmental gradients.
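As a rough sketch of the quantities involved, the following base R code extracts the scores and eigenvectors for the first two components from a stats::prcomp() fit. The data values and object names (scores, loadings, var_cor) are hypothetical, and the code is not taken from plot_sitepca itself; how the function scales its arrows is not stated here, so the correlation calculation is shown only to illustrate the interpretation.

```r
# Hypothetical, complete site-level data; values are invented for illustration.
env_data <- data.frame(
  ALTITUDE         = c(12, 85, 40, 150, 23, 60),
  SLOPE            = c(0.5, 3.2, 1.1, 6.0, 0.9, 2.4),
  DIST_FROM_SOURCE = c(45, 8, 22, 3, 30, 15),
  WIDTH            = c(14, 3, 7, 2, 10, 5)
)

pca <- stats::prcomp(env_data, center = TRUE, scale. = TRUE)

# Site coordinates on the first two principal components (the plotted points).
scores <- pca$x[, 1:2]

# Eigenvectors for the first two components (one row per variable).
loadings <- pca$rotation[, 1:2]

# For centered and scaled variables, the correlation between a variable and a
# component equals the eigenvector entry multiplied by that component's sdev.
var_cor <- sweep(loadings, 2, pca$sdev[1:2], `*`)

# Same values computed directly, as a cross-check.
direct_cor <- stats::cor(env_data, scores)
```

Variables with large absolute values in a column of var_cor are the ones that most strongly define that axis as an environmental gradient.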

Examples

# env_data <- read.csv("INV_OPEN_DATA_SITE.csv")

# Produce and save PCA plot
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), save = TRUE)

# PCA plot with eigenvectors
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), eigenvectors = TRUE)

# PCA plot with points labeled
# plot_sitepca(data = env_data, vars = c("ALTITUDE", "SLOPE", "DIST_FROM_SOURCE", "WIDTH", "DEPTH"), label_by = "biol_site_id")