This vignette represents an introduction to using the surveyjoin package. We’ll load the package, along with dplyr and ggplot2 for data manipulation and plotting.

library(surveyjoin)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.2

The surveyjoin package contains several processing functions that will download the survey data files locally, and create a connection to the SQLite database. The cache_data() function pulls data from a public Github repository, and needs to be only run every time the raw data is refreshed (a new year of survey data becomes available).

If you try to run these functions too many times in an hour, the Github API will timeout, and the load_sql_data() function will work with the files that are already locally cached. You can check the versions (dates) on individual files with

Several helper functions have been included to provide more information on the metadata, survey shapefiles, and links to the original data. Each of these returns a dataframe with a URL for each of our three regions (“afsc”, “pbs”, “nwfsc”).

Additionally, we can look at the table of species names (common, scientific, itis) that are included.

The core function of the package is get_data. This allows for querying by species (using common names - “common”, scientific names - “scientific”, or ITIS identifiers - “itis_id”), regions, surveys, or years – each of these may be a single value or vector of values. We could get data for sablefish from all Alaska surveys in the the last decade with

d <- get_data(common = "sablefish", years = 2013:2023, regions = "afsc")

As a second example, we could get data coastwide for Arrowtooth flounder and plot it with the following code. We’ll constrain it to years 2003-2018 for plotting purposes (though the earliest year of data is 1980).

d <- get_data(common = "arrowtooth flounder", years = 2003:2018)

g <- d |>
  ggplot(aes(
    lon_start,
    lat_start,
    colour = catch_weight / effort,
    size = catch_weight / effort
  )) +
  geom_point(pch = 21) +
  facet_wrap(~year) +
  scale_colour_viridis_c(trans = "log10") +
  theme_light() +
  coord_fixed() +
  xlab("Start longitude") +
  ylab("Start latitude")

ggsave(g,
  filename = "map-example.png",
  width = 6, height = 7, dpi = 150
)

Additional information on specific species can be found in our species dictionary

data("spp_dictionary")

And the full list of survey names can be found with

get_survey_names()
#>                 survey region
#> 1     Aleutian Islands   afsc
#> 2       Gulf of Alaska   afsc
#> 3   eastern Bering Sea   afsc
#> 4  northern Bering Sea   afsc
#> 5     Bering Sea Slope   afsc
#> 6          NWFSC.Combo  nwfsc
#> 7          NWFSC.Shelf  nwfsc
#> 8        NWFSC.Hypoxia  nwfsc
#> 9        NWFSC.Hypoxia  nwfsc
#> 10           Triennial  nwfsc
#> 11             SYN QCS    pbs
#> 12              SYN HS    pbs
#> 13            SYN WCVI    pbs
#> 14            SYN WCHG    pbs