Animal observations from on-site surveys can be used to build predictive models of animal diversity, or summarised to directly compare animal diversity between different areas or time periods. This article outlines how to format, check and process the animal survey data.

Figure: Broad overview of the data workflow for a chosen animal group

First, load the necessary packages to run the analysis:
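The code in this article relies on dplyr (for %>%, filter(), group_by(), summarise(), anti_join(), mutate() and across()) and tidyr (for pivot_wider()). The setup below is a minimal sketch that assumes both packages are installed. The package that provides filter_observations(), check_taxongrps(), tally_observations() and the example datasets is not named in this section, so it must be loaded separately.

```r
# dplyr supplies %>%, filter(), group_by(), summarise(), anti_join(),
# select(), mutate(), across() and case_when()
library(dplyr)
# tidyr supplies pivot_wider(), which is used to build the community matrix
library(tidyr)
# The package providing filter_observations(), check_taxongrps(),
# tally_observations() and the example datasets must also be loaded
# (it is not named in this section).
```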

Data format

Data from animal surveys are organised into two separate tables: (1) a record of animal observations during each survey; and (2) reference information about each survey. Keeping (2) as a separate table ensures that surveys with zero animal observations are still accounted for. More details on how data are collected can be found in vignette("animals-survey-protocols"). These example data can be loaded by running the following code:

Table: Example showing animal observations recorded during surveys. Each row represents a unique observation at a specified time, which includes the species name and count (abundance) of individuals. Refer to help(animal_observations) for more information.
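Observations and survey details are stored separately, so per-survey summaries should start from the survey reference table. Otherwise, surveys with no observations simply vanish. The sketch below uses hypothetical toy tables (all column names other than survey_id, species and abundance are illustrative assumptions) and a dplyr left join to keep a survey with zero observations.

```r
library(dplyr)

# Hypothetical toy versions of the two tables
surveys <- tibble(survey_id = c("S1", "S2", "S3"),
                  point_id  = c("P1", "P1", "P2"))
observations <- tibble(survey_id = c("S1", "S1", "S2"),
                       species   = c("Hirundo tahitica", "Passer montanus",
                                     "Hirundo tahitica"),
                       abundance = c(2, 1, 3))

# Starting from the survey table keeps S3, which has no observation rows;
# its joined abundance is NA, so na.rm = TRUE turns the total into 0
per_survey <- surveys %>%
  left_join(observations, by = "survey_id") %>%
  group_by(survey_id, point_id) %>%
  summarise(total_abundance = sum(abundance, na.rm = TRUE), .groups = "drop")
per_survey
```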

If necessary, the convenience function filter_observations() may be used to filter the animal survey data based on specified grouping variables (e.g. area, period, taxon). For example, observations from bird (Aves) surveys in Tampines (TP) conducted during 2020-2021 (survey period 2) can be filtered as follows:

observations_subset <- filter_observations(observations = animal_observations, 
                                           survey_ref = animal_surveys,
                                           specify_taxon = "Aves",
                                           specify_area = "TP",
                                           specify_period = "2")

Note that the column survey_id will be converted to a factor variable, which includes levels that are not present in the observation data (i.e., surveys during which no birds were observed).
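Keeping unobserved surveys as factor levels allows zero counts to survive later summaries. The small sketch below uses hypothetical toy data, not the actual output of filter_observations(), to show the effect with dplyr's count(). By default, empty factor levels are dropped, while .drop = FALSE keeps them.

```r
library(dplyr)

# Hypothetical toy data: three bird surveys were selected (S1 to S3),
# but birds were only recorded during S1 and S2
obs <- tibble(
  survey_id = factor(c("S1", "S1", "S2"), levels = c("S1", "S2", "S3")),
  species   = c("Hirundo tahitica", "Passer montanus", "Hirundo tahitica")
)

# Default behaviour: the empty level S3 is dropped from the counts
dropped <- count(obs, survey_id)

# .drop = FALSE keeps every factor level, so S3 appears with n = 0
per_survey <- count(obs, survey_id, .drop = FALSE)
per_survey
```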

Data checks

Checks can be performed to remove group-level observations, to prevent over-counting the number of species when summarising the data afterwards. For instance, if the species name of an observed animal is unknown, surveyors may have recorded the observation at a higher (group) level of classification, such as the family or genus. However, the same species may also be identified correctly (at the species level) during other surveys at the same point or area (e.g., by another surveyor). When the number of species is tallied, the group-level entry and the species-level entry are then counted as two different species, so that species is counted twice.

For example, two species of swallows are observed across all points in the example dataset: Hirundo tahitica and Hirundo rustica. Hirundo spp. is entered (in the species column) when a swallow cannot be identified to species with confidence:

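observations_swallows <- observations_subset %>% 
  filter(grepl("Hirundo", species)) %>% # keep all swallow entries, incl. Hirundo spp.
  group_by(area, point_id, species, family, genus) %>% 
  summarise(abundance = sum(abundance), .groups = "drop") # total individuals per point and species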

Such entries inflate the tallied number of species per point or area, since Hirundo spp. is a unique entry in the species column. In the example dataset, a total of three swallow species would be reported within the Tampines (TP) area, rather than the two known to be present in Singapore:

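observations_swallows %>% 
  group_by(area, species) %>% # one row per species in each area
  summarise(.groups = "drop_last") %>% # result remains grouped by area
  tally(name = "n_species") # tally species by area
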
One way to avoid such double-counting is to remove these group-level entries if all species in that particular classification group are observed, at the specified granularity of interest (point or area). The function check_taxongrps() identifies such group-level entries, together with the total number of species within each group (by referring to the columns genus and family). For the example dataset of swallows within the Tampines (TP) area, we can check the number of unique species within the genus Hirundo and the family Hirundinidae:

to_remove <- check_taxongrps(observations_swallows, level = "area")

These entries can then be removed from observations_swallows, if they were recorded in the species column:

filtered_observations <- observations_swallows %>%
  anti_join(to_remove, by = c("species" = "name"))

To verify that these entries have been removed, we can re-tally the filtered observations. The number of species will be reduced by one:

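filtered_observations %>% 
  group_by(area, species) %>% # one row per species in each area
  summarise(.groups = "drop_last") %>% # result remains grouped by area
  tally(name = "n_species") # tally species by area
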
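The idea behind this check can be illustrated with a simplified sketch on hypothetical toy data. The sketch makes two assumptions. First, group-level entries are written as "<Genus> spp.". Second, such an entry is dropped whenever at least one species of the same genus was identified at species level in the same area. This is a simpler rule than the one described above, and it is not the implementation of check_taxongrps(), which also considers family-level entries.

```r
library(dplyr)

# Hypothetical toy records from one area; "Hirundo spp." is a genus-level entry
obs <- tibble(
  area    = "TP",
  species = c("Hirundo tahitica", "Hirundo rustica", "Hirundo spp.", "Passer montanus"),
  genus   = c("Hirundo", "Hirundo", "Hirundo", "Passer")
)

# Assumed naming convention for genus-level entries
is_group_entry <- grepl(" spp\\.$", obs$species)

# Genera with at least one species-level record in each area
identified <- obs[!is_group_entry, ] %>% distinct(area, genus)

# Genus-level entries whose genus is already represented at species level
to_drop <- obs[is_group_entry, ] %>% semi_join(identified, by = c("area", "genus"))

naive     <- n_distinct(obs$species)      # 4, inflated by "Hirundo spp."
filtered  <- obs %>% anti_join(to_drop, by = c("area", "species"))
corrected <- n_distinct(filtered$species) # 3
```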
Summarise data

To build predictive models for local (alpha) diversity for a chosen animal group, the animal observations will need to be aggregated at the level of each sampling point. For example, we can tally the number of bird (animal group ‘Aves’) species per point using the function tally_observations(). Note that this function avoids double-counting group-level records by acting as a wrapper to the function check_taxongrps() (see previous section).

birds <- 
  tally_observations(observations = animal_observations, 
                     survey_ref = animal_surveys,
                     level = "point",
                     specify_taxon = "Aves")
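A point-level species tally involves counting each species once per point and keeping points where nothing was recorded. The sketch below shows this with dplyr on hypothetical toy data that is assumed to be already cleaned of group-level entries. It uses a hypothetical table of surveyed points so that points without records get a count of zero. It is a rough illustration only, not the implementation of tally_observations(), whose output columns may differ.

```r
library(dplyr)

# Hypothetical toy bird records, already cleaned of group-level entries
obs <- tibble(
  point_id = c("P1", "P1", "P1", "P2"),
  species  = c("Hirundo tahitica", "Hirundo tahitica", "Passer montanus", "Hirundo rustica")
)
# Hypothetical list of surveyed points; P3 was surveyed but no birds were recorded
points <- tibble(point_id = c("P1", "P2", "P3"))

richness <- obs %>%
  distinct(point_id, species) %>%          # count each species once per point
  count(point_id, name = "n_species") %>%  # species richness per point
  right_join(points, by = "point_id") %>%  # add points with no records
  mutate(n_species = coalesce(n_species, 0L)) %>%
  arrange(point_id)
richness
```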

To build predictive models of community (beta) diversity for a chosen animal group, a presence/absence community matrix will need to be created. For each species (columns) at each sampling point (rows), presence is denoted as 1 while absence is denoted as 0. For example, we can generate the community matrix for the animal group ‘Aves’ (birds) after manually removing the genus- and family-level records identified by the function check_taxongrps():

# manually exclude genus/family lvl records by point
to_remove <- 
  check_taxongrps(animal_observations, level = "point") 

filtered_observations <- animal_observations %>%
  anti_join(to_remove, by = c("species" = "name", "point_id")) # match by name and point

# generate community matrix
bird_com <- filtered_observations %>% 
  filter(taxon == "Aves") %>% 
  group_by(point_id, species) %>% # tally no. of individuals per point and species
  summarise(n = sum(abundance), .groups = "drop") %>%
  pivot_wider(names_from = species, # pivot to wide format (one column per species)
              values_from = n,
              values_fill = 0) %>% # species not recorded at a point are set to 0
  dplyr::select(-point_id) %>%
  mutate(across(.cols = everything(), # change to presence/absence
                ~ case_when(. > 0 ~ 1,
                            . == 0 ~ 0))) %>%
  select(which(colMeans(.) > 0)) # include species observed at least once
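The pipeline above drops point_id, so rows of bird_com are identified only by position. As an alternative, a base R approach using table() keeps point IDs as row names. This is a swapped-in technique, not the approach used in this article. The sketch uses hypothetical toy data and assumes every record has an abundance above 0. As a usage example, it computes pairwise Jaccard dissimilarities between points with dist(method = "binary"), one common input for beta diversity analyses.

```r
# Hypothetical toy bird records, already cleaned of group-level entries;
# every record is assumed to have abundance > 0
obs <- data.frame(
  point_id = c("P1", "P1", "P2", "P2", "P3"),
  species  = c("Hirundo tahitica", "Passer montanus",
               "Hirundo tahitica", "Hirundo tahitica", "Hirundo rustica"),
  abundance = c(3, 2, 1, 4, 1)
)

# Number of records per point (rows) and species (columns)
tab <- table(point_id = obs$point_id, species = obs$species)

# Convert to a plain 0/1 matrix; point IDs are kept as row names
com <- unclass(tab)
com[] <- as.integer(com > 0)
com

# "binary" distance = Jaccard dissimilarity for presence/absence data
jac <- dist(com, method = "binary")
as.matrix(jac)
```

Here P1 and P2 share one of the two species recorded at either point, so their Jaccard dissimilarity is 0.5. P3 shares no species with the other points, so its dissimilarity to each of them is 1.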

