Lab 7

Authors

Charles Lehnen

Melissa Guzman

Cameron Egan

Biodiversity

There are many ways to measure biodiversity and methods to compare the biodiversity between populations or communities. The simplest two are often considered to be abundance and richness.

Abundance

Richness is the number of species in a community while abundance is the number of individuals from each species in that community.

Let’s load our ratdat dataset

library(ratdat)

head(ratdat::complete)
# A tibble: 6 × 13
  record_id month   day  year plot_id species_id sex   hindfoot_length weight
      <int> <int> <int> <int>   <int> <chr>      <chr>           <int>  <int>
1         1     7    16  1977       2 NL         M                  32     NA
2         2     7    16  1977       3 NL         M                  33     NA
3         3     7    16  1977       2 DM         F                  37     NA
4         4     7    16  1977       7 DM         M                  36     NA
5         5     7    16  1977       3 DM         M                  35     NA
6         6     7    16  1977       1 PF         M                  14     NA
# ℹ 4 more variables: genus <chr>, species <chr>, taxa <chr>, plot_type <chr>

How can we find abundance within this community?

  • We need to count the number of individuals detected from each species.

Now let’s wrangle our ratdat::complete dataset using dplyr to see the abundances of each species in our community.

df <- ratdat::complete %>%
  
  # Filter out rows with blank cells for species_id
  filter(species_id != "") %>%
  
  # Subset dataframe by species_id
  group_by(species_id) %>%
  
  # Count number of rows in each subset, create a new column `abundance`
  summarise(abundance = n()) %>%
  
  # Put subsets back together
  ungroup() %>%
  
  # Sort by abundance
  arrange(abundance)

df
# A tibble: 48 × 2
   species_id abundance
   <chr>          <int>
 1 CS                 1
 2 CT                 1
 3 CU                 1
 4 CV                 1
 5 SC                 1
 6 ST                 1
 7 AS                 2
 8 RX                 2
 9 ZL                 2
10 UL                 4
# ℹ 38 more rows

Now let’s look at species by plot. Group abundances by both species and plot id. Now our dataframe will be in wide format which is preferred by the packages we will use throughout this lab.

df <- ratdat::complete %>%
  
  # Filter out rows with blank cells for species_id
  filter(species_id != "") %>%
  
  # Subset dataframe by species_id
  group_by(species_id, plot_id) %>%
  
  # Count number of rows in each subset, create a new column `abundance`
  summarise(abundance = n()) %>%
  
  # Put subsets back together
  ungroup() %>%
  
  # Convert to wide table, we will cover this again later
  pivot_wider(names_from = species_id, values_from = abundance, values_fill = 0)
`summarise()` has grouped output by 'species_id'. You can override using the
`.groups` argument.
head(df)
# A tibble: 6 × 49
  plot_id    AB    AH    AS    BA    CB    CM    CQ    CS    CT    CU    CV
    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1       1     7     7     0     1     0     0     0     0     0     0     0
2       2    14     7     0     1     0     1     1     0     0     0     0
3       3    10     2     0    19     1     3     1     0     0     0     0
4       4     3     2     0     0     1     0     0     0     0     0     0
5       5     2     3     0     4     1     0     2     0     0     0     0
6       6    17     6     0     0     1     0     0     0     1     0     0
# ℹ 37 more variables: DM <int>, DO <int>, DS <int>, DX <int>, NL <int>,
#   OL <int>, OT <int>, OX <int>, PB <int>, PC <int>, PE <int>, PF <int>,
#   PG <int>, PH <int>, PI <int>, PL <int>, PM <int>, PP <int>, PU <int>,
#   PX <int>, RF <int>, RM <int>, RO <int>, RX <int>, SA <int>, SC <int>,
#   SF <int>, SH <int>, SO <int>, SS <int>, ST <int>, SU <int>, UL <int>,
#   UP <int>, UR <int>, US <int>, ZL <int>

Let’s plot for species AB and AH for the first 4 plots

df_AB_AH <- df %>%
  dplyr::select(plot_id, AB, AH) %>%
  filter(plot_id <= 4)
  

ggplot(data = df_AB_AH, aes(x = AB, y = AH)) +
  geom_text(aes(label = plot_id)) +
  labs(x = "AB", y = "AH", title = "Plot IDs positioned by abundances")

Distances

Euclidean Distance

Can you tell by looking at this plot how different the relationship is between these species by site?

We can calculate the distance between plots to quantify how different they are. In general, distances in ecology are primarily used to quantify dissimilarity. The first method is calculating the Euclidean distance between points. We just look at this x, y coordinate plot as a map. Therefore, this distance can be calculated with

\[ \sqrt{( x_{1} \ -\ x_{2})^{2} \ +\ ( y_{1} \ -\ y_{2})^{2}} \]

Calculate the Euclidean distance between plots 1 and 3

x <- df_AB_AH$AB[1] - df_AB_AH$AH[1]
y <- df_AB_AH$AB[3] - df_AB_AH$AH[3]

sqrt(x^2 + y^2)
[1] 8

Bray-Curtis/Sørenson Distance

However, this Euclidean distance can quickly focus heavily on more abundant species over rare species. To prevent this by weighing abundant and rare species evenly, we can calculate the abundances between two sites divided by the total abundances at each site. This is called the Bray-Curtis or Sørenson Distance.

\[ \sum \frac{|x_{ai} \ -\ x_{bi} |}{x_{ai} \ +\ x_{bi}} \]

To help us do so, we will introduce you to the currently most popular community ecology package in R, vegan (Oksanen et al. 2022).

The vegdist() function from the vegan package provides Bray-Curtis distance by default for an entire dataset, so we can filter for plot 1 and 3.

df_AB_AH_vegan <- df_AB_AH %>%
  filter(plot_id == 1 | plot_id == 3) %>%
  select(AB, AH)

vegdist(df_AB_AH_vegan, method="bray")
          1
2 0.3076923

Non-metric Multidimensional Scaling

We have focused on pairs of 2 species, but our dataframe has more than 2 columns. Let’s plot 3 columns (species AB, AH, and BA) for 4 plots.

df_AB_AH_BA <- df %>%
  select(plot_id, AB, AH, BA) %>%
  filter(plot_id <= 4)

# Convert to plotly 3D scatter plot
plot_ly(data = df_AB_AH_BA, x = ~AB, y = ~AH, z = ~BA, type = 'scatter3d', mode = 'markers') %>%
  layout(scene = list(xaxis = list(title = 'AB'),
                      yaxis = list(title = 'AH'),
                      zaxis = list(title = 'BA'),
                      camera = list(
                        eye = list(x = 0, y = 0, z = 2.5),
                        up = list(x = 0, y = 1, z = 0)
                      )))

How can we visualize 4 columns (species AB, AH, BA, and SS) to see their distance?

One technique that attempts to visualize an optimal arrangement is Non-metric Multidimensional Scaling (NMDS). Note that the axes of the plotted results are arbitrary (non-metric) but instead represent the relative weight of relationships.

Once again, vegan can help with this, among other options it can produce NMDS plots by either Euclidean or Bray-Curtis distance.

df_AB_AH_BA_SS <- df %>%
  select(plot_id, AB, AH, BA, SS) %>%
  filter(plot_id <= 4)

nmds_Euc <- metaMDS(df_AB_AH_BA_SS %>% select(-plot_id), distance = "euclidean")
Wisconsin double standardization
Run 0 stress 0 
Run 1 stress 0 
... Procrustes: rmse 0.133155  max resid 0.2017735 
Run 2 stress 8.095724e-05 
... Procrustes: rmse 0.2224001  max resid 0.3350531 
Run 3 stress 0 
... Procrustes: rmse 0.1221689  max resid 0.201466 
Run 4 stress 0 
... Procrustes: rmse 0.1656368  max resid 0.2820223 
Run 5 stress 0 
... Procrustes: rmse 0.1549081  max resid 0.2523481 
Run 6 stress 0 
... Procrustes: rmse 0.1745811  max resid 0.2837019 
Run 7 stress 0 
... Procrustes: rmse 0.141142  max resid 0.2240949 
Run 8 stress 0 
... Procrustes: rmse 0.1253157  max resid 0.188246 
Run 9 stress 0 
... Procrustes: rmse 0.1275168  max resid 0.1783719 
Run 10 stress 0 
... Procrustes: rmse 0.05976498  max resid 0.08892813 
Run 11 stress 7.12166e-05 
... Procrustes: rmse 0.229479  max resid 0.340543 
Run 12 stress 0 
... Procrustes: rmse 0.1754167  max resid 0.2148794 
Run 13 stress 0 
... Procrustes: rmse 0.1259958  max resid 0.1742557 
Run 14 stress 0 
... Procrustes: rmse 0.1935217  max resid 0.316349 
Run 15 stress 0 
... Procrustes: rmse 0.08341071  max resid 0.1386356 
Run 16 stress 0 
... Procrustes: rmse 0.2032435  max resid 0.310633 
Run 17 stress 0 
... Procrustes: rmse 0.1188598  max resid 0.1824606 
Run 18 stress 0 
... Procrustes: rmse 0.184431  max resid 0.3011979 
Run 19 stress 0 
... Procrustes: rmse 0.04545514  max resid 0.06604594 
Run 20 stress 7.539867e-05 
... Procrustes: rmse 0.2283102  max resid 0.3411722 
*** Best solution was not repeated -- monoMDS stopping criteria:
    20: stress < smin
Warning in metaMDS(df_AB_AH_BA_SS %>% select(-plot_id), distance =
"euclidean"): stress is (nearly) zero: you may have insufficient data
scores_plot_df <- as.data.frame(scores(nmds_Euc, display = "sites"))
scores_plot_df$plot_id <- df_AB_AH_BA_SS$plot_id

ggplot(scores_plot_df, aes(x = NMDS1, y = NMDS2)) +
  geom_text(aes(label = plot_id)) +
  labs(title = "NMDS Plot with Euclidean Distance", x = "NMDS_1", y = "NMDS_2") 

# Repeat for Bray-Curtis distance

nmds_Bray <- metaMDS(df_AB_AH_BA_SS %>% select(-plot_id), distance = "bray")
Wisconsin double standardization
Run 0 stress 0 
Run 1 stress 0 
... Procrustes: rmse 0.1231563  max resid 0.2073613 
Run 2 stress 0 
... Procrustes: rmse 0.1339593  max resid 0.1593201 
Run 3 stress 0 
... Procrustes: rmse 0.1699433  max resid 0.1978182 
Run 4 stress 0 
... Procrustes: rmse 0.06628414  max resid 0.09287973 
Run 5 stress 0 
... Procrustes: rmse 0.1668645  max resid 0.1899495 
Run 6 stress 0 
... Procrustes: rmse 0.08747335  max resid 0.128069 
Run 7 stress 0 
... Procrustes: rmse 0.04918992  max resid 0.06570157 
Run 8 stress 0 
... Procrustes: rmse 0.1731014  max resid 0.2026734 
Run 9 stress 0 
... Procrustes: rmse 0.1573814  max resid 0.164574 
Run 10 stress 0 
... Procrustes: rmse 0.07973617  max resid 0.1322914 
Run 11 stress 0 
... Procrustes: rmse 0.06287776  max resid 0.0839981 
Run 12 stress 0 
... Procrustes: rmse 0.1805427  max resid 0.2133046 
Run 13 stress 0 
... Procrustes: rmse 0.08993397  max resid 0.1190303 
Run 14 stress 0 
... Procrustes: rmse 0.1041228  max resid 0.1711877 
Run 15 stress 2.930413e-05 
... Procrustes: rmse 0.08119986  max resid 0.1253618 
Run 16 stress 0 
... Procrustes: rmse 0.03147627  max resid 0.05260101 
Run 17 stress 0 
... Procrustes: rmse 0.07743176  max resid 0.1303473 
Run 18 stress 0 
... Procrustes: rmse 0.08985485  max resid 0.1146704 
Run 19 stress 0 
... Procrustes: rmse 0.120438  max resid 0.1997085 
Run 20 stress 0 
... Procrustes: rmse 0.05740063  max resid 0.07307026 
*** Best solution was not repeated -- monoMDS stopping criteria:
    20: stress < smin
Warning in metaMDS(df_AB_AH_BA_SS %>% select(-plot_id), distance = "bray"):
stress is (nearly) zero: you may have insufficient data
Warning in postMDS(out$points, dis, plot = max(0, plot - 1), ...): skipping
half-change scaling: too few points below threshold
scores_plot_df <- as.data.frame(scores(nmds_Bray, display = "sites"))
scores_plot_df$plot_id <- df_AB_AH_BA_SS$plot_id

ggplot(scores_plot_df, aes(x = NMDS1, y = NMDS2)) +
  geom_text(aes(label = plot_id)) +
  labs(title = "NMDS Plot with Bray-Curtis Distance", x = "NMDS_1", y = "NMDS_2") 

Compare this to our original plot, are there any differences that you notice?

df_AB_AH <- df %>%
  dplyr::select(plot_id, AB, AH) %>%
  filter(plot_id <= 4)
  

ggplot(data = df_AB_AH, aes(x = AB, y = AH)) +
  geom_text(aes(label = plot_id)) +
  labs(x = "AB", y = "AH", title = "Plot IDs positioned by abundances")

NMDS plots can go well beyond 4-dimensions, because of this expandability, we can plot distances for all species in our sample simultaneously to identify which plots may be more different from one another as far as species abundances is concerned.

# Repeat creating NMDS using Bray-Curtis distance for all species

nmds_Bray <- metaMDS(df %>% select(-plot_id), distance = "bray")
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1732437 
Run 1 stress 0.1692261 
... New best solution
... Procrustes: rmse 0.07820723  max resid 0.2409317 
Run 2 stress 0.3762069 
Run 3 stress 0.197087 
Run 4 stress 0.1786437 
Run 5 stress 0.1760508 
Run 6 stress 0.1786437 
Run 7 stress 0.1786437 
Run 8 stress 0.1687115 
... New best solution
... Procrustes: rmse 0.05406268  max resid 0.185574 
Run 9 stress 0.1796872 
Run 10 stress 0.1830521 
Run 11 stress 0.3840954 
Run 12 stress 0.1666806 
... New best solution
... Procrustes: rmse 0.04441907  max resid 0.1409255 
Run 13 stress 0.173811 
Run 14 stress 0.1763763 
Run 15 stress 0.1819952 
Run 16 stress 0.1758261 
Run 17 stress 0.188976 
Run 18 stress 0.1742797 
Run 19 stress 0.1692261 
Run 20 stress 0.1823301 
*** Best solution was not repeated -- monoMDS stopping criteria:
    18: stress ratio > sratmax
     2: scale factor of the gradient < sfgrmin
scores_plot_df <- as.data.frame(scores(nmds_Bray, display = "sites"))
scores_plot_df$plot_id <- df$plot_id

ggplot(scores_plot_df, aes(x = NMDS1, y = NMDS2)) +
  geom_text(aes(label = plot_id)) +
  labs(title = "NMDS Plot with Bray-Curtis Distance", x = "NMDS_1", y = "NMDS_2") 

What hypotheses can we make based on this plot?

  • We can make various hypotheses, such as that plots 10 and 23 are more similar to one another than to the other plots regrading species abundances

NMDS is an example of an ordination technique, which allow us to arrange multidimensional data and reduce it to low-dimensional space based on their similarities or dissimilarities. This is very useful for visualizing and analyzing complex data. Other examples include:

  • Principal Component Analysis (PCA) is primarily used for uncovering patterns in data by reducing its dimensions linearly, without considering external variables.
  • Canonical Correspondence Analysis (CCA) to explore the relationships between two sets of variables, particularly when one set is hypothesized to influence the other.
  • Principal Coordinates Analysis (PCoA) focuses on preserving the pairwise distances between observations when data is reduced to a lower-dimensional space.

All of these ordination methods do something more or less like this following visualization of a simple PCA, they look at multivariate point cloud from a certain angle and take a 2D slice through that plane:

Richness

Now that we have explored abundance, let’s return to richness as a biodiversity metric. Do you remember the three ways we have talked about measuring diversity this semester?

Gamma Diversity

What is the species richness of our region for ratdat::complete?

gamma <-  ratdat::complete %>% 
          filter(species_id != "") %>%
          select(species_id) %>%
          n_distinct()

gamma
[1] 48
  • It’s equal to the number or rows, so 48.

This total count of species richness across a region is referred to as \(\gamma\) diversity.

Alpha Diversity

Similarly, \(\alpha\) diversity is the richness of a local area or a single sample.

Calculate the \(\alpha\) diversity of plots 7 and 12

alpha_7 <-  ratdat::complete %>% 
            filter(species_id != "") %>%
            filter(plot_id == 7) %>%
            select(species_id) %>%
            n_distinct()

alpha_12 <- ratdat::complete %>% 
            filter(species_id != "") %>%
            filter(plot_id == 12) %>%
            select(species_id) %>%
            n_distinct()

alpha_7
[1] 21
alpha_12
[1] 25

How similar are these plots? How distinct are they? \(\alpha\) diversity can start to answer these questions, but doesn’t provide too clear of an answer.

Calculating \(\beta\) diversity can begin to help us explore their differences.

Beta Diversity

\(\beta\) diversity can be defined as the number of species in a region not observed in sample or the turnover in species composition among sites. Therefore both \(\gamma\) (regional) and \(\alpha\) (site) diversity are directly related to \(\beta\) diversity.

\(\beta\) diversity is often calculated via additive partitioning.

\[ \beta \ =\ \gamma \ -\ \overline{\alpha } \]

Where \(\overline{\alpha }\) is the average \(\alpha\) diversity of the sample. Here, \(\beta\) diversity is the average number of species not found in a sample even though they are found in a region.

It is also commonly calculated using multiplicative partitioning:

\[ \beta \ =\ \frac{\gamma }{\overline{\alpha }} \]

This metric for turnover aids in determining how many unique assemblages exist in a region.

As with any step of model selection, you must be intentional about which version of \(\beta\) diversity is applicable to your question and sample.

We will calculate the beta diversity of sites 7 and 12. Once again, we will be using the vegan package to help with this.

First, we will need to reformat the data into the format that vegan prefers called a “Community data matrix,” with species names across the X-axis of the dataframe, sampling event down the Y-axis, and abundances where they meet. We will widen the dataframe by species name. To do so, we will use the tidyR pivot_wider() function.

comm_data_matrix <- ratdat::complete %>%
  
  # Filter out blank species_ids
  filter(species_id != "") %>%

  # Add a count of one for each sample
  mutate(count = 1) %>%

  # Widen based on species
  pivot_wider(names_from = species_id, values_from = count, values_fill = 0) %>%

  # Create a year_month_day column
  mutate(year_month_day = sprintf('%04d_%02d_%02d', year, month, day)) %>%

  # Select only for year_month_day, plot_id, and species columns
  select(-year, -month, -day, -record_id, -sex, -hindfoot_length, -weight, -genus, -species, -taxa, -plot_type) %>%
  
  # Combine unique year_month_days/plot_id combination and sum species counts
  unite(year_month_day, year_month_day) %>%
  group_by(year_month_day, plot_id) %>%
  summarise_all(sum) %>%
  ungroup()

head(comm_data_matrix)
# A tibble: 6 × 50
  year_month_day plot_id    NL    DM    PF    PE    DS    PP    SH    OT    DO
  <chr>            <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1977_07_16           1     0     2     1     0     0     0     0     0     0
2 1977_07_16           2     1     1     0     1     0     1     0     0     0
3 1977_07_16           3     1     2     0     0     1     0     0     0     0
4 1977_07_16           4     0     1     1     0     0     0     0     0     0
5 1977_07_16           5     0     0     0     0     1     0     0     0     0
6 1977_07_16           6     0     1     1     0     0     0     0     0     0
# ℹ 39 more variables: OX <dbl>, SS <dbl>, OL <dbl>, RM <dbl>, SA <dbl>,
#   PM <dbl>, AH <dbl>, DX <dbl>, AB <dbl>, CB <dbl>, CM <dbl>, CQ <dbl>,
#   RF <dbl>, PC <dbl>, PG <dbl>, PH <dbl>, PU <dbl>, CV <dbl>, UR <dbl>,
#   UP <dbl>, ZL <dbl>, UL <dbl>, CS <dbl>, SC <dbl>, BA <dbl>, SF <dbl>,
#   RO <dbl>, AS <dbl>, SO <dbl>, PI <dbl>, ST <dbl>, CU <dbl>, SU <dbl>,
#   RX <dbl>, PB <dbl>, PL <dbl>, PX <dbl>, CT <dbl>, US <dbl>

To calculate \(\overline{\alpha }\), use the vegan function specnumber():

alpha_bar <- mean(specnumber(df))

This will need to be done after the dataframe has only species columns, because that is the format vegan prefers.

With this information, calculate beta diversity for sites 7 and 12.

# Wrangle data

df_7 <- comm_data_matrix %>%
  filter(plot_id == 7) %>%
  
  # Vegan only likes species columns
  select(-year_month_day, -plot_id)

  

df_12 <- comm_data_matrix %>%
  filter(plot_id == 12) %>%
  select(-year_month_day, -plot_id)
  
  
# Calculate alpha bar and gamma diversity

alpha_bar_7 <- mean(specnumber(df_7))
gamma_7 <- dim(df_7)[2]

alpha_bar_12 <- mean(specnumber(df_12))
gamma_12 <- dim(df_12)[2]


# Calculate additive beta diversity

df_7_beta_add <- gamma_7 - alpha_bar_7
df_12_beta_add <- gamma_12 - alpha_bar_12


# Calculate multiplicative beta diversity

df_7_beta_mult <- gamma_7 / alpha_bar_7
df_12_beta_mult <- gamma_12 / alpha_bar_12

  
paste("alpha_bar_7", alpha_bar_7)
[1] "alpha_bar_7 1.81018518518519"
paste("df_7_beta_add", df_7_beta_add) 
[1] "df_7_beta_add 46.1898148148148"
paste("df_7_beta_mult", df_7_beta_mult)
[1] "df_7_beta_mult 26.5166240409207"
paste("alpha_bar_12", alpha_bar_12)
[1] "alpha_bar_12 3.8494983277592"
paste("df_12_beta_add", df_12_beta_add)
[1] "df_12_beta_add 44.1505016722408"
paste("df_12_beta_mult", df_12_beta_mult)
[1] "df_12_beta_mult 12.4691572545613"

What do these results imply?

  • Additive partitioned \(\beta\) diversity is similar between the sites, therefore similar average number of species not found in these sites similar

  • However, multiplicative partitioned \(\beta\) diversity is different between the sites, so the turnover compared to the region different

  • Because \(\beta \ >\ \overline{\alpha }\), each plot misses more species than it finds.

Diversity Indices

Let’s bring our understanding together to quantify diversity using indices. Diversity is challenging to quantify because it is difficult to agree on a definition. For example, we could attempt to quantify biodiversity with any of the following:

  • Probability of encountering a species at random
  • Uncertainty of possible community states
  • Variance in species composition between sites

Species Richness Index

The most common biodiversity index is the Species Richness Index, counting the number of unique species in a community. We already completed this task above, calling it \(\gamma\) diversity for our system:

species_richness <- ratdat::complete %>% 
                    filter(species_id != "") %>%
                    select(species_id) %>%
                    n_distinct()

species_richness
[1] 48

However, it is highly sensitive to variations in sampling effort because it treats all species equally, weighing rare and common species equally despite the greater likelihood of rare species going undetected.

It does not account for evenness: the fact that some species in the community are common and others are rare.

Source: davidzeleny.net

What do you notice about the examples above?

  • They have equal species richness, however the communities are very different based on evenness.

Shannon and Simpson Diversity Indices

The Shannon(Shannon 1948) and Simpson Diversity Indices(Simpson 1949), both special cases of the Hill’s Number(Hill 1973), are very popular biodiversity indices because they focus on both richness and evenness.

\[ \begin{array}{l} Shannon-Wiener\ Diversity\ =-\sum _{i\ =\ i}^{S} p_{i}\ln p_{i}\\ \\ Simpson\ Diversity\ =\frac{1}{\sum _{i\ =\ 1}^{S} \ p_{i}^{2}}\\ \\ p_{i} \ =\ relative\ abundance\ of\ species\ i\\ S\ =\ total\ number\ of\ species \end{array} \]

What’s in a name?

You will likely see the Shannon Index referred to in different ways throughout the literature because of articles/books that were published on the topic by three different mathematicians over the course of a couple of years. Here is a quick guide to the alternative names you will encounter:

As richness and evenness increase, these indices increase. It is important to remember that Simpson’s Diversity Index emphasizes common species, while the Shannon Diversity Index is biased towards rare species. Here is how you can interpret these results:

Shannon’s Index interpretation:

  • Shannon’s Index is based in information theory and represents the uncertainty that we will be able to predict which species will be randomly drawn from a community.
  • Values of \(H\) (Shannon Entropy) are usually between 1.5-3.5 (the units are bits of information)
  • The maximum value of \(H\) represents a perfectly even community

Simpson’s Index interpretation:

  • Simpson’s Index represents the probability that two randomly selected individuals are from the same species.
  • Simpson’s Index decreases with increasing richness, because of this Gini-Simpson index (\(1 - D\)) is often used to make value more comparable with other indices.
  • The values of \(D\) range from 0 to 1

Because Simpson’s/Gini-Simpson’s indices refer more directly to richness than evenness, we can also calculate Simpson’s Evenness (“equitability”) where \(S\) is the number of species:

\[ \frac{1}{DS} \]

Now let’s calculate these values for our dataset. Thanks once again to the vegan package, these equations are readily available through the function diversity(). Here is the ?vegan::diversity() help page:

Here is another nice summary as well explaining what exactly the diversity() function is actually calculating:

Source: davidzeleny.net

Let’s assume plots 1-8 and 17-24 represent two distinct regions. Wrangle our dataframe comm_data_matrix to combine plots (hint, use the summarize(sum) function) and remove the date column. Then compare the Gini-Simpson Index, Simpson’s Evenness, and Shannon Index values for plots 1-8 versus plots 17-24.

# Wrangle data
df_1_8 <- comm_data_matrix %>%
  filter(plot_id <= 8) %>%
  select(-year_month_day, -plot_id) %>%
  summarise_all(sum)

df_17_24 <- comm_data_matrix %>%
  filter(plot_id >= 16) %>%
  select(-year_month_day, -plot_id) %>%
  summarise_all(sum)


# Calculate Gini-Simpson's diversity
diversity(df_1_8, index = "simpson")
[1] 0.8690882
diversity(df_17_24, index = "simpson")
[1] 0.8897126
# Calculate Simpson's Evenness
Gs_1_8 <- diversity(df_1_8, index = "simpson")
Gs_17_24 <- diversity(df_17_24, index = "simpson")

D_1_8 <- 1 - Gs_1_8
D_17_24 <- 1 - Gs_17_24

1 / (D_1_8 * ncol(df_1_8))
[1] 0.1591402
1 / (D_17_24 * ncol(df_17_24))
[1] 0.1889005
# Calculate Shannon's diversity:
diversity(df_1_8, index = "shannon")
[1] 2.377345
diversity(df_17_24, index = "shannon")
[1] 2.513817

How would you interpret these results?

  • Plots 1-8 and plots 17-24 have a relatively similar value for the Gini-Simpson’s Index, with plots 17-24 having slightly higher richness overall
  • Plots 1-8 and plots 17-24 have a relatively similar value for the Simpson’s Evenness, with plots 17-24 being slightly more even
  • In agreement with the Simpson’s Evenness value, when emphasizing rare species using the Shannon’s Diversity Index, plots 1-8 and plots 17-24 have a relatively similar value for the Shannons’s Diversity Index (Shannon’s Entropy), with plots 17-24 being slightly more even. Overall, this is a relatively high value of \(H\) indicating relatively high evenness
  • In summary, these sets of plots seem to be similar when looking at evenness

References

Hill, Mark O. 1973. “Diversity and Evenness: A Unifying Notation and Its Consequences.” Ecology 54 (2): 427–32.
Mittelbach, Gary G, and Brian J McGill. 2019. Community Ecology. Oxford University Press.
Oksanen, Jari, Gavin L. Simpson, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, et al. 2022. “Vegan: Community Ecology Package.” https://CRAN.R-project.org/package=vegan.
Shannon, Claude Elwood. 1948. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27 (3): 379–423.
Simpson, Edward H. 1949. “Measurement of Diversity.” Nature 163 (4148): 688–88.
Stevens, Martin Henry Hoffman. 2009. A Primer of Ecology with r. Springer.
Zeleny, D. 2020. “Analysis of Community Ecology Data in r. DavidZeleny. Net.” https://www.davidzeleny.net/anadat-r/doku.php/en:div-ind_r.