Lab 8 - Community Assembly

Authors

Charles Lehnen

Cameron Egan

Melissa Guzman

Introduction to Community Assembly

“Only a fraction of the forms that could theoretically do so actually form a community at any one time.” - Charles Elton (1950)(elton1950ecology?)

Why aren’t all species everywhere?
Why are local scale communities smaller (lower richness) than regional species pools?
Why are there so many species?

One of the most central challenges in ecology is pinpointing what determines the number, kinds, and abundances of species found in a community.

The study of community assembly attempts to describe the complex processes that determine the composition and abundance of species within ecological communities.

Filters

In the traditional view of community assembly, there are three broad categories of processes/filters that determine colonization:

  1. Dispersal

  2. Abiotic/Environmental Filters (e.g. temperature, humidity, precipitation, geological features, nutrient availability, etc.)

  3. Biotic/Interaction Filters (e.g. competition, predation, parasitism, disease, and mutualism)

As you can readily see, community assembly pulls together many of the topics we have covered this semester. As some additional examples, island biogeography theory and neutral theory emphasize a dispersal filter. Meanwhile co-existence theory involves biotic filters. We discussed abiotic filters in the context of population growth. We also learned about biotic filters in the context of density-dependence, competition, and predator/prey models.

These processes are traditionally viewed as sequential filters:

Source: Mittelbach & McGill 2019(Mittelbach and McGill 2019)

However, it is more accurate to remember that in reality these three processes can act out of order and simultaneously. Imagine the impact of differing R* values in a system with nutrients that fluctuate seasonally or even geographically.

Community Assembly Lab

The dataset for the upcoming lab originates from “pH as an Eco-evolutionary Driver of Priority Effects” by Chappell et al. (2022). It details field surveys on Diplacus aurantiacus nectar across the San Francisco Peninsula and subsequent lab experiments which investigated the role of pH in microbial community dynamics. The study showcased that the microbial dominance within nectar shifts between bacteria (specifically Acinetobacter nectaris) and yeasts (specifically Metschnikowia reukaufii).

Bush Monkeyflower (Mimulus aurantiacus). Source: Wikimedia

Using read.csv(), load the dataset 2015_survey_samples_edit_cfu_master.csv as the df observed_data. This dataset consists of the field collected data.

meta <- read.csv('Lab_8_data/metadata_2015_survey_samples_edit_cfu_master.csv')

#View(meta1)
observed_data <- read.csv('Lab_8_data/2015_survey_samples_edit_cfu_master.csv')

Explore the dataset here and by looking at the metadata.

head(observed_data)
  number Plant_ID Flower_ID Date_collected Date_plated Replicate Nectar_cm
1      1      LH1         1        6/23/15     6/24/15         1       0.1
2      2      LH1         2        6/23/15     6/24/15         2       1.4
3      3      LH1         3        6/23/15     6/24/15         3       3.1
4      4      LH1         4        6/23/15     6/24/15         4       2.3
5      5      LH1         5        6/23/15     6/24/15         5       0.3
6      6     LH1          6        6/23/15     6/24/15         6       0.9
  Nectar_mm Nectar_vol Stigma_closed Fcfu1 Fcfu2 Fcfu3 Fcfu4 Fcfu5 Fcfu6
1         1  0.1210654           Yes     0     0     0     0     0     0
2        14  1.6949153           Yes     0     0     0     0     0     0
3        31  3.7530266           Yes    57     0     0     0     0     0
4        23  2.7845036           Yes     1     0     0     0     0     0
5         3  0.3631961           Yes     0     0     0     0     0     0
6         9  1.0895884           Yes     0     0     0     0     0     0
  F_nontarget    Fcfu_avg FCFU_total     FCFU_ul log.FCFU.1. Bcfu1 Bcfu2 Bcfu3
1        <NA>           0          0           0           0     0     0     0
2        <NA>           0          0           0           0    38     1     0
3        <NA>         9.5         57 171.0607382 2.235681782    11     1     0
4        <NA> 0.166666667          1 3.836009923 0.684487183   124     0     0
5        <NA>           0          0           0           0     1     0     0
6        <NA>           0          0           0           0     0     0     0
  Bcfu4 Bcfu5 Bcfu6    Bcfu_avg BCFU_total      BCFUul log.BCFU.1.  X X.1 X.2
1     0     0     0           0          0           0           0 NA  NA  NA
2     0     0     0         6.5         39 6915.391539 3.839879571 NA  NA  NA
3     0     0     0           2         12 1079.411167 3.033589064 NA  NA  NA
4     0     0     0 20.66666667        124 14257.11267 4.154062042 NA  NA  NA
5     0     0     0 0.166666667          1 761.4761476 2.882226262 NA  NA  NA
6     0     0     0           0          0           0           0 NA  NA  NA
  X.3 X.4 X.5 X.6 X.7 X.8 X.9
1  NA  NA  NA  NA  NA  NA  NA
2  NA  NA  NA  NA  NA  NA  NA
3  NA  NA  NA  NA  NA  NA  NA
4  NA  NA  NA  NA  NA  NA  NA
5  NA  NA  NA  NA  NA  NA  NA
6  NA  NA  NA  NA  NA  NA  NA
length(unique(observed_data$Plant_ID))
[1] 148
observed_data %>%
  distinct(Plant_ID) %>%
  nrow()
[1] 148
observed_data %>%
  distinct(Plant_ID, Flower_ID) %>%
  nrow()
[1] 1154
summary(observed_data)
     number         Plant_ID           Flower_ID      Date_collected    
 Min.   :   1.0   Length:1157        Min.   :   1.0   Length:1157       
 1st Qu.: 288.0   Class :character   1st Qu.: 289.0   Class :character  
 Median : 576.0   Mode  :character   Median : 577.0   Mode  :character  
 Mean   : 576.1                      Mean   : 576.6                     
 3rd Qu.: 864.0                      3rd Qu.: 864.0                     
 Max.   :1152.0                      Max.   :1152.0                     
 NA's   :4                           NA's   :4                          
 Date_plated          Replicate       Nectar_cm        Nectar_mm     
 Length:1157        Min.   :1.000   Min.   :0.0100   Min.   : 0.000  
 Class :character   1st Qu.:2.000   1st Qu.:0.3000   1st Qu.: 3.000  
 Mode  :character   Median :4.000   Median :0.7000   Median : 7.000  
                    Mean   :4.497   Mean   :0.9831   Mean   : 9.823  
                    3rd Qu.:6.000   3rd Qu.:1.4000   3rd Qu.:14.000  
                    Max.   :8.000   Max.   :5.6000   Max.   :56.000  
                    NA's   :4       NA's   :5        NA's   :4       
   Nectar_vol     Stigma_closed          Fcfu1             Fcfu2        
 Min.   :0.0000   Length:1157        Min.   :  0.000   Min.   :  0.000  
 1st Qu.:0.3632   Class :character   1st Qu.:  0.000   1st Qu.:  0.000  
 Median :0.8475   Mode  :character   Median :  0.000   Median :  0.000  
 Mean   :1.1892                      Mean   :  8.441   Mean   :  9.083  
 3rd Qu.:1.6949                      3rd Qu.:  3.000   3rd Qu.:  0.750  
 Max.   :6.7797                      Max.   :576.000   Max.   :752.000  
 NA's   :4                           NA's   :4         NA's   :879      
     Fcfu3             Fcfu4             Fcfu5         Fcfu6    
 Min.   : 0.0000   Min.   : 0.0000   Min.   :0     Min.   :0    
 1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.:0     1st Qu.:0    
 Median : 0.0000   Median : 0.0000   Median :0     Median :0    
 Mean   : 0.4836   Mean   : 0.0711   Mean   :0     Mean   :0    
 3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.:0     3rd Qu.:0    
 Max.   :43.0000   Max.   :13.0000   Max.   :0     Max.   :0    
 NA's   :944       NA's   :960       NA's   :965   NA's   :965  
 F_nontarget          Fcfu_avg          FCFU_total          FCFU_ul         
 Length:1157        Length:1157        Length:1157        Length:1157       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 log.FCFU.1.           Bcfu1               Bcfu2              Bcfu3       
 Length:1157        Length:1157        Min.   :    0.00   Min.   : 0.000  
 Class :character   Class :character   1st Qu.:    1.00   1st Qu.: 0.000  
 Mode  :character   Mode  :character   Median :    2.00   Median : 0.000  
                                       Mean   :   31.68   Mean   : 1.781  
                                       3rd Qu.:    5.00   3rd Qu.: 1.000  
                                       Max.   :10000.00   Max.   :48.000  
                                       NA's   :728        NA's   :1002    
     Bcfu4            Bcfu5          Bcfu6        Bcfu_avg        
 Min.   :0.0000   Min.   :0      Min.   :0      Length:1157       
 1st Qu.:0.0000   1st Qu.:0      1st Qu.:0      Class :character  
 Median :0.0000   Median :0      Median :0      Mode  :character  
 Mean   :0.2991   Mean   :0      Mean   :0                        
 3rd Qu.:0.0000   3rd Qu.:0      3rd Qu.:0                        
 Max.   :6.0000   Max.   :0      Max.   :0                        
 NA's   :1050     NA's   :1058   NA's   :1057                     
   BCFU_total         BCFUul          log.BCFU.1.           X          
 Min.   :    0.0   Length:1157        Length:1157        Mode:logical  
 1st Qu.:    0.0   Class :character   Class :character   NA's:1157     
 Median :    3.0   Mode  :character   Mode  :character                 
 Mean   :  375.7                                                       
 3rd Qu.:   18.0                                                       
 Max.   :20000.0                                                       
 NA's   :4                                                             
   X.1            X.2            X.3            X.4            X.5         
 Mode:logical   Mode:logical   Mode:logical   Mode:logical   Mode:logical  
 NA's:1157      NA's:1157      NA's:1157      NA's:1157      NA's:1157     
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
   X.6            X.7            X.8            X.9         
 Mode:logical   Mode:logical   Mode:logical   Mode:logical  
 NA's:1157      NA's:1157      NA's:1157      NA's:1157     
                                                            
                                                            
                                                            
                                                            
                                                            
Describe the dataset:
  • The dataset includes unique plant and flower IDs across sites.
  • Each row represents a unique flower. Along with other microbe measurement data, it includes the number of different species of bacteria (Bcfus, or bacterial colony forming units) and fungi (Fcfu) in each plant/flower. Also the amount of nectar is detailed.
  • 148 plants and 1,154 flowers were surveyed.
  • Oddly, the site numbers or locations are not included.
  • Also oddly, the units for vol are not given in the metadata
What needs to be cleaned from the dataset?
  • There are whole rows with mostly NAs
  • There are NAs present in the Bcfu and Fcfu columns, which we can convert to 0s for our purposes
  • There is a plant named “75”
  • Some of the columns (like Fcfu_avg, FCFU_total, Bcfu_avg) are character columns and should be numeric
  • There are a bunch of empty x columns we should remove, probably from the formatting of the original CSV files
  • The amount of fungi/bacteria present should depend on the volume of nectar present, so let’s create new columns for fungal/bacterial concentration
  • Also, for the purpose of this exercise let’s just focus on total cfus, so let’s remove the other columns for the more specific morphologies

Clean the dataset:

# Clean our dataset
observed_data_clean <- observed_data %>%
  select(-number, -Date_plated, -Replicate, -Nectar_cm, 
         -Nectar_mm, -F_nontarget, -FCFU_ul, -`log.FCFU.1.`, -BCFUul, 
         -`log.BCFU.1.`, -starts_with("X")) %>%
  
  mutate(
    FCFU_total = as.numeric(FCFU_total),
    BCFU_total = as.numeric(BCFU_total),
    Bcfu1 = as.numeric(Bcfu1),
    Fcfu_avg = as.numeric(Fcfu_avg),
    Bcfu_avg = as.numeric(Bcfu_avg),
    Bcfu_conc = BCFU_total / Nectar_vol,
    Fcfu_conc = FCFU_total / Nectar_vol,
    Source = "Observed", # we will use this column later
    site_name = stringr::str_extract(Plant_ID, "^[A-Z]+") # this will be useful for the homework
  ) %>%
  
  replace_na(list(FCFU_total = 0, BCFU_total = 0, Fcfu_avg = 0, Bcfu_avg = 0, Nectar_vol = 0, Fcfu_conc = 0, Bcfu_conc = 0)) %>%
  
  filter(Plant_ID != "" | Plant_ID != "75")

summary(observed_data_clean)
   Plant_ID           Flower_ID      Date_collected       Nectar_vol    
 Length:1157        Min.   :   1.0   Length:1157        Min.   :0.0000  
 Class :character   1st Qu.: 289.0   Class :character   1st Qu.:0.3632  
 Mode  :character   Median : 577.0   Mode  :character   Median :0.8475  
                    Mean   : 576.6                      Mean   :1.1851  
                    3rd Qu.: 864.0                      3rd Qu.:1.6949  
                    Max.   :1152.0                      Max.   :6.7797  
                    NA's   :4                                           
 Stigma_closed          Fcfu1             Fcfu2             Fcfu3        
 Length:1157        Min.   :  0.000   Min.   :  0.000   Min.   : 0.0000  
 Class :character   1st Qu.:  0.000   1st Qu.:  0.000   1st Qu.: 0.0000  
 Mode  :character   Median :  0.000   Median :  0.000   Median : 0.0000  
                    Mean   :  8.441   Mean   :  9.083   Mean   : 0.4836  
                    3rd Qu.:  3.000   3rd Qu.:  0.750   3rd Qu.: 0.0000  
                    Max.   :576.000   Max.   :752.000   Max.   :43.0000  
                    NA's   :4         NA's   :879       NA's   :944      
     Fcfu4             Fcfu5         Fcfu6        Fcfu_avg      
 Min.   : 0.0000   Min.   :0     Min.   :0     Min.   :  0.000  
 1st Qu.: 0.0000   1st Qu.:0     1st Qu.:0     1st Qu.:  0.000  
 Median : 0.0000   Median :0     Median :0     Median :  0.000  
 Mean   : 0.0711   Mean   :0     Mean   :0     Mean   :  6.705  
 3rd Qu.: 0.0000   3rd Qu.:0     3rd Qu.:0     3rd Qu.:  2.333  
 Max.   :13.0000   Max.   :0     Max.   :0     Max.   :576.000  
 NA's   :960       NA's   :965   NA's   :965                    
   FCFU_total           Bcfu1             Bcfu2              Bcfu3       
 Min.   :   0.000   Min.   :    0.0   Min.   :    0.00   Min.   : 0.000  
 1st Qu.:   0.000   1st Qu.:    0.0   1st Qu.:    1.00   1st Qu.: 0.000  
 Median :   0.000   Median :    2.0   Median :    2.00   Median : 0.000  
 Mean   :   8.738   Mean   :  366.5   Mean   :   31.68   Mean   : 1.781  
 3rd Qu.:   1.816   3rd Qu.:   16.0   3rd Qu.:    5.00   3rd Qu.: 1.000  
 Max.   :1394.673   Max.   :11136.0   Max.   :10000.00   Max.   :48.000  
                    NA's   :13        NA's   :728        NA's   :1002    
     Bcfu4            Bcfu5          Bcfu6         Bcfu_avg      
 Min.   :0.0000   Min.   :0      Min.   :0      Min.   :    0.0  
 1st Qu.:0.0000   1st Qu.:0      1st Qu.:0      1st Qu.:    0.0  
 Median :0.0000   Median :0      Median :0      Median :    2.0  
 Mean   :0.2991   Mean   :0      Mean   :0      Mean   :  284.7  
 3rd Qu.:0.0000   3rd Qu.:0      3rd Qu.:0      3rd Qu.:   10.5  
 Max.   :6.0000   Max.   :0      Max.   :0      Max.   :10000.0  
 NA's   :1050     NA's   :1058   NA's   :1057                    
   BCFU_total        Bcfu_conc          Fcfu_conc          Source         
 Min.   :    0.0   Min.   :    0.00   Min.   :  0.000   Length:1157       
 1st Qu.:    0.0   1st Qu.:    0.00   1st Qu.:  0.000   Class :character  
 Median :    3.0   Median :    4.13   Median :  0.000   Mode  :character  
 Mean   :  374.4   Mean   :  672.47   Mean   :  6.665                     
 3rd Qu.:   18.0   3rd Qu.:   20.65   3rd Qu.:  3.000                     
 Max.   :20000.0   Max.   :82600.00   Max.   :576.000                     
                                                                          
  site_name        
 Length:1157       
 Class :character  
 Mode  :character  
                   
                   
                   
                   

Dispersal

Now that we have loaded and cleaned our dataset, let’s simulate dispersal if there was no filtering. We will simulate the dispersal being uniformly (not randomly) distributed because for dispersal that would be closest to a null hypothesis in most situations.

Source: Wikipedia


Let’s simulate our data as if it was uniformly distributed across flowers

simulated_data <- observed_data_clean[, c("Plant_ID", "Flower_ID", "Stigma_closed", "Nectar_vol")]

# Based on the max value from our data summary above
cfu_count_range <- 0:10000

# Simulate uniform CFU counts for fungal and bacterial samples
for(i in 1:6){
  simulated_data[[paste0('Fcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
  simulated_data[[paste0('Bcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
}

head(simulated_data)
  Plant_ID Flower_ID Stigma_closed Nectar_vol Fcfu1 Bcfu1 Fcfu2 Bcfu2 Fcfu3
1      LH1         1           Yes  0.1210654  5926  2355  2140  5161  6499
2      LH1         2           Yes  1.6949153  6059  5822  1346  1409  7453
3      LH1         3           Yes  3.7530266  5345  9047  8453  4788  4347
4      LH1         4           Yes  2.7845036  3217  4544  1014  1471  2783
5      LH1         5           Yes  0.3631961  4756   206  4777  4575  5986
6     LH1          6           Yes  1.0895884  6667   499  4909  7293  5058
  Bcfu3 Fcfu4 Bcfu4 Fcfu5 Bcfu5 Fcfu6 Bcfu6
1  4809  4166  1721  6373   907  7697  1771
2   286  8672  7330  8382   595  2015  2310
3  9165  2811  8526  8740  8224  6192  3404
4  9488  1028  2898  6346  3106  6256  9278
5  9494   134  4596  5357  5562  2016  5287
6  5326  5025  3765  3377  2041  3359  6604
# Calculate other columns from our orignal dataset
simulated_data <- simulated_data %>%
  mutate(
    FCFU_total = rowSums(across(starts_with("Fcfu"))),
    BCFU_total = rowSums(across(starts_with("Bcfu"))),
    Fcfu_avg = FCFU_total / 6,
    Bcfu_avg = BCFU_total / 6,
    Fcfu_conc = FCFU_total / Nectar_vol,
    Bcfu_conc = BCFU_total / Nectar_vol
  )

simulated_data <- simulated_data %>%
  select(
    -Fcfu1, -Bcfu1, 
    -Fcfu2, -Bcfu2, 
    -Fcfu3, -Bcfu3, 
    -Fcfu4, -Bcfu4, 
    -Fcfu5, -Bcfu5, 
    -Fcfu6, -Bcfu6
  ) %>%
  mutate(Source = "Simulated")

head(simulated_data)
  Plant_ID Flower_ID Stigma_closed Nectar_vol FCFU_total BCFU_total Fcfu_avg
1      LH1         1           Yes  0.1210654      32801      16724 5466.833
2      LH1         2           Yes  1.6949153      33927      17752 5654.500
3      LH1         3           Yes  3.7530266      35888      43154 5981.333
4      LH1         4           Yes  2.7845036      20644      30785 3440.667
5      LH1         5           Yes  0.3631961      23026      29720 3837.667
6     LH1          6           Yes  1.0895884      28395      25528 4732.500
  Bcfu_avg  Fcfu_conc Bcfu_conc    Source
1 2787.333 270936.261 138140.24 Simulated
2 2958.667  20016.930  10473.68 Simulated
3 7192.333   9562.415  11498.45 Simulated
4 5130.833   7413.889  11055.83 Simulated
5 4953.333  63398.253  81829.07 Simulated
6 4254.667  26060.300  23429.03 Simulated

Let’s plot our observed versus simulated data:

dim(observed_data_clean)
[1] 1157   25
dim(simulated_data)
[1] 1157   11
names(observed_data_clean)
 [1] "Plant_ID"       "Flower_ID"      "Date_collected" "Nectar_vol"    
 [5] "Stigma_closed"  "Fcfu1"          "Fcfu2"          "Fcfu3"         
 [9] "Fcfu4"          "Fcfu5"          "Fcfu6"          "Fcfu_avg"      
[13] "FCFU_total"     "Bcfu1"          "Bcfu2"          "Bcfu3"         
[17] "Bcfu4"          "Bcfu5"          "Bcfu6"          "Bcfu_avg"      
[21] "BCFU_total"     "Bcfu_conc"      "Fcfu_conc"      "Source"        
[25] "site_name"     
names(simulated_data)
 [1] "Plant_ID"      "Flower_ID"     "Stigma_closed" "Nectar_vol"   
 [5] "FCFU_total"    "BCFU_total"    "Fcfu_avg"      "Bcfu_avg"     
 [9] "Fcfu_conc"     "Bcfu_conc"     "Source"       
combined_data <- full_join(observed_data_clean, simulated_data)
Joining with `by = join_by(Plant_ID, Flower_ID, Nectar_vol, Stigma_closed,
Fcfu_avg, FCFU_total, Bcfu_avg, BCFU_total, Bcfu_conc, Fcfu_conc, Source)`
dim(combined_data)
[1] 2314   25
ggplot(combined_data, aes(x = Source, y = Fcfu_conc, color = Source)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  scale_y_log10() +
  labs(title = "Fungal CFU Concentration (per vol of nectar)",
       y = "Concentration") 
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
log-10 transformation introduced infinite values.
Warning: Removed 731 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Remember that it is crucial to always show your actual data, that is why I included points in our plot. Note that I jittered the points for increased visibility for many points versus geom_point(). I also set geom_boxplot() to outlier.shape = NA so that geom_boxplot() would not also plot the points.

Now repeat for the bacterial data:

ggplot(combined_data, aes(x = Source, y = Bcfu_conc, color = Source)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  scale_y_log10() +
  labs(title = "Bacterial CFU Concentration (per vol of nectar)",
       y = "Concentration") 

What do you observe?
  • It appears the actual dispersal of microbes is dramatically different than if it was just uniformly distributed. This suggests dispersal could be a filter affecting the assembly of these microbe communities.

If we had geospatial data for our dataset, we could do a similar simple analysis. Here I am simulating an artificial “observed” population that is not uniformly distributed along with a simulated distribution that is uniformly distributed.

# Simulating "observed" data with specific lat-lon distribution patterns
observed_data <- data.frame(
  Plant_ID = 1:100,
  lat = rnorm(100, mean = 34, sd = 0.1), # Simulating around a central latitude
  lon = rnorm(100, mean = -118, sd = 0.1), # Simulating around a central longitude
  Source = 'Observed'
)

# Simulating "simulated" data with a uniform distribution across the given range
simulated_data <- data.frame(
  Plant_ID = 1:100,
  lat = runif(100, min = 33.8, max = 34.2), # Uniform distribution within a latitudinal range
  lon = runif(100, min = -118.2, max = -117.8), # Uniform distribution within a longitudinal range
  Source = 'Simulated'
)

combined_data <- rbind(observed_data, simulated_data)

Now let’s plot this data colored by Source with geom_point, geom_density2d(), and faceted by Source

ggplot(combined_data, aes(x = lon, y = lat, color = Source)) +
  geom_point() +
  geom_density2d() +
  facet_wrap(~ Source) +
  labs(title = "Spatial Distribution of Microbial Dispersal with Density Contours",
       x = "Longitude",
       y = "Latitude")

What do you observe?
  • We can see that observed data is not uniformly distributed, with density increasing around a focal point. Meanwhile the simulated data is more uniformly distributed across its range.

Abiotic filters

Load in the file “N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH-_Sheet1.csv” as the df “exp_data” using read.csv()

exp_data <- read.csv("Lab_8_data/N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH_-_Sheet1.csv")
meta_exp <- read.csv("Lab_8_data/metadata_N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH.csv")

Inspect the dataset

summary(exp_data)
   experiment       nectar          date_count_inoc    date_count_final  
 Min.   :20.00   Length:3072        Length:3072        Length:3072       
 1st Qu.:21.75   Class :character   Class :character   Class :character  
 Median :25.50   Mode  :character   Mode  :character   Mode  :character  
 Mean   :24.75                                                           
 3rd Qu.:27.25                                                           
 Max.   :29.00                                                           
                                                                         
     column          row              location           microbe         
 Min.   : 1.00   Length:3072        Length:3072        Length:3072       
 1st Qu.: 3.75   Class :character   Class :character   Class :character  
 Median : 6.50   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 6.50                                                           
 3rd Qu.: 9.25                                                           
 Max.   :12.00                                                           
                                                                         
    strain          evol_treatment      evol_comp            clone          
 Length:3072        Length:3072        Length:3072        Length:3072       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   evol_rep         evol_rep_comp         tech_rep      treatment        
 Length:3072        Length:3072        Min.   :1.000   Length:3072       
 Class :character   Class :character   1st Qu.:1.000   Class :character  
 Mode  :character   Mode  :character   Median :1.000   Mode  :character  
                                       Mean   :2.426                     
                                       3rd Qu.:4.000                     
                                       Max.   :9.000                     
                                                                         
 X96_well_plate  plate_type        inoc_density          inoc_CFU       
 Min.   :1.0    Length:3072        Length:3072        Min.   :     0.0  
 1st Qu.:1.0    Class :character   Class :character   1st Qu.:     0.0  
 Median :1.5    Mode  :character   Mode  :character   Median :     7.0  
 Mean   :1.5                                          Mean   :   360.9  
 3rd Qu.:2.0                                          3rd Qu.:    15.0  
 Max.   :2.0                                          Max.   :100000.0  
                                                      NA's   :1737      
 inoc_dilution      inoc_CFU_uL        inoc_cells    final_dilution  
 Min.   :      0   Min.   :      0   Min.   :0.000   Min.   :     0  
 1st Qu.:     10   1st Qu.:      0   1st Qu.:0.000   1st Qu.:   100  
 Median :    100   Median :     22   Median :1.362   Median :  1000  
 Mean   :  21785   Mean   :  52677   Mean   :1.918   Mean   : 19570  
 3rd Qu.:  10000   3rd Qu.:   6667   3rd Qu.:3.824   3rd Qu.: 10000  
 Max.   :1000000   Max.   :9666667   Max.   :6.985   Max.   :100000  
 NA's   :2085      NA's   :1576      NA's   :1576    NA's   :1341    
   final_CFU       final_CFU_uL      final_cells       notes          
 Min.   : 0.000   Min.   :      0   Min.   :0.000   Length:3072       
 1st Qu.: 0.000   1st Qu.:      0   1st Qu.:0.000   Class :character  
 Median : 3.000   Median :     16   Median :1.176   Mode  :character  
 Mean   : 6.844   Mean   :  22167   Mean   :1.853                     
 3rd Qu.:10.000   3rd Qu.:   8000   3rd Qu.:3.842                     
 Max.   :78.000   Max.   :1760000   Max.   :6.246                     
 NA's   :180      NA's   :123       NA's   :60                        
    inoc_pH         final_pH     final_dilution_P1  final_CFU_P1   
 Min.   :3.000   Min.   :2.000   Min.   :    1     Min.   : 0.000  
 1st Qu.:3.000   1st Qu.:2.000   1st Qu.:   10     1st Qu.: 3.000  
 Median :3.500   Median :2.500   Median :  100     Median : 5.000  
 Mean   :3.349   Mean   :2.568   Mean   : 4711     Mean   : 8.396  
 3rd Qu.:3.500   3rd Qu.:3.000   3rd Qu.:10000     3rd Qu.: 8.000  
 Max.   :3.500   Max.   :4.500   Max.   :10000     Max.   :57.000  
 NA's   :2880    NA's   :2688    NA's   :3027      NA's   :3024    
 final_CFU_uL_P1   final_cells_P1   final_dilution_P2  final_CFU_P2   
 Min.   :    0.0   Min.   :0.0000   Min.   :   10     Min.   : 0.000  
 1st Qu.:    7.7   1st Qu.:0.0000   1st Qu.:   10     1st Qu.: 2.750  
 Median :   80.0   Median :0.0000   Median : 5050     Median : 5.000  
 Mean   : 5457.2   Mean   :0.2809   Mean   : 5018     Mean   : 5.125  
 3rd Qu.:10000.0   3rd Qu.:0.0000   3rd Qu.:10000     3rd Qu.: 7.250  
 Max.   :28000.0   Max.   :4.4472   Max.   :10000     Max.   :14.000  
 NA's   :3024      NA's   :2688     NA's   :3030      NA's   :3024    
 final_CFU_uL_P2 final_cells_P2  
 Min.   :    0   Min.   :0.0000  
 1st Qu.:    6   1st Qu.:0.0000  
 Median :  100   Median :0.0000  
 Mean   : 5228   Mean   :0.2815  
 3rd Qu.:10000   3rd Qu.:0.0000  
 Max.   :28000   Max.   :4.4472  
 NA's   :3024    NA's   :2688    
unique(exp_data$microbe)
[1] "yeast"    "bacteria" "none"    
What do we need to clean from this dataset?
  • Remove na and “none” rows for microbes. Remove na rows for “inoc_pH” “final_pH.” Remove rows where treatment is blank

  • Replace nas for rows inoc_cells and final_cells with 0s

exp_data_clean <- exp_data %>%
  # Remove na and "none" rows for microbes. Remove na rows for "inoc_pH" and "final_pH." Remove rows where treatment is blank
  filter(!is.na(microbe) & !is.na(final_pH) & !is.na(inoc_pH) & microbe != "none" & treatment != "") %>%
  
  # Replace nas for rows inoc_cells, final_cells with 0s
  replace_na(list(inoc_cells = 0, final_cells = 0, inoc_pH = 0)) %>%
  
  # Select only the columns microbe, inoc_cells, final_cells, treatment, inoc_pH, and final_pH
  select(microbe, inoc_cells, final_cells, treatment, inoc_pH, final_pH, evol_treatment) %>%
  
  # Create column "change_in_cells" from "finals_cells - inoc_cells" and column "delta_pH" from "final_pH - inoc_pH"
  mutate(
    change_in_cells = final_cells - inoc_cells,
    delta_pH = final_pH - inoc_pH
  )

For “filter(microbe ==”bacteria”)“, plot final_pH by change_in_cells using geom_jitter() and geom_smooth() with method = "lm". Note throughout this lab that we are oversimplifying this analysis by using geom_smooth(), for a real study we should be sure we are exploring various models to most accurately fit our data.

ggplot(filter(exp_data_clean, microbe == "bacteria"), aes(x = final_pH, y = change_in_cells)) +
  geom_jitter(alpha = 0.1, width = 0.05) +
  geom_smooth(method = "lm") +
  labs(title = "Impact of Final pH on Microbial Growth",
       x = "Final pH",
       y = "Change in Bacterial Cell Count (CFU)")
`geom_smooth()` using formula = 'y ~ x'

Do the same for “filter(exp_data_clean, microbe ==”yeast”),” but log-scale the Y-axis and set geom_smooth with method = "lm" to fullrange = TRUE.

ggplot(filter(exp_data_clean, microbe == "yeast"), aes(x = final_pH, y = change_in_cells)) +
  geom_jitter(alpha = 0.1, width = 0.05) +
  geom_smooth(method = "lm", fullrange = TRUE) +
  scale_y_log10() +
  labs(title = "Impact of Final pH on Microbial Growth",
       x = "Final pH",
       y = "Change in Yeast Cell Count (CFU)")
`geom_smooth()` using formula = 'y ~ x'

What do you observe in this oversimplified analysis?
  • Bacteria responds negatively to higher pH, while yeast are responding positively. This suggests that pH may be acting as an abiotic filter on the microbial community. Overall, bacteria does better in a more acidic environment while yeast does better in a more alkaline environment.

Biotic Filters

Nectar Treatments

Let’s continue working with our exp_data_filtered dataframe by looking into the results of colonization treatments:

Let’s see if the microbes are themselves affecting the pH. If they are affecting the pH, what could this imply?
  • That through modifying pH, one microbe could be controlling the other microbe through competition as a biotic filter.

Create a box-plot to explore the relationship between treatment and delta_pH:

ggplot(exp_data_clean, aes(x = treatment, y = delta_pH, color = treatment)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  labs(title = "Change in pH by Treatment",
       x = "Treatment",
       y = "Change in pH")

What do you observe?
  • The control treatment has a high pH (more basic)
  • When bacteria or yeast are added later, they have the same average effect on the pH. We see differences when either bacteria or yeast added first.
  • When bacteria added first regardless if yeast added or not, pH lowers (becomes more acidic)
  • Meanwhile when yeasts added first regardless if bacteria added or not, pH increases (becomes more basic). There is also a much larger variability in pH
  • It appears that bacteria is changing the pH of the nectar in its favor, limiting the ability of yeast to thrive

Now let’s explore this by plotting the number of yeast cells by treatment.

First I will create a new dataframe

exp_data_clean_yeast <- exp_data_clean %>%
  filter(microbe == "yeast" & 
        (treatment == "Y-" | treatment == "BY" | treatment == "YB" | treatment == "-Y"))


head(exp_data_clean_yeast)
  microbe inoc_cells final_cells treatment inoc_pH final_pH evol_treatment
1   yeast          0    3.556423        YB     3.0      2.5         low_pH
2   yeast          0    3.662852        YB     3.5      2.5         low_pH
3   yeast          0    3.869290        YB     3.5      2.5         low_pH
4   yeast          0    3.732474        YB     3.0      2.5         low_pH
5   yeast          0    3.662852        YB     3.0      2.5      ancestral
6   yeast          0    3.602169        YB     3.0      2.5      ancestral
  change_in_cells delta_pH
1        3.556423     -0.5
2        3.662852     -1.0
3        3.869290     -1.0
4        3.732474     -0.5
5        3.662852     -0.5
6        3.602169     -0.5

Now create a boxplot comparing treatment and final yeast cell counts:

ggplot(exp_data_clean_yeast, aes(x = treatment, y = final_cells)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  labs(title = "Yeast Cells by Treatment",
       x = "Treatments",
       y = "Number of Yeast Cells")

ggplot(exp_data_clean_yeast, aes(x = treatment, y = final_cells)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  labs(title = "Yeast Cells by Treatment",
       x = "Treatments",
       y = "Number of Yeast Cells")

This is a great example of priority effects, where the order of colonization has an effect on community assembly. It also showcases two separate stable states of pH as an outcome based on these priority effects.

The study expands on these results to relate these results to community dynamics as a whole:

They made these conclusions by creating artificial flowers and inoculating them: YouTube Video

Combining Filters

Recall that looking at one filter at a time is an oversimplification, in the real-world these “filters” can occur repeatedly and even simultaneously.

Colonization-Competition Tradeoffs

Let’s look at the combined effects of dispersal and biotic filters through this simulation inspired by Vellend and MacDonald 2016(vellend2016simulating?) from the book The Theory of Ecological Communities.

We will explore colonization-competition tradeoff model across two patches. The model captures the !essence of the interaction between dispersal capabilities and competitive strength. It is a stochastic simulation where colonization events are interspersed with competitive encounters, and the outcome is contingent on the inherent traits of the species involved and their initial frequencies.

We will simulate this model over 100 years, tracking the frequency of each species within two patches. The species are represented numerically: ‘1’ for yeast and ‘2’ for bacteria. The initial condition is a 50/50 split between the two species in each patch. The model includes the following parameters:

  • J: The total number of individuals in each patch.
  • m: The rate of dispersal between patches.
  • fit.ratio.avg: The average fitness ratio of the species when there is no dispersal, indicating competitive strength.
  • fit.ratio.m: The fitness ratio during dispersal events, indicating colonization ability.
  • freq.dep: The frequency dependence in competitive interactions
# Set parameters and initial conditions
num.years <- 100
num.patch <- 2
J <- 1000
m <- 0.05
fit.ratio.avg <- c(1.2, 1.2)
fit.ratio.m <- 1/5
freq.dep <- rep(0, num.patch)
init.1 <- 0.5 * J
COM <- matrix(ifelse(runif(J * num.patch) < 0.5, 1, 2), ncol = num.patch)

# Create a data frame to store the frequency of species 1 per year in each patch
freq.1.mat <- as.data.frame(matrix(ncol = num.patch, nrow = num.years))
names(freq.1.mat) <- paste0("Patch", 1:num.patch)

# Populate the initial frequencies
for (patch in 1:num.patch) {
  freq.1.mat[1, patch] <- sum(COM[, patch] == 1) / J
}

# Run the simulation
for (year in 2:num.years) {
  for (i in 1:(J * num.patch)) {
    # Choose a patch where a death event will occur
    patch <- sample(1:num.patch, 1)
    
    # Calculate Pr.1 if dispersal occurs
    freq.1.meta <- sum(COM == 1) / (J * num.patch)
    Pr.1 <- ifelse(runif(1) < m, 
                   fit.ratio.m * freq.1.meta / (fit.ratio.m * freq.1.meta + (1 - freq.1.meta)),
                   {
                     freq.1 <- sum(COM[, patch] == 1) / J
                     freq.2 <- 1 - freq.1
                     fit.ratio <- exp(freq.dep[patch] * (freq.1 - 0.5) + log(fit.ratio.avg[patch]))
                     fit.ratio * freq.1 / (fit.ratio * freq.1 + freq.2)
                   })
    
    # Implement a birth-death event
    COM[sample(J, 1), patch] <- sample(c(1, 2), 1, prob = c(Pr.1, 1 - Pr.1))
  }
  
  # Record the frequency of species 1 for the current year
  for (patch in 1:num.patch) {
    freq.1.mat[year, patch] <- sum(COM[, patch] == 1) / J
  }
}

Plot the results of the simulation

# Convert to long format for ggplot
freq.1.long <- freq.1.mat %>%
  mutate(Year = 1:num.years) %>%
  gather(key = "Patch", value = "Frequency", -Year)

ggplot(freq.1.long, aes(x = Year, y = Frequency, group = Patch, color = Patch)) +
  geom_line() +
  labs(x = "Time", y = "Frequency of species 1", title = "Colonization-Competition Tradeoff Model") 

Now vary the parameters, what do you observe?
  • Colonization Frequency:

    • Higher dispersal rates (larger values of m) lead to more frequent colonization events between patches.

    • Lower dispersal rates (smaller values of m) result in less frequent colonization events.

  • Competitive Dynamics:

    • Higher average fitness ratios for species 1 (larger values in fit.ratio.avg) result in a competitive advantage for species 1, leading to its dominance over time.

    • Lower average fitness ratios for species 1 (smaller values in fit.ratio.avg) favor species 2, causing it to dominate in the long term.

  • Colonization Ability:

    • Higher fitness ratios during dispersal events (larger values in fit.ratio.m) give species 1 a stronger colonization ability, leading to its increased frequency over time.

    • Lower fitness ratios during dispersal events (smaller values in fit.ratio.m) weaken species 1’s colonization ability, allowing species 2 to maintain dominance.

  • Frequency Dependence:

    • Positive frequency dependence (positive freq.dep) enhances the competitive advantage of species 1 as its frequency increases, leading to its dominance over time.

    • Negative frequency dependence (negative freq.dep) diminishes the competitive advantage of species 1 as its frequency increases, resulting in a more balanced coexistence between species.

  • Overall Dynamics:

    • The interaction between dispersal capabilities, competitive strength, and frequency dependence shapes the long-term dynamics of species coexistence and dominance.

    • Different parameter combinations lead to diverse ecological outcomes, including species coexistence, dominance shifts, or extinction events.

Incorporating selection into the Colonization-Competition Tradeoff model

In addressing unexpected behavior observed during high dispersal scenarios, Melissa Chen proposed alterations to the code to incorporate selection at the metacommunity scale. She presented two approaches:

  1. Reproduction Before Dispersal:

    • In this approach, individuals reproduce within their “home” patch before dispersing offspring into other patches.

    • Fitness differences are weighted by local species frequencies, followed by dispersal abilities, to calculate the probability of species recruitment.

  2. Dispersal Before Reproduction:

    • Alternatively, adults disperse to new patches and then reproduce within the new patch.

    • Here, metacommunity species frequencies are weighted by dispersal ability differences first, followed by fitness differences in the recruiting patch, to calculate recruitment probabilities.

You can check out this approach online, but serves as another example of the complexity you need to introduce into community assembly models in order for them to become generalizable across real-world systems.

Future directions in the field of Community Assembly

Functional traits

Similar to local species abundance/richness versus species pools, there seems to be filtering effects that limit the functional traits of species (trait space) in a local community. In part this was initially proposed by separating species into feeding guilds(fox1993assembly?), but since that time additional intricacies have been introduced by separating species by multi-dimensional functional traits.

An example of feeding guilds. Source: Brian O’Neill; YouTube

Here is an example of visualizing traits. If abiotic filters were the only factor limiting local functional traits, you would expect to see traits focused around an optimum based on abiotic conditions (left), instead we often see traits “spread out” across a larger convex hull (right) similar to what we see for local species richness vs. species pools due to biotic filters:

Source: Mittelbach & McGill 2019(Mittelbach and McGill 2019)

In this real world example, functional diversity of British birds (red) is much lower than predicted (blue) indicating filtering is occurring beyond abiotic filtering. Source: Petchey et. al 2007(petchey2007low?)

Current research has revealed that conducting the types of community assembly analyses covered in this lab but focusing on functional traits instead of species identities are often more accurate for predicting community structure and are more generalizable for making conservation suggestions. Because of this, there has been a recent explosion of interest in compiling species trait data.

Note the consistent relationship among the functional trait leaf mass per unit area (LMA) with both mean annual temperature (MAT) and mean annual rainfall (MAR), despite very different species compositions (red dots) worldwide. Source: Wright et. al 2004(wright2004worldwide?)

Community Phylogenetics

All of the statements above about functional traits can be applied to phylogenetic similarity. However, although phylogenetic distance is more readily available than trait data, it is not a perfect proxy without fine-tuning. This is because the simplifications that all traits are conserved and that traits are not labile are often both incorrect.


References

Mittelbach, Gary G, and Brian J McGill. 2019. Community Ecology. Oxford University Press.
Stevens, Martin Henry Hoffman. 2009. A Primer of Ecology with r. Springer.