Introduction to Community Assembly

“Only a fraction of the forms that could theoretically do so actually form a community at any one time.” - Charles Elton (1950)

Why aren’t all species everywhere? Why are local scale communities smaller (less richness) than regional species pools? Why are there so many species? One of the most central challenges in ecology is pinpointing what determines the number, kinds, and abundances of species found in a community.

The study of community assembly attempts to describe the intricate processes that determine the composition and abundance of species within ecological communities.

Filters

In the traditional view of community assembly, there are three broad categories of processes/filters that determine colonization:

Dispersal
Abiotic/Environmental Filters (e.g. temperature, humidity, precipitation, geological features, nutrient availability, etc.)
Biotic/Interaction Filters (e.g. competition, predation, parasitism, disease, and mutualism)

As you can readily see, community assembly pulls together many of the topics we have covered this semester. As some additional examples, island biogeography theory and neutral theory emphasize a dispersal filter. Meanwhile co-existence theory involves a biotic filter.

These processes are traditionally viewed as sequential filters:

However, it is more accurate to remember that these three processes can act out of order and simultaneously. Imagine the impact of differing R* values in a system with nutrients that fluctuate seasonally or even geographically. Therefore filter may also not be the most accurate term even though it is the most popular way to refer to this concept.

Community Assembly Lab

The dataset for the upcoming lab originates from “pH as an Eco-evolutionary Driver of Priority Effects” by Chappell et al. (2022), which details field surveys on Diplacus aurantiacus nectar across the San Francisco Peninsula and subsequent lab experiments which investigated the role of pH in microbial community dynamics. The study showcased that the microbial dominance within nectar shifts between bacteria (specifically Acinetobacter nectaris) and yeasts (specifically Metschnikowia reukaufii).

Bush Monkeyflower (*Mimulus aurantiacus*). Source: Wikimedia

Load the dataset 2015_survey_samples_edit_cfu_master.csv as the df observed_data. This dataset consists of the field collected data.

observed_data <- read.csv('Lab_8_data/2015_survey_samples_edit_cfu_master.csv')

Explore the dataset here and by looking at the metadata.

head(observed_data)

  ï..number Plant_ID Flower_ID Date_collected Date_plated Replicate Nectar_cm
1         1      LH1         1        6/23/15     6/24/15         1       0.1
2         2      LH1         2        6/23/15     6/24/15         2       1.4
3         3      LH1         3        6/23/15     6/24/15         3       3.1
4         4      LH1         4        6/23/15     6/24/15         4       2.3
5         5      LH1         5        6/23/15     6/24/15         5       0.3
6         6     LH1          6        6/23/15     6/24/15         6       0.9
  Nectar_mm Nectar_vol Stigma_closed Fcfu1 Fcfu2 Fcfu3 Fcfu4 Fcfu5 Fcfu6
1         1  0.1210654           Yes     0     0     0     0     0     0
2        14  1.6949153           Yes     0     0     0     0     0     0
3        31  3.7530266           Yes    57     0     0     0     0     0
4        23  2.7845036           Yes     1     0     0     0     0     0
5         3  0.3631961           Yes     0     0     0     0     0     0
6         9  1.0895884           Yes     0     0     0     0     0     0
  F_nontarget    Fcfu_avg FCFU_total     FCFU_ul log.FCFU.1. Bcfu1 Bcfu2 Bcfu3
1        <NA>           0          0           0           0     0     0     0
2        <NA>           0          0           0           0    38     1     0
3        <NA>         9.5         57 171.0607382 2.235681782    11     1     0
4        <NA> 0.166666667          1 3.836009923 0.684487183   124     0     0
5        <NA>           0          0           0           0     1     0     0
6        <NA>           0          0           0           0     0     0     0
  Bcfu4 Bcfu5 Bcfu6    Bcfu_avg BCFU_total      BCFUul log.BCFU.1.  X X.1 X.2
1     0     0     0           0          0           0           0 NA  NA  NA
2     0     0     0         6.5         39 6915.391539 3.839879571 NA  NA  NA
3     0     0     0           2         12 1079.411167 3.033589064 NA  NA  NA
4     0     0     0 20.66666667        124 14257.11267 4.154062042 NA  NA  NA
5     0     0     0 0.166666667          1 761.4761476 2.882226262 NA  NA  NA
6     0     0     0           0          0           0           0 NA  NA  NA
  X.3 X.4 X.5 X.6 X.7 X.8 X.9
1  NA  NA  NA  NA  NA  NA  NA
2  NA  NA  NA  NA  NA  NA  NA
3  NA  NA  NA  NA  NA  NA  NA
4  NA  NA  NA  NA  NA  NA  NA
5  NA  NA  NA  NA  NA  NA  NA
6  NA  NA  NA  NA  NA  NA  NA

length(unique(observed_data$Plant_ID))

[1] 148

number_of_unique_combos <- observed_data %>%
  distinct(Plant_ID, Flower_ID) %>%
  nrow()

number_of_unique_combos

[1] 1154

summary(observed_data)

   ï..number        Plant_ID           Flower_ID      Date_collected    
 Min.   :   1.0   Length:1157        Min.   :   1.0   Length:1157       
 1st Qu.: 288.0   Class :character   1st Qu.: 289.0   Class :character  
 Median : 576.0   Mode  :character   Median : 577.0   Mode  :character  
 Mean   : 576.1                      Mean   : 576.6                     
 3rd Qu.: 864.0                      3rd Qu.: 864.0                     
 Max.   :1152.0                      Max.   :1152.0                     
 NA's   :4                           NA's   :4                          
 Date_plated          Replicate       Nectar_cm        Nectar_mm     
 Length:1157        Min.   :1.000   Min.   :0.0100   Min.   : 0.000  
 Class :character   1st Qu.:2.000   1st Qu.:0.3000   1st Qu.: 3.000  
 Mode  :character   Median :4.000   Median :0.7000   Median : 7.000  
                    Mean   :4.497   Mean   :0.9831   Mean   : 9.823  
                    3rd Qu.:6.000   3rd Qu.:1.4000   3rd Qu.:14.000  
                    Max.   :8.000   Max.   :5.6000   Max.   :56.000  
                    NA's   :4       NA's   :5        NA's   :4       
   Nectar_vol     Stigma_closed          Fcfu1             Fcfu2        
 Min.   :0.0000   Length:1157        Min.   :  0.000   Min.   :  0.000  
 1st Qu.:0.3632   Class :character   1st Qu.:  0.000   1st Qu.:  0.000  
 Median :0.8475   Mode  :character   Median :  0.000   Median :  0.000  
 Mean   :1.1892                      Mean   :  8.441   Mean   :  9.083  
 3rd Qu.:1.6949                      3rd Qu.:  3.000   3rd Qu.:  0.750  
 Max.   :6.7797                      Max.   :576.000   Max.   :752.000  
 NA's   :4                           NA's   :4         NA's   :879      
     Fcfu3             Fcfu4             Fcfu5         Fcfu6    
 Min.   : 0.0000   Min.   : 0.0000   Min.   :0     Min.   :0    
 1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.:0     1st Qu.:0    
 Median : 0.0000   Median : 0.0000   Median :0     Median :0    
 Mean   : 0.4836   Mean   : 0.0711   Mean   :0     Mean   :0    
 3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.:0     3rd Qu.:0    
 Max.   :43.0000   Max.   :13.0000   Max.   :0     Max.   :0    
 NA's   :944       NA's   :960       NA's   :965   NA's   :965  
 F_nontarget          Fcfu_avg          FCFU_total          FCFU_ul         
 Length:1157        Length:1157        Length:1157        Length:1157       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 log.FCFU.1.           Bcfu1               Bcfu2              Bcfu3       
 Length:1157        Length:1157        Min.   :    0.00   Min.   : 0.000  
 Class :character   Class :character   1st Qu.:    1.00   1st Qu.: 0.000  
 Mode  :character   Mode  :character   Median :    2.00   Median : 0.000  
                                       Mean   :   31.68   Mean   : 1.781  
                                       3rd Qu.:    5.00   3rd Qu.: 1.000  
                                       Max.   :10000.00   Max.   :48.000  
                                       NA's   :728        NA's   :1002    
     Bcfu4            Bcfu5          Bcfu6        Bcfu_avg        
 Min.   :0.0000   Min.   :0      Min.   :0      Length:1157       
 1st Qu.:0.0000   1st Qu.:0      1st Qu.:0      Class :character  
 Median :0.0000   Median :0      Median :0      Mode  :character  
 Mean   :0.2991   Mean   :0      Mean   :0                        
 3rd Qu.:0.0000   3rd Qu.:0      3rd Qu.:0                        
 Max.   :6.0000   Max.   :0      Max.   :0                        
 NA's   :1050     NA's   :1058   NA's   :1057                     
   BCFU_total         BCFUul          log.BCFU.1.           X          
 Min.   :    0.0   Length:1157        Length:1157        Mode:logical  
 1st Qu.:    0.0   Class :character   Class :character   NA's:1157     
 Median :    3.0   Mode  :character   Mode  :character                 
 Mean   :  375.7                                                       
 3rd Qu.:   18.0                                                       
 Max.   :20000.0                                                       
 NA's   :4                                                             
   X.1            X.2            X.3            X.4            X.5         
 Mode:logical   Mode:logical   Mode:logical   Mode:logical   Mode:logical  
 NA's:1157      NA's:1157      NA's:1157      NA's:1157      NA's:1157     
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
   X.6            X.7            X.8            X.9         
 Mode:logical   Mode:logical   Mode:logical   Mode:logical  
 NA's:1157      NA's:1157      NA's:1157      NA's:1157

Describe the dataset:

The dataset includes unique plant and flower IDs across sites.
148 plants and 1,154 flowers were surveyed.
Each row represents a unique flower. Along with other microbe measurement data, it includes the number of different species of bacteria (Bcfus, or bacterial colony forming units) and fungi (Fcfu) in each plant/flower. Also the amount of nectar is detailed.
Oddly, the site numbers or locations are not included.
Also oddly, the units for vol are not given in the metadata

What needs to be cleaned from the dataset?

There are NAs present in the Bcfu and Fcfu columns, which we can convert to 0s for our purposes
Some of the columns (like Fcfu_avg, FCFU_total, Bcfu_avg) are character columns and should be numeric
There are a bunch of empty x columns we should remove, probably from the formatting of the original CSV files
The amount of fungi/bacteria present should depend on the volume of nectar present, so let’s create new columns for fungal/bacterial concentration
Also, for the purpose of this exercise let’s just focus on total cfus, so let’s remove the other columns for the more specific morphologies

# Clean our dataset
observed_data_clean <- observed_data %>%
  select(-Date_collected, -Date_plated, -Replicate, -Nectar_cm, -Nectar_mm,
         -F_nontarget, -FCFU_ul, -`log.FCFU.1.`, -BCFUul, -`log.BCFU.1.`,
         -starts_with("X"), -matches("^(Bcfu|Fcfu)\\d+$")) %>%
  mutate(
    FCFU_total = as.numeric(FCFU_total),
    BCFU_total = as.numeric(BCFU_total),
    Fcfu_avg = as.numeric(Fcfu_avg),
    Bcfu_avg = as.numeric(Bcfu_avg),
    Bcfu_conc = BCFU_total / Nectar_vol,
    Fcfu_conc = FCFU_total / Nectar_vol,
    Source = "Observed" # we will use this column later
  ) %>%
  replace_na(list(FCFU_total = 0, BCFU_total = 0, Fcfu_avg = 0, Bcfu_avg = 0, Nectar_vol = 0)) %>%
  replace_na(list(Fcfu_conc = 0, Bcfu_conc = 0))

summary(observed_data_clean)

   ï..number        Plant_ID           Flower_ID        Nectar_vol    
 Min.   :   1.0   Length:1157        Min.   :   1.0   Min.   :0.0000  
 1st Qu.: 288.0   Class :character   1st Qu.: 289.0   1st Qu.:0.3632  
 Median : 576.0   Mode  :character   Median : 577.0   Median :0.8475  
 Mean   : 576.1                      Mean   : 576.6   Mean   :1.1851  
 3rd Qu.: 864.0                      3rd Qu.: 864.0   3rd Qu.:1.6949  
 Max.   :1152.0                      Max.   :1152.0   Max.   :6.7797  
 NA's   :4                           NA's   :4                        
 Stigma_closed         Fcfu_avg         FCFU_total          Bcfu_avg      
 Length:1157        Min.   :  0.000   Min.   :   0.000   Min.   :    0.0  
 Class :character   1st Qu.:  0.000   1st Qu.:   0.000   1st Qu.:    0.0  
 Mode  :character   Median :  0.000   Median :   0.000   Median :    2.0  
                    Mean   :  6.705   Mean   :   8.738   Mean   :  284.7  
                    3rd Qu.:  2.333   3rd Qu.:   1.816   3rd Qu.:   10.5  
                    Max.   :576.000   Max.   :1394.673   Max.   :10000.0  
                                                                          
   BCFU_total        Bcfu_conc          Fcfu_conc          Source         
 Min.   :    0.0   Min.   :    0.00   Min.   :  0.000   Length:1157       
 1st Qu.:    0.0   1st Qu.:    0.00   1st Qu.:  0.000   Class :character  
 Median :    3.0   Median :    4.13   Median :  0.000   Mode  :character  
 Mean   :  374.4   Mean   :  672.47   Mean   :  6.665                     
 3rd Qu.:   18.0   3rd Qu.:   20.65   3rd Qu.:  3.000                     
 Max.   :20000.0   Max.   :82600.00   Max.   :576.000

Dispersal

Now that we have loaded and cleaned our dataset, let’s simulate dispersal if there was no filtering. We will simulate the dispersal being uniformly (not randomly) distributed because for dispersal that would be closest to a null hypothesis in most situations.

Let’s simulate our data as if it was uniformly distributed across flowers

simulated_data <- observed_data_clean[, c("Plant_ID", "Flower_ID", "Stigma_closed", "Nectar_vol")]

# Based on the max value from our data summary above
cfu_count_range <- 0:10000

# Simulate uniform CFU counts for fungal and bacterial samples
for(i in 1:6){
  simulated_data[[paste0('Fcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
  simulated_data[[paste0('Bcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
}

# Calculate other columns from our orignal dataset
simulated_data$Fcfu_total <- rowSums(simulated_data[, grepl('Fcfu', names(simulated_data))])
simulated_data$Bcfu_total <- rowSums(simulated_data[, grepl('Bcfu', names(simulated_data))])
simulated_data$Fcfu_avg <- simulated_data$Fcfu_total / 6
simulated_data$Bcfu_avg <- simulated_data$Bcfu_total / 6
simulated_data$Fcfu_conc <- simulated_data$Fcfu_total / as.numeric(simulated_data$Nectar_vol)
simulated_data$Bcfu_conc <- simulated_data$Bcfu_total / as.numeric(simulated_data$Nectar_vol)

simulated_data <- simulated_data %>%
  select(-matches("^(Bcfu|Fcfu)\\d+$")) %>%
  mutate(Source = "Simulated")

head(simulated_data)

  Plant_ID Flower_ID Stigma_closed Nectar_vol Fcfu_total Bcfu_total Fcfu_avg
1      LH1         1           Yes  0.1210654      26709      42133 4451.500
2      LH1         2           Yes  1.6949153      34325      32985 5720.833
3      LH1         3           Yes  3.7530266      30841      16029 5140.167
4      LH1         4           Yes  2.7845036      23994      31297 3999.000
5      LH1         5           Yes  0.3631961      27456      27035 4576.000
6     LH1          6           Yes  1.0895884      29645      28034 4940.833
  Bcfu_avg  Fcfu_conc  Bcfu_conc    Source
1 7022.167 220616.341 348018.581 Simulated
2 5497.500  20251.750  19461.150 Simulated
3 2671.500   8217.634   4270.953 Simulated
4 5216.167   8616.976  11239.705 Simulated
5 4505.833  75595.520  74436.367 Simulated
6 4672.333  27207.522  25728.982 Simulated

Let’s plot our observed versus simulated data:

combined_data <- bind_rows(observed_data_clean, simulated_data)

ggplot(combined_data, aes(x = Source, y = Fcfu_conc, fill = Source)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  scale_y_log10() +
  labs(title = "Fungal CFU Concentration (per vol of nectar)",
       y = "Concentration",
       x = "")

Remember that it is crucial to always show your actual data, that is why I included points in our plot. Note that I jittered the points for increased visibility for many points versus geom_point(). I also set geom_boxplot() to outlier.shape = NA so that geom_boxplot() would not also plot the points.

Now repeat for the bacterial data:

ggplot(combined_data, aes(x = Source, y = Bcfu_conc, fill = Source)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  scale_y_log10() +
  labs(title = "Bacterial CFU Concentration (per vol of nectar)",
       y = "Concentration",
       x = "")

What do you observe?

It appears the actual dispersal of microbes is dramatically different than if it was just uniformly distributed. This suggests dispersal could be a filter affecting the assembly of these microbe communities.

If we had geospatial data for our dataset, we could do a similar simple analysis. Here I am simulating an artificial “observed” population that is not uniformly distributed along with a simulated distribution that is uniformly distributed.

# Simulating "observed" data with specific lat-lon distribution patterns
observed_data <- data.frame(
  Plant_ID = 1:100,
  lat = rnorm(100, mean = 34, sd = 0.1), # Simulating around a central latitude
  lon = rnorm(100, mean = -118, sd = 0.1), # Simulating around a central longitude
  Source = 'Observed'
)

# Simulating "simulated" data with a uniform distribution across the given range
simulated_data <- data.frame(
  Plant_ID = 1:100,
  lat = runif(100, min = 33.8, max = 34.2), # Uniform distribution within a latitudinal range
  lon = runif(100, min = -118.2, max = -117.8), # Uniform distribution within a longitudinal range
  Source = 'Simulated'
)

combined_data <- rbind(observed_data, simulated_data)

Now let’s plot this data colored by Source with geom_point, geom_density2d(), and faceted by Source

ggplot(combined_data, aes(x = lon, y = lat, color = Source)) +
  geom_point() +
  geom_density2d() +
  facet_wrap(~ Source) +
  labs(title = "Spatial Distribution of Microbial Dispersal with Density Contours",
       x = "Longitude",
       y = "Latitude")

What do you observe?

We can see that observed data is not uniformly distributed, with density increasing around a focal point. Meanwhile the simulated data is more uniformly distributed across its range

Abiotic filters

Load in the file “N06-P02-E27_MR1_clone_1_all_evol_vs._FNA17_-_Sheet1.csv” as the df “exp_data”

exp_data <- read.csv("Lab_8_data/N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH_-_Sheet1.csv")

What do we need to clean from this dataset?

summary(exp_data)

   experiment       nectar          date_count_inoc    date_count_final  
 Min.   :20.00   Length:3072        Length:3072        Length:3072       
 1st Qu.:21.75   Class :character   Class :character   Class :character  
 Median :25.50   Mode  :character   Mode  :character   Mode  :character  
 Mean   :24.75                                                           
 3rd Qu.:27.25                                                           
 Max.   :29.00                                                           
                                                                         
     column          row              location           microbe         
 Min.   : 1.00   Length:3072        Length:3072        Length:3072       
 1st Qu.: 3.75   Class :character   Class :character   Class :character  
 Median : 6.50   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 6.50                                                           
 3rd Qu.: 9.25                                                           
 Max.   :12.00                                                           
                                                                         
    strain          evol_treatment      evol_comp            clone          
 Length:3072        Length:3072        Length:3072        Length:3072       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   evol_rep         evol_rep_comp         tech_rep      treatment        
 Length:3072        Length:3072        Min.   :1.000   Length:3072       
 Class :character   Class :character   1st Qu.:1.000   Class :character  
 Mode  :character   Mode  :character   Median :1.000   Mode  :character  
                                       Mean   :2.426                     
                                       3rd Qu.:4.000                     
                                       Max.   :9.000                     
                                                                         
 X96_well_plate  plate_type        inoc_density          inoc_CFU       
 Min.   :1.0    Length:3072        Length:3072        Min.   :     0.0  
 1st Qu.:1.0    Class :character   Class :character   1st Qu.:     0.0  
 Median :1.5    Mode  :character   Mode  :character   Median :     7.0  
 Mean   :1.5                                          Mean   :   360.9  
 3rd Qu.:2.0                                          3rd Qu.:    15.0  
 Max.   :2.0                                          Max.   :100000.0  
                                                      NA's   :1737      
 inoc_dilution      inoc_CFU_uL        inoc_cells    final_dilution  
 Min.   :      0   Min.   :      0   Min.   :0.000   Min.   :     0  
 1st Qu.:     10   1st Qu.:      0   1st Qu.:0.000   1st Qu.:   100  
 Median :    100   Median :     22   Median :1.362   Median :  1000  
 Mean   :  21785   Mean   :  52677   Mean   :1.918   Mean   : 19570  
 3rd Qu.:  10000   3rd Qu.:   6667   3rd Qu.:3.824   3rd Qu.: 10000  
 Max.   :1000000   Max.   :9666667   Max.   :6.985   Max.   :100000  
 NA's   :2085      NA's   :1576      NA's   :1576    NA's   :1341    
   final_CFU       final_CFU_uL      final_cells       notes          
 Min.   : 0.000   Min.   :      0   Min.   :0.000   Length:3072       
 1st Qu.: 0.000   1st Qu.:      0   1st Qu.:0.000   Class :character  
 Median : 3.000   Median :     16   Median :1.176   Mode  :character  
 Mean   : 6.844   Mean   :  22167   Mean   :1.853                     
 3rd Qu.:10.000   3rd Qu.:   8000   3rd Qu.:3.842                     
 Max.   :78.000   Max.   :1760000   Max.   :6.246                     
 NA's   :180      NA's   :123       NA's   :60                        
    inoc_pH         final_pH     final_dilution_P1  final_CFU_P1   
 Min.   :3.000   Min.   :2.000   Min.   :    1     Min.   : 0.000  
 1st Qu.:3.000   1st Qu.:2.000   1st Qu.:   10     1st Qu.: 3.000  
 Median :3.500   Median :2.500   Median :  100     Median : 5.000  
 Mean   :3.349   Mean   :2.568   Mean   : 4711     Mean   : 8.396  
 3rd Qu.:3.500   3rd Qu.:3.000   3rd Qu.:10000     3rd Qu.: 8.000  
 Max.   :3.500   Max.   :4.500   Max.   :10000     Max.   :57.000  
 NA's   :2880    NA's   :2688    NA's   :3027      NA's   :3024    
 final_CFU_uL_P1   final_cells_P1   final_dilution_P2  final_CFU_P2   
 Min.   :    0.0   Min.   :0.0000   Min.   :   10     Min.   : 0.000  
 1st Qu.:    7.7   1st Qu.:0.0000   1st Qu.:   10     1st Qu.: 2.750  
 Median :   80.0   Median :0.0000   Median : 5050     Median : 5.000  
 Mean   : 5457.2   Mean   :0.2809   Mean   : 5018     Mean   : 5.125  
 3rd Qu.:10000.0   3rd Qu.:0.0000   3rd Qu.:10000     3rd Qu.: 7.250  
 Max.   :28000.0   Max.   :4.4472   Max.   :10000     Max.   :14.000  
 NA's   :3024      NA's   :2688     NA's   :3030      NA's   :3024    
 final_CFU_uL_P2 final_cells_P2  
 Min.   :    0   Min.   :0.0000  
 1st Qu.:    6   1st Qu.:0.0000  
 Median :  100   Median :0.0000  
 Mean   : 5228   Mean   :0.2815  
 3rd Qu.:10000   3rd Qu.:0.0000  
 Max.   :28000   Max.   :4.4472  
 NA's   :3024    NA's   :2688

unique(exp_data$microbe)

[1] "yeast"    "bacteria" "none"

Remove na and “none” rows for microbes. Remove na rows for “inoc_pH” “final_pH.” Remove rows where treatment is blank
Replace nas for rows inoc_cells and final_cells with 0s

exp_data_clean <- exp_data %>%
  # Remove na and "none" rows for microbes. Remove na rows for "inoc_pH" and "final_pH." Remove rows where treatment is blank
  filter(!is.na(microbe) & !is.na(final_pH) & !is.na(inoc_pH) & microbe != "none" & treatment != "") %>%
  
  # Replace nas for rows inoc_cells, final_cells with 0s
  replace_na(list(inoc_cells = 0, final_cells = 0, inoc_pH = 0)) %>%
  
  # Select only the columns microbe, inoc_cells, final_cells, treatment, inoc_pH, and final_pH
  select(microbe, inoc_cells, final_cells, treatment, inoc_pH, final_pH) %>%
  
  # Create column "change_in_cells" from "finals_cells - inoc_cells" and column "delta_pH" from "final_pH - inoc_pH"
  mutate(
    change_in_cells = final_cells - inoc_cells,
    delta_pH = final_pH - inoc_pH
  )

For “filter(exp_data_clean, microbe ==”bacteria”)“, plot final_pH by change_in_cells using geom_jitter() and geom_smooth() with method = "lm". Note throughout this lab that we are oversimplifying this analysis by using geom_smooth(), for a real study we should be sure we are exploring various models to most accurately fit our data.

ggplot(filter(exp_data_clean, microbe == "bacteria"), aes(x = final_pH, y = change_in_cells)) +
  geom_jitter(alpha = 0.1, width = 0.05) +
  geom_smooth(method = "lm") +
  labs(title = "Impact of Final pH on Microbial Growth",
       x = "Final pH",
       y = "Change in Bacterial Cell Count (CFU)")

`geom_smooth()` using formula = 'y ~ x'

Do the same for “filter(exp_data_clean, microbe ==”yeast”),” but log-scale the Y-axis and set geom_smooth with method = "lm" to fullrange = TRUE.

ggplot(filter(exp_data_clean, microbe == "yeast"), aes(x = final_pH, y = change_in_cells)) +
  geom_jitter(alpha = 0.1, width = 0.05) +
  geom_smooth(method = "lm", fullrange = TRUE) +
  scale_y_log10() +
  labs(title = "Impact of Final pH on Microbial Growth",
       x = "Final pH",
       y = "Change in Yeast Cell Count (CFU)")

`geom_smooth()` using formula = 'y ~ x'

What do you observe in this simple analysis?

Bacteria responds negatively to higher pH, while yeast are responding positively. This suggests that pH may be acting as an abiotic filter on the microbial community. Overall, bacteria does better in a more acidic environment while yeast does better in a more alkaline environment.

Biotic Filters

Nectar Treatments

Let’s continue working with our exp_data_filtered dataframe by looking into the results of colonization treatments:

Let’s see if the microbes are themselves affecting the pH. If they are affecting the pH, what could this imply?

That through modifying pH, one microbe could be controlling the other microbe through competition as a biotic filter.

Create a box-plot to explore the relationship between treatment and delta_pH:

ggplot(exp_data_clean, aes(x = treatment, y = delta_pH, fill = treatment)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  labs(title = "Change in pH by Treatment",
       x = "Treatment",
       y = "Change in pH")

What do you observe?

The control treatment has a large variability in pH range, while most of the microbe treatments have narrow narrow pH ranges
When bacteria or yeast are added later, they have the same average effect on the pH, with a much larger range when bacteria introduced later
When bacteria added first regardless if yeast added or not, pH lowers (becomes more acidic)
Meanwhile when yeasts added first regardless if bacteria added or not, pH increases (becomes more basic)
It appears that bacteria and yeast are changing the pH of the nectar in their favor, limiting the ability of the other microbe to thrive

Now let’s explore this by plotting the number of yeast cells by treatment.

First I will create a new dataframe

exp_data_clean_yeast <- exp_data_clean %>%
 filter(microbe == "yeast" & treatment %in% c("Y-", "BY", "YB", "-Y"))

Now create a boxplot comparing treatment and final yeast cell counts:

ggplot(exp_data_clean_yeast, aes(x = treatment, y = final_cells)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1, width = 0.1) +
  labs(title = "Yeast Cells by Treatment",
       x = "Treatments",
       y = "Number of Yeast Cells")

This is a great example of priority effects, where the order of colonization has an effect on community assembly. It also showcases two separate stable states of pH as an outcome based on these priority effects.

The study expands on these results to relate these results to community dynamics as a whole:

They made these conclusions by creating artificial flowers and inoculating them: YouTube Video

Combining Filters

Recall that looking at one filter at a time is an oversimplification, in the real-world these “filters” can occur repeatedly and even simultaneously.

Colonization-Competition Tradeoffs

Let’s look at the combined effects of dispersal and biotic filters through this simulation inspired by Vellend and MacDonald 2016 from the book The Theory of Ecological Communities.

We will explore colonization-competition tradeoff model across two patches. The model captures the essence of the interaction between dispersal capabilities and competitive strength. It is a stochastic simulation where colonization events are interspersed with competitive encounters, and the outcome is contingent on the inherent traits of the species involved and their initial frequencies.

We will simulate this model over 500 years, tracking the frequency of each species within two patches. The species are represented numerically: ‘1’ for yeast and ‘2’ for bacteria. The initial condition is a 50/50 split between the two species in each patch. The model includes the following parameters:

J: The total number of individuals in each patch.
m: The rate of dispersal between patches.
fit.ratio.avg: The average fitness ratio of the species when there is no dispersal, indicating competitive strength.
fit.ratio.m: The fitness ratio during dispersal events, indicating colonization ability.
freq.dep: The frequency dependence in competitive interactions

Plot the results of the simulation

Now vary the parameters, what do you observe?

Colonization Frequency:
- Higher dispersal rates (larger values of m) lead to more frequent colonization events between patches.
- Lower dispersal rates (smaller values of m) result in less frequent colonization events.
Competitive Dynamics:
- Higher average fitness ratios for species 1 (larger values in fit.ratio.avg) result in a competitive advantage for species 1, leading to its dominance over time.
- Lower average fitness ratios for species 1 (smaller values in fit.ratio.avg) favor species 2, causing it to dominate in the long term.
Colonization Ability:
- Higher fitness ratios during dispersal events (larger values in fit.ratio.m) give species 1 a stronger colonization ability, leading to its increased frequency over time.
- Lower fitness ratios during dispersal events (smaller values in fit.ratio.m) weaken species 1’s colonization ability, allowing species 2 to maintain dominance.
Frequency Dependence:
- Positive frequency dependence (positive freq.dep) enhances the competitive advantage of species 1 as its frequency increases, leading to its dominance over time.
- Negative frequency dependence (negative freq.dep) diminishes the competitive advantage of species 1 as its frequency increases, resulting in a more balanced coexistence between species.
Overall Dynamics:
- The interaction between dispersal capabilities, competitive strength, and frequency dependence shapes the long-term dynamics of species coexistence and dominance.
- Different parameter combinations lead to diverse ecological outcomes, including species coexistence, dominance shifts, or extinction events.

Incorporating selection into the Colonization-Competition Tradeoff model

In addressing unexpected behavior observed during high dispersal scenarios, Melissa Chen proposed alterations to the code to incorporate selection at the metacommunity scale. She presented two approaches:

Reproduction Before Dispersal:
- In this approach, individuals reproduce within their “home” patch before dispersing offspring into other patches.
- Fitness differences are weighted by local species frequencies, followed by dispersal abilities, to calculate the probability of species recruitment.
Dispersal Before Reproduction:
- Alternatively, adults disperse to new patches and then reproduce within the new patch.
- Here, metacommunity species frequencies are weighted by dispersal ability differences first, followed by fitness differences in the recruiting patch, to calculate recruitment probabilities.

You can check out this approach online, but serves as another example of the complexity you need to introduce into community assembly models in order for them to become generalizable across real-world systems.

Future directions in the field of Community Assembly

Functional traits

Similar to local species abundance/richness versus species pools, there seems to be filtering effects that limit the functional traits of species (trait space) in a local community. In part this was initially proposed by separating species into feeding guilds, but since that time additional intricacies have been introduced by separating species by multi-dimensional functional traits.

*An example of feeding guilds.* Source: Brian O’Neill; YouTube

Here is an example of visualizing traits. If abiotic filters were the only factor limiting local functional traits, you would expect to see traits focused around an optimum based on abiotic conditions (left), instead we often see traits “spread out” across a larger convex hull (right) similar to what we see for local species richness vs. species pools due to biotic filters:

*In this real world example, functional diversity of British birds (red) is much lower than predicted (blue) indicating filtering is occurring beyond abiotic filtering.* Source: Petchey et. al 2007

Current research has revealed that conducting the types of community assembly analyses covered in this lab but focusing on functional traits instead of species identities are often more accurate for predicting community structure and are more generalizable for making conservation suggestions. Because of this, there has been a recent explosion of interest in compiling species trait data.

Note the consistent relationship among the functional trait leaf mass per unit area (LMA) with both mean annual temperature (MAT) and mean annual rainfall (MAR), despite very different species compositions (red dots) worldwide. Source: Wright et. al 2004

Community Phylogenetics

All of the statements above about functional traits can be applied to phylogenetic similarity. However, although phylogenetic distance is more readily available than trait data, it is not a perfect proxy without fine-tuning. This is because the simplifications that all traits are conserved and that traits are not labile are often both incorrect.

References

Mittelbach, Gary G, and Brian J McGill. 2019. Community Ecology. Oxford University Press.

Stevens, Martin Henry Hoffman. 2009. A Primer of Ecology with r. Springer.