<- read.csv('Lab_8_data/metadata_2015_survey_samples_edit_cfu_master.csv')
meta
#View(meta1)
Introduction to Community Assembly
“Only a fraction of the forms that could theoretically do so actually form a community at any one time.” - Charles Elton (1950)(elton1950ecology?)
Why aren’t all species everywhere?
Why are local scale communities smaller (lower richness) than regional species pools?
Why are there so many species?
One of the most central challenges in ecology is pinpointing what determines the number, kinds, and abundances of species found in a community.
The study of community assembly attempts to describe the complex processes that determine the composition and abundance of species within ecological communities.
Filters
In the traditional view of community assembly, there are three broad categories of processes/filters that determine colonization:
Dispersal
Abiotic/Environmental Filters (e.g. temperature, humidity, precipitation, geological features, nutrient availability, etc.)
Biotic/Interaction Filters (e.g. competition, predation, parasitism, disease, and mutualism)
As you can readily see, community assembly pulls together many of the topics we have covered this semester. As some additional examples, island biogeography theory and neutral theory emphasize a dispersal filter. Meanwhile co-existence theory involves biotic filters. We discussed abiotic filters in the context of population growth. We also learned about biotic filters in the context of density-dependence, competition, and predator/prey models.
These processes are traditionally viewed as sequential filters:
However, it is more accurate to remember that in reality these three processes can act out of order and simultaneously. Imagine the impact of differing R* values in a system with nutrients that fluctuate seasonally or even geographically.
Community Assembly Lab
The dataset for the upcoming lab originates from “pH as an Eco-evolutionary Driver of Priority Effects” by Chappell et al. (2022). It details field surveys on Diplacus aurantiacus nectar across the San Francisco Peninsula and subsequent lab experiments which investigated the role of pH in microbial community dynamics. The study showcased that the microbial dominance within nectar shifts between bacteria (specifically Acinetobacter nectaris) and yeasts (specifically Metschnikowia reukaufii).
Using read.csv()
, load the dataset 2015_survey_samples_edit_cfu_master.csv
as the df observed_data
. This dataset consists of the field collected data.
<- read.csv('Lab_8_data/2015_survey_samples_edit_cfu_master.csv') observed_data
Explore the dataset here and by looking at the metadata.
head(observed_data)
number Plant_ID Flower_ID Date_collected Date_plated Replicate Nectar_cm
1 1 LH1 1 6/23/15 6/24/15 1 0.1
2 2 LH1 2 6/23/15 6/24/15 2 1.4
3 3 LH1 3 6/23/15 6/24/15 3 3.1
4 4 LH1 4 6/23/15 6/24/15 4 2.3
5 5 LH1 5 6/23/15 6/24/15 5 0.3
6 6 LH1 6 6/23/15 6/24/15 6 0.9
Nectar_mm Nectar_vol Stigma_closed Fcfu1 Fcfu2 Fcfu3 Fcfu4 Fcfu5 Fcfu6
1 1 0.1210654 Yes 0 0 0 0 0 0
2 14 1.6949153 Yes 0 0 0 0 0 0
3 31 3.7530266 Yes 57 0 0 0 0 0
4 23 2.7845036 Yes 1 0 0 0 0 0
5 3 0.3631961 Yes 0 0 0 0 0 0
6 9 1.0895884 Yes 0 0 0 0 0 0
F_nontarget Fcfu_avg FCFU_total FCFU_ul log.FCFU.1. Bcfu1 Bcfu2 Bcfu3
1 <NA> 0 0 0 0 0 0 0
2 <NA> 0 0 0 0 38 1 0
3 <NA> 9.5 57 171.0607382 2.235681782 11 1 0
4 <NA> 0.166666667 1 3.836009923 0.684487183 124 0 0
5 <NA> 0 0 0 0 1 0 0
6 <NA> 0 0 0 0 0 0 0
Bcfu4 Bcfu5 Bcfu6 Bcfu_avg BCFU_total BCFUul log.BCFU.1. X X.1 X.2
1 0 0 0 0 0 0 0 NA NA NA
2 0 0 0 6.5 39 6915.391539 3.839879571 NA NA NA
3 0 0 0 2 12 1079.411167 3.033589064 NA NA NA
4 0 0 0 20.66666667 124 14257.11267 4.154062042 NA NA NA
5 0 0 0 0.166666667 1 761.4761476 2.882226262 NA NA NA
6 0 0 0 0 0 0 0 NA NA NA
X.3 X.4 X.5 X.6 X.7 X.8 X.9
1 NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA
length(unique(observed_data$Plant_ID))
[1] 148
%>%
observed_data distinct(Plant_ID) %>%
nrow()
[1] 148
%>%
observed_data distinct(Plant_ID, Flower_ID) %>%
nrow()
[1] 1154
summary(observed_data)
number Plant_ID Flower_ID Date_collected
Min. : 1.0 Length:1157 Min. : 1.0 Length:1157
1st Qu.: 288.0 Class :character 1st Qu.: 289.0 Class :character
Median : 576.0 Mode :character Median : 577.0 Mode :character
Mean : 576.1 Mean : 576.6
3rd Qu.: 864.0 3rd Qu.: 864.0
Max. :1152.0 Max. :1152.0
NA's :4 NA's :4
Date_plated Replicate Nectar_cm Nectar_mm
Length:1157 Min. :1.000 Min. :0.0100 Min. : 0.000
Class :character 1st Qu.:2.000 1st Qu.:0.3000 1st Qu.: 3.000
Mode :character Median :4.000 Median :0.7000 Median : 7.000
Mean :4.497 Mean :0.9831 Mean : 9.823
3rd Qu.:6.000 3rd Qu.:1.4000 3rd Qu.:14.000
Max. :8.000 Max. :5.6000 Max. :56.000
NA's :4 NA's :5 NA's :4
Nectar_vol Stigma_closed Fcfu1 Fcfu2
Min. :0.0000 Length:1157 Min. : 0.000 Min. : 0.000
1st Qu.:0.3632 Class :character 1st Qu.: 0.000 1st Qu.: 0.000
Median :0.8475 Mode :character Median : 0.000 Median : 0.000
Mean :1.1892 Mean : 8.441 Mean : 9.083
3rd Qu.:1.6949 3rd Qu.: 3.000 3rd Qu.: 0.750
Max. :6.7797 Max. :576.000 Max. :752.000
NA's :4 NA's :4 NA's :879
Fcfu3 Fcfu4 Fcfu5 Fcfu6
Min. : 0.0000 Min. : 0.0000 Min. :0 Min. :0
1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0 1st Qu.:0
Median : 0.0000 Median : 0.0000 Median :0 Median :0
Mean : 0.4836 Mean : 0.0711 Mean :0 Mean :0
3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.:0
Max. :43.0000 Max. :13.0000 Max. :0 Max. :0
NA's :944 NA's :960 NA's :965 NA's :965
F_nontarget Fcfu_avg FCFU_total FCFU_ul
Length:1157 Length:1157 Length:1157 Length:1157
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
log.FCFU.1. Bcfu1 Bcfu2 Bcfu3
Length:1157 Length:1157 Min. : 0.00 Min. : 0.000
Class :character Class :character 1st Qu.: 1.00 1st Qu.: 0.000
Mode :character Mode :character Median : 2.00 Median : 0.000
Mean : 31.68 Mean : 1.781
3rd Qu.: 5.00 3rd Qu.: 1.000
Max. :10000.00 Max. :48.000
NA's :728 NA's :1002
Bcfu4 Bcfu5 Bcfu6 Bcfu_avg
Min. :0.0000 Min. :0 Min. :0 Length:1157
1st Qu.:0.0000 1st Qu.:0 1st Qu.:0 Class :character
Median :0.0000 Median :0 Median :0 Mode :character
Mean :0.2991 Mean :0 Mean :0
3rd Qu.:0.0000 3rd Qu.:0 3rd Qu.:0
Max. :6.0000 Max. :0 Max. :0
NA's :1050 NA's :1058 NA's :1057
BCFU_total BCFUul log.BCFU.1. X
Min. : 0.0 Length:1157 Length:1157 Mode:logical
1st Qu.: 0.0 Class :character Class :character NA's:1157
Median : 3.0 Mode :character Mode :character
Mean : 375.7
3rd Qu.: 18.0
Max. :20000.0
NA's :4
X.1 X.2 X.3 X.4 X.5
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:1157 NA's:1157 NA's:1157 NA's:1157 NA's:1157
X.6 X.7 X.8 X.9
Mode:logical Mode:logical Mode:logical Mode:logical
NA's:1157 NA's:1157 NA's:1157 NA's:1157
Describe the dataset:
- The dataset includes unique plant and flower IDs across sites.
- Each row represents a unique flower. Along with other microbe measurement data, it includes the number of different species of bacteria (Bcfus, or bacterial colony forming units) and fungi (Fcfu) in each plant/flower. Also the amount of nectar is detailed.
- 148 plants and 1,154 flowers were surveyed.
- Oddly, the site numbers or locations are not included.
- Also oddly, the units for vol are not given in the metadata
What needs to be cleaned from the dataset?
- There are whole rows with mostly NAs
- There are NAs present in the Bcfu and Fcfu columns, which we can convert to 0s for our purposes
- There is a plant named “75”
- Some of the columns (like Fcfu_avg, FCFU_total, Bcfu_avg) are character columns and should be numeric
- There are a bunch of empty x columns we should remove, probably from the formatting of the original CSV files
- The amount of fungi/bacteria present should depend on the volume of nectar present, so let’s create new columns for fungal/bacterial concentration
- Also, for the purpose of this exercise let’s just focus on total cfus, so let’s remove the other columns for the more specific morphologies
Clean the dataset:
# Clean our dataset
<- observed_data %>%
observed_data_clean select(-number, -Date_plated, -Replicate, -Nectar_cm,
-Nectar_mm, -F_nontarget, -FCFU_ul, -`log.FCFU.1.`, -BCFUul,
-`log.BCFU.1.`, -starts_with("X")) %>%
mutate(
FCFU_total = as.numeric(FCFU_total),
BCFU_total = as.numeric(BCFU_total),
Bcfu1 = as.numeric(Bcfu1),
Fcfu_avg = as.numeric(Fcfu_avg),
Bcfu_avg = as.numeric(Bcfu_avg),
Bcfu_conc = BCFU_total / Nectar_vol,
Fcfu_conc = FCFU_total / Nectar_vol,
Source = "Observed", # we will use this column later
site_name = stringr::str_extract(Plant_ID, "^[A-Z]+") # this will be useful for the homework
%>%
)
replace_na(list(FCFU_total = 0, BCFU_total = 0, Fcfu_avg = 0, Bcfu_avg = 0, Nectar_vol = 0, Fcfu_conc = 0, Bcfu_conc = 0)) %>%
filter(Plant_ID != "" | Plant_ID != "75")
summary(observed_data_clean)
Plant_ID Flower_ID Date_collected Nectar_vol
Length:1157 Min. : 1.0 Length:1157 Min. :0.0000
Class :character 1st Qu.: 289.0 Class :character 1st Qu.:0.3632
Mode :character Median : 577.0 Mode :character Median :0.8475
Mean : 576.6 Mean :1.1851
3rd Qu.: 864.0 3rd Qu.:1.6949
Max. :1152.0 Max. :6.7797
NA's :4
Stigma_closed Fcfu1 Fcfu2 Fcfu3
Length:1157 Min. : 0.000 Min. : 0.000 Min. : 0.0000
Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000
Mode :character Median : 0.000 Median : 0.000 Median : 0.0000
Mean : 8.441 Mean : 9.083 Mean : 0.4836
3rd Qu.: 3.000 3rd Qu.: 0.750 3rd Qu.: 0.0000
Max. :576.000 Max. :752.000 Max. :43.0000
NA's :4 NA's :879 NA's :944
Fcfu4 Fcfu5 Fcfu6 Fcfu_avg
Min. : 0.0000 Min. :0 Min. :0 Min. : 0.000
1st Qu.: 0.0000 1st Qu.:0 1st Qu.:0 1st Qu.: 0.000
Median : 0.0000 Median :0 Median :0 Median : 0.000
Mean : 0.0711 Mean :0 Mean :0 Mean : 6.705
3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.:0 3rd Qu.: 2.333
Max. :13.0000 Max. :0 Max. :0 Max. :576.000
NA's :960 NA's :965 NA's :965
FCFU_total Bcfu1 Bcfu2 Bcfu3
Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.: 1.00 1st Qu.: 0.000
Median : 0.000 Median : 2.0 Median : 2.00 Median : 0.000
Mean : 8.738 Mean : 366.5 Mean : 31.68 Mean : 1.781
3rd Qu.: 1.816 3rd Qu.: 16.0 3rd Qu.: 5.00 3rd Qu.: 1.000
Max. :1394.673 Max. :11136.0 Max. :10000.00 Max. :48.000
NA's :13 NA's :728 NA's :1002
Bcfu4 Bcfu5 Bcfu6 Bcfu_avg
Min. :0.0000 Min. :0 Min. :0 Min. : 0.0
1st Qu.:0.0000 1st Qu.:0 1st Qu.:0 1st Qu.: 0.0
Median :0.0000 Median :0 Median :0 Median : 2.0
Mean :0.2991 Mean :0 Mean :0 Mean : 284.7
3rd Qu.:0.0000 3rd Qu.:0 3rd Qu.:0 3rd Qu.: 10.5
Max. :6.0000 Max. :0 Max. :0 Max. :10000.0
NA's :1050 NA's :1058 NA's :1057
BCFU_total Bcfu_conc Fcfu_conc Source
Min. : 0.0 Min. : 0.00 Min. : 0.000 Length:1157
1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.000 Class :character
Median : 3.0 Median : 4.13 Median : 0.000 Mode :character
Mean : 374.4 Mean : 672.47 Mean : 6.665
3rd Qu.: 18.0 3rd Qu.: 20.65 3rd Qu.: 3.000
Max. :20000.0 Max. :82600.00 Max. :576.000
site_name
Length:1157
Class :character
Mode :character
Dispersal
Now that we have loaded and cleaned our dataset, let’s simulate dispersal if there was no filtering. We will simulate the dispersal being uniformly (not randomly) distributed because for dispersal that would be closest to a null hypothesis in most situations.
Let’s simulate our data as if it was uniformly distributed across flowers
<- observed_data_clean[, c("Plant_ID", "Flower_ID", "Stigma_closed", "Nectar_vol")]
simulated_data
# Based on the max value from our data summary above
<- 0:10000
cfu_count_range
# Simulate uniform CFU counts for fungal and bacterial samples
for(i in 1:6){
paste0('Fcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
simulated_data[[paste0('Bcfu', i)]] <- sample(cfu_count_range, nrow(simulated_data), replace = TRUE)
simulated_data[[
}
head(simulated_data)
Plant_ID Flower_ID Stigma_closed Nectar_vol Fcfu1 Bcfu1 Fcfu2 Bcfu2 Fcfu3
1 LH1 1 Yes 0.1210654 5926 2355 2140 5161 6499
2 LH1 2 Yes 1.6949153 6059 5822 1346 1409 7453
3 LH1 3 Yes 3.7530266 5345 9047 8453 4788 4347
4 LH1 4 Yes 2.7845036 3217 4544 1014 1471 2783
5 LH1 5 Yes 0.3631961 4756 206 4777 4575 5986
6 LH1 6 Yes 1.0895884 6667 499 4909 7293 5058
Bcfu3 Fcfu4 Bcfu4 Fcfu5 Bcfu5 Fcfu6 Bcfu6
1 4809 4166 1721 6373 907 7697 1771
2 286 8672 7330 8382 595 2015 2310
3 9165 2811 8526 8740 8224 6192 3404
4 9488 1028 2898 6346 3106 6256 9278
5 9494 134 4596 5357 5562 2016 5287
6 5326 5025 3765 3377 2041 3359 6604
# Calculate other columns from our orignal dataset
<- simulated_data %>%
simulated_data mutate(
FCFU_total = rowSums(across(starts_with("Fcfu"))),
BCFU_total = rowSums(across(starts_with("Bcfu"))),
Fcfu_avg = FCFU_total / 6,
Bcfu_avg = BCFU_total / 6,
Fcfu_conc = FCFU_total / Nectar_vol,
Bcfu_conc = BCFU_total / Nectar_vol
)
<- simulated_data %>%
simulated_data select(
-Fcfu1, -Bcfu1,
-Fcfu2, -Bcfu2,
-Fcfu3, -Bcfu3,
-Fcfu4, -Bcfu4,
-Fcfu5, -Bcfu5,
-Fcfu6, -Bcfu6
%>%
) mutate(Source = "Simulated")
head(simulated_data)
Plant_ID Flower_ID Stigma_closed Nectar_vol FCFU_total BCFU_total Fcfu_avg
1 LH1 1 Yes 0.1210654 32801 16724 5466.833
2 LH1 2 Yes 1.6949153 33927 17752 5654.500
3 LH1 3 Yes 3.7530266 35888 43154 5981.333
4 LH1 4 Yes 2.7845036 20644 30785 3440.667
5 LH1 5 Yes 0.3631961 23026 29720 3837.667
6 LH1 6 Yes 1.0895884 28395 25528 4732.500
Bcfu_avg Fcfu_conc Bcfu_conc Source
1 2787.333 270936.261 138140.24 Simulated
2 2958.667 20016.930 10473.68 Simulated
3 7192.333 9562.415 11498.45 Simulated
4 5130.833 7413.889 11055.83 Simulated
5 4953.333 63398.253 81829.07 Simulated
6 4254.667 26060.300 23429.03 Simulated
Let’s plot our observed versus simulated data:
dim(observed_data_clean)
[1] 1157 25
dim(simulated_data)
[1] 1157 11
names(observed_data_clean)
[1] "Plant_ID" "Flower_ID" "Date_collected" "Nectar_vol"
[5] "Stigma_closed" "Fcfu1" "Fcfu2" "Fcfu3"
[9] "Fcfu4" "Fcfu5" "Fcfu6" "Fcfu_avg"
[13] "FCFU_total" "Bcfu1" "Bcfu2" "Bcfu3"
[17] "Bcfu4" "Bcfu5" "Bcfu6" "Bcfu_avg"
[21] "BCFU_total" "Bcfu_conc" "Fcfu_conc" "Source"
[25] "site_name"
names(simulated_data)
[1] "Plant_ID" "Flower_ID" "Stigma_closed" "Nectar_vol"
[5] "FCFU_total" "BCFU_total" "Fcfu_avg" "Bcfu_avg"
[9] "Fcfu_conc" "Bcfu_conc" "Source"
<- full_join(observed_data_clean, simulated_data) combined_data
Joining with `by = join_by(Plant_ID, Flower_ID, Nectar_vol, Stigma_closed,
Fcfu_avg, FCFU_total, Bcfu_avg, BCFU_total, Bcfu_conc, Fcfu_conc, Source)`
dim(combined_data)
[1] 2314 25
ggplot(combined_data, aes(x = Source, y = Fcfu_conc, color = Source)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.1, width = 0.1) +
scale_y_log10() +
labs(title = "Fungal CFU Concentration (per vol of nectar)",
y = "Concentration")
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
log-10 transformation introduced infinite values.
Warning: Removed 731 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Remember that it is crucial to always show your actual data, that is why I included points in our plot. Note that I jittered the points for increased visibility for many points versus geom_point()
. I also set geom_boxplot()
to outlier.shape = NA
so that geom_boxplot()
would not also plot the points.
Now repeat for the bacterial data:
ggplot(combined_data, aes(x = Source, y = Bcfu_conc, color = Source)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.1, width = 0.1) +
scale_y_log10() +
labs(title = "Bacterial CFU Concentration (per vol of nectar)",
y = "Concentration")
What do you observe?
- It appears the actual dispersal of microbes is dramatically different than if it was just uniformly distributed. This suggests dispersal could be a filter affecting the assembly of these microbe communities.
If we had geospatial data for our dataset, we could do a similar simple analysis. Here I am simulating an artificial “observed” population that is not uniformly distributed along with a simulated distribution that is uniformly distributed.
# Simulating "observed" data with specific lat-lon distribution patterns
<- data.frame(
observed_data Plant_ID = 1:100,
lat = rnorm(100, mean = 34, sd = 0.1), # Simulating around a central latitude
lon = rnorm(100, mean = -118, sd = 0.1), # Simulating around a central longitude
Source = 'Observed'
)
# Simulating "simulated" data with a uniform distribution across the given range
<- data.frame(
simulated_data Plant_ID = 1:100,
lat = runif(100, min = 33.8, max = 34.2), # Uniform distribution within a latitudinal range
lon = runif(100, min = -118.2, max = -117.8), # Uniform distribution within a longitudinal range
Source = 'Simulated'
)
<- rbind(observed_data, simulated_data) combined_data
Now let’s plot this data colored by Source
with geom_point
, geom_density2d()
, and faceted by Source
ggplot(combined_data, aes(x = lon, y = lat, color = Source)) +
geom_point() +
geom_density2d() +
facet_wrap(~ Source) +
labs(title = "Spatial Distribution of Microbial Dispersal with Density Contours",
x = "Longitude",
y = "Latitude")
What do you observe?
- We can see that observed data is not uniformly distributed, with density increasing around a focal point. Meanwhile the simulated data is more uniformly distributed across its range.
Abiotic filters
Load in the file “N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH-_Sheet1.csv” as the df “exp_data” using read.csv()
<- read.csv("Lab_8_data/N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH_-_Sheet1.csv") exp_data
<- read.csv("Lab_8_data/metadata_N06-P02-E29_MR1_clone_1_all_evol_vs._FNA17_-_Low_pH.csv") meta_exp
Inspect the dataset
summary(exp_data)
experiment nectar date_count_inoc date_count_final
Min. :20.00 Length:3072 Length:3072 Length:3072
1st Qu.:21.75 Class :character Class :character Class :character
Median :25.50 Mode :character Mode :character Mode :character
Mean :24.75
3rd Qu.:27.25
Max. :29.00
column row location microbe
Min. : 1.00 Length:3072 Length:3072 Length:3072
1st Qu.: 3.75 Class :character Class :character Class :character
Median : 6.50 Mode :character Mode :character Mode :character
Mean : 6.50
3rd Qu.: 9.25
Max. :12.00
strain evol_treatment evol_comp clone
Length:3072 Length:3072 Length:3072 Length:3072
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
evol_rep evol_rep_comp tech_rep treatment
Length:3072 Length:3072 Min. :1.000 Length:3072
Class :character Class :character 1st Qu.:1.000 Class :character
Mode :character Mode :character Median :1.000 Mode :character
Mean :2.426
3rd Qu.:4.000
Max. :9.000
X96_well_plate plate_type inoc_density inoc_CFU
Min. :1.0 Length:3072 Length:3072 Min. : 0.0
1st Qu.:1.0 Class :character Class :character 1st Qu.: 0.0
Median :1.5 Mode :character Mode :character Median : 7.0
Mean :1.5 Mean : 360.9
3rd Qu.:2.0 3rd Qu.: 15.0
Max. :2.0 Max. :100000.0
NA's :1737
inoc_dilution inoc_CFU_uL inoc_cells final_dilution
Min. : 0 Min. : 0 Min. :0.000 Min. : 0
1st Qu.: 10 1st Qu.: 0 1st Qu.:0.000 1st Qu.: 100
Median : 100 Median : 22 Median :1.362 Median : 1000
Mean : 21785 Mean : 52677 Mean :1.918 Mean : 19570
3rd Qu.: 10000 3rd Qu.: 6667 3rd Qu.:3.824 3rd Qu.: 10000
Max. :1000000 Max. :9666667 Max. :6.985 Max. :100000
NA's :2085 NA's :1576 NA's :1576 NA's :1341
final_CFU final_CFU_uL final_cells notes
Min. : 0.000 Min. : 0 Min. :0.000 Length:3072
1st Qu.: 0.000 1st Qu.: 0 1st Qu.:0.000 Class :character
Median : 3.000 Median : 16 Median :1.176 Mode :character
Mean : 6.844 Mean : 22167 Mean :1.853
3rd Qu.:10.000 3rd Qu.: 8000 3rd Qu.:3.842
Max. :78.000 Max. :1760000 Max. :6.246
NA's :180 NA's :123 NA's :60
inoc_pH final_pH final_dilution_P1 final_CFU_P1
Min. :3.000 Min. :2.000 Min. : 1 Min. : 0.000
1st Qu.:3.000 1st Qu.:2.000 1st Qu.: 10 1st Qu.: 3.000
Median :3.500 Median :2.500 Median : 100 Median : 5.000
Mean :3.349 Mean :2.568 Mean : 4711 Mean : 8.396
3rd Qu.:3.500 3rd Qu.:3.000 3rd Qu.:10000 3rd Qu.: 8.000
Max. :3.500 Max. :4.500 Max. :10000 Max. :57.000
NA's :2880 NA's :2688 NA's :3027 NA's :3024
final_CFU_uL_P1 final_cells_P1 final_dilution_P2 final_CFU_P2
Min. : 0.0 Min. :0.0000 Min. : 10 Min. : 0.000
1st Qu.: 7.7 1st Qu.:0.0000 1st Qu.: 10 1st Qu.: 2.750
Median : 80.0 Median :0.0000 Median : 5050 Median : 5.000
Mean : 5457.2 Mean :0.2809 Mean : 5018 Mean : 5.125
3rd Qu.:10000.0 3rd Qu.:0.0000 3rd Qu.:10000 3rd Qu.: 7.250
Max. :28000.0 Max. :4.4472 Max. :10000 Max. :14.000
NA's :3024 NA's :2688 NA's :3030 NA's :3024
final_CFU_uL_P2 final_cells_P2
Min. : 0 Min. :0.0000
1st Qu.: 6 1st Qu.:0.0000
Median : 100 Median :0.0000
Mean : 5228 Mean :0.2815
3rd Qu.:10000 3rd Qu.:0.0000
Max. :28000 Max. :4.4472
NA's :3024 NA's :2688
unique(exp_data$microbe)
[1] "yeast" "bacteria" "none"
What do we need to clean from this dataset?
Remove na and “none” rows for microbes. Remove na rows for “inoc_pH” “final_pH.” Remove rows where treatment is blank
Replace nas for rows inoc_cells and final_cells with 0s
<- exp_data %>%
exp_data_clean # Remove na and "none" rows for microbes. Remove na rows for "inoc_pH" and "final_pH." Remove rows where treatment is blank
filter(!is.na(microbe) & !is.na(final_pH) & !is.na(inoc_pH) & microbe != "none" & treatment != "") %>%
# Replace nas for rows inoc_cells, final_cells with 0s
replace_na(list(inoc_cells = 0, final_cells = 0, inoc_pH = 0)) %>%
# Select only the columns microbe, inoc_cells, final_cells, treatment, inoc_pH, and final_pH
select(microbe, inoc_cells, final_cells, treatment, inoc_pH, final_pH, evol_treatment) %>%
# Create column "change_in_cells" from "finals_cells - inoc_cells" and column "delta_pH" from "final_pH - inoc_pH"
mutate(
change_in_cells = final_cells - inoc_cells,
delta_pH = final_pH - inoc_pH
)
For “filter(microbe ==”bacteria”)“, plot final_pH
by change_in_cells
using geom_jitter()
and geom_smooth()
with method = "lm"
. Note throughout this lab that we are oversimplifying this analysis by using geom_smooth()
, for a real study we should be sure we are exploring various models to most accurately fit our data.
ggplot(filter(exp_data_clean, microbe == "bacteria"), aes(x = final_pH, y = change_in_cells)) +
geom_jitter(alpha = 0.1, width = 0.05) +
geom_smooth(method = "lm") +
labs(title = "Impact of Final pH on Microbial Growth",
x = "Final pH",
y = "Change in Bacterial Cell Count (CFU)")
`geom_smooth()` using formula = 'y ~ x'
Do the same for “filter(exp_data_clean, microbe ==”yeast”),” but log-scale the Y-axis and set geom_smooth
with method = "lm"
to fullrange = TRUE
.
ggplot(filter(exp_data_clean, microbe == "yeast"), aes(x = final_pH, y = change_in_cells)) +
geom_jitter(alpha = 0.1, width = 0.05) +
geom_smooth(method = "lm", fullrange = TRUE) +
scale_y_log10() +
labs(title = "Impact of Final pH on Microbial Growth",
x = "Final pH",
y = "Change in Yeast Cell Count (CFU)")
`geom_smooth()` using formula = 'y ~ x'
What do you observe in this oversimplified analysis?
- Bacteria responds negatively to higher pH, while yeast are responding positively. This suggests that pH may be acting as an abiotic filter on the microbial community. Overall, bacteria does better in a more acidic environment while yeast does better in a more alkaline environment.
Biotic Filters
Nectar Treatments
Let’s continue working with our exp_data_filtered
dataframe by looking into the results of colonization treatments:
Let’s see if the microbes are themselves affecting the pH. If they are affecting the pH, what could this imply?
- That through modifying pH, one microbe could be controlling the other microbe through competition as a biotic filter.
Create a box-plot to explore the relationship between treatment
and delta_pH
:
ggplot(exp_data_clean, aes(x = treatment, y = delta_pH, color = treatment)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.1, width = 0.1) +
labs(title = "Change in pH by Treatment",
x = "Treatment",
y = "Change in pH")
What do you observe?
- The control treatment has a high pH (more basic)
- When bacteria or yeast are added later, they have the same average effect on the pH. We see differences when either bacteria or yeast added first.
- When bacteria added first regardless if yeast added or not, pH lowers (becomes more acidic)
- Meanwhile when yeasts added first regardless if bacteria added or not, pH increases (becomes more basic). There is also a much larger variability in pH
- It appears that bacteria is changing the pH of the nectar in its favor, limiting the ability of yeast to thrive
Now let’s explore this by plotting the number of yeast cells by treatment.
First I will create a new dataframe
<- exp_data_clean %>%
exp_data_clean_yeast filter(microbe == "yeast" &
== "Y-" | treatment == "BY" | treatment == "YB" | treatment == "-Y"))
(treatment
head(exp_data_clean_yeast)
microbe inoc_cells final_cells treatment inoc_pH final_pH evol_treatment
1 yeast 0 3.556423 YB 3.0 2.5 low_pH
2 yeast 0 3.662852 YB 3.5 2.5 low_pH
3 yeast 0 3.869290 YB 3.5 2.5 low_pH
4 yeast 0 3.732474 YB 3.0 2.5 low_pH
5 yeast 0 3.662852 YB 3.0 2.5 ancestral
6 yeast 0 3.602169 YB 3.0 2.5 ancestral
change_in_cells delta_pH
1 3.556423 -0.5
2 3.662852 -1.0
3 3.869290 -1.0
4 3.732474 -0.5
5 3.662852 -0.5
6 3.602169 -0.5
Now create a boxplot comparing treatment and final yeast cell counts:
ggplot(exp_data_clean_yeast, aes(x = treatment, y = final_cells)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.1, width = 0.1) +
labs(title = "Yeast Cells by Treatment",
x = "Treatments",
y = "Number of Yeast Cells")
ggplot(exp_data_clean_yeast, aes(x = treatment, y = final_cells)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.1, width = 0.1) +
labs(title = "Yeast Cells by Treatment",
x = "Treatments",
y = "Number of Yeast Cells")
This is a great example of priority effects, where the order of colonization has an effect on community assembly. It also showcases two separate stable states of pH as an outcome based on these priority effects.
The study expands on these results to relate these results to community dynamics as a whole:
They made these conclusions by creating artificial flowers and inoculating them: YouTube Video
Combining Filters
Recall that looking at one filter at a time is an oversimplification, in the real-world these “filters” can occur repeatedly and even simultaneously.
Colonization-Competition Tradeoffs
Let’s look at the combined effects of dispersal and biotic filters through this simulation inspired by Vellend and MacDonald 2016(vellend2016simulating?) from the book The Theory of Ecological Communities.
We will explore colonization-competition tradeoff model across two patches. The model captures the !essence of the interaction between dispersal capabilities and competitive strength. It is a stochastic simulation where colonization events are interspersed with competitive encounters, and the outcome is contingent on the inherent traits of the species involved and their initial frequencies.
We will simulate this model over 100 years, tracking the frequency of each species within two patches. The species are represented numerically: ‘1’ for yeast and ‘2’ for bacteria. The initial condition is a 50/50 split between the two species in each patch. The model includes the following parameters:
- J: The total number of individuals in each patch.
- m: The rate of dispersal between patches.
- fit.ratio.avg: The average fitness ratio of the species when there is no dispersal, indicating competitive strength.
- fit.ratio.m: The fitness ratio during dispersal events, indicating colonization ability.
- freq.dep: The frequency dependence in competitive interactions
# Set parameters and initial conditions
<- 100
num.years <- 2
num.patch <- 1000
J <- 0.05
m <- c(1.2, 1.2)
fit.ratio.avg <- 1/5
fit.ratio.m <- rep(0, num.patch)
freq.dep .1 <- 0.5 * J
init<- matrix(ifelse(runif(J * num.patch) < 0.5, 1, 2), ncol = num.patch)
COM
# Create a data frame to store the frequency of species 1 per year in each patch
1.mat <- as.data.frame(matrix(ncol = num.patch, nrow = num.years))
freq.names(freq.1.mat) <- paste0("Patch", 1:num.patch)
# Populate the initial frequencies
for (patch in 1:num.patch) {
1.mat[1, patch] <- sum(COM[, patch] == 1) / J
freq.
}
# Run the simulation
for (year in 2:num.years) {
for (i in 1:(J * num.patch)) {
# Choose a patch where a death event will occur
<- sample(1:num.patch, 1)
patch
# Calculate Pr.1 if dispersal occurs
1.meta <- sum(COM == 1) / (J * num.patch)
freq..1 <- ifelse(runif(1) < m,
Pr* freq.1.meta / (fit.ratio.m * freq.1.meta + (1 - freq.1.meta)),
fit.ratio.m
{.1 <- sum(COM[, patch] == 1) / J
freq.2 <- 1 - freq.1
freq<- exp(freq.dep[patch] * (freq.1 - 0.5) + log(fit.ratio.avg[patch]))
fit.ratio * freq.1 / (fit.ratio * freq.1 + freq.2)
fit.ratio
})
# Implement a birth-death event
sample(J, 1), patch] <- sample(c(1, 2), 1, prob = c(Pr.1, 1 - Pr.1))
COM[
}
# Record the frequency of species 1 for the current year
for (patch in 1:num.patch) {
1.mat[year, patch] <- sum(COM[, patch] == 1) / J
freq.
} }
Plot the results of the simulation
# Convert to long format for ggplot
1.long <- freq.1.mat %>%
freq.mutate(Year = 1:num.years) %>%
gather(key = "Patch", value = "Frequency", -Year)
ggplot(freq.1.long, aes(x = Year, y = Frequency, group = Patch, color = Patch)) +
geom_line() +
labs(x = "Time", y = "Frequency of species 1", title = "Colonization-Competition Tradeoff Model")
Now vary the parameters, what do you observe?
Colonization Frequency:
Higher dispersal rates (larger values of m) lead to more frequent colonization events between patches.
Lower dispersal rates (smaller values of m) result in less frequent colonization events.
Competitive Dynamics:
Higher average fitness ratios for species 1 (larger values in fit.ratio.avg) result in a competitive advantage for species 1, leading to its dominance over time.
Lower average fitness ratios for species 1 (smaller values in fit.ratio.avg) favor species 2, causing it to dominate in the long term.
Colonization Ability:
Higher fitness ratios during dispersal events (larger values in fit.ratio.m) give species 1 a stronger colonization ability, leading to its increased frequency over time.
Lower fitness ratios during dispersal events (smaller values in fit.ratio.m) weaken species 1’s colonization ability, allowing species 2 to maintain dominance.
Frequency Dependence:
Positive frequency dependence (positive freq.dep) enhances the competitive advantage of species 1 as its frequency increases, leading to its dominance over time.
Negative frequency dependence (negative freq.dep) diminishes the competitive advantage of species 1 as its frequency increases, resulting in a more balanced coexistence between species.
Overall Dynamics:
The interaction between dispersal capabilities, competitive strength, and frequency dependence shapes the long-term dynamics of species coexistence and dominance.
Different parameter combinations lead to diverse ecological outcomes, including species coexistence, dominance shifts, or extinction events.
Incorporating selection into the Colonization-Competition Tradeoff model
In addressing unexpected behavior observed during high dispersal scenarios, Melissa Chen proposed alterations to the code to incorporate selection at the metacommunity scale. She presented two approaches:
Reproduction Before Dispersal:
In this approach, individuals reproduce within their “home” patch before dispersing offspring into other patches.
Fitness differences are weighted by local species frequencies, followed by dispersal abilities, to calculate the probability of species recruitment.
Dispersal Before Reproduction:
Alternatively, adults disperse to new patches and then reproduce within the new patch.
Here, metacommunity species frequencies are weighted by dispersal ability differences first, followed by fitness differences in the recruiting patch, to calculate recruitment probabilities.
You can check out this approach online, but serves as another example of the complexity you need to introduce into community assembly models in order for them to become generalizable across real-world systems.
Future directions in the field of Community Assembly
Functional traits
Similar to local species abundance/richness versus species pools, there seems to be filtering effects that limit the functional traits of species (trait space) in a local community. In part this was initially proposed by separating species into feeding guilds(fox1993assembly?), but since that time additional intricacies have been introduced by separating species by multi-dimensional functional traits.
Here is an example of visualizing traits. If abiotic filters were the only factor limiting local functional traits, you would expect to see traits focused around an optimum based on abiotic conditions (left), instead we often see traits “spread out” across a larger convex hull (right) similar to what we see for local species richness vs. species pools due to biotic filters:
Current research has revealed that conducting the types of community assembly analyses covered in this lab but focusing on functional traits instead of species identities are often more accurate for predicting community structure and are more generalizable for making conservation suggestions. Because of this, there has been a recent explosion of interest in compiling species trait data.
Community Phylogenetics
All of the statements above about functional traits can be applied to phylogenetic similarity. However, although phylogenetic distance is more readily available than trait data, it is not a perfect proxy without fine-tuning. This is because the simplifications that all traits are conserved and that traits are not labile are often both incorrect.