Lab 9 - Taxonomic and Phylogenetic Diversity

Authors

Jayme Lewthwaite

Hanna Jackson

Charles Lehnen

Melissa Guzman

Cameron Egan

In order to begin exploring evolutionary history in the context of biodiversity analyses, we will first explore taxonomic and then phylogenetic diversity analyses.

Introduction to Taxonomic Diversity Analyses

In the first section of this lab, we will examine the patterns of average body size across 198 families of animals (kingdom Animalia) from a the 2012 article “Can clade age alone explain the relationship between body size and diversity?.”(Etienne et al. 2012)

Load in “Etienne_etal_metfam_data.txt” as df animalia, noting that this is a different filetype than we have used before. Explore the file outside of Rstudio. Now explore the read.table() help page to try to figure out which parameter values to specify to load the file in properly:

animalia <- read.table("Lab_9_data/Etienne_etal_metfam_data.txt", header = TRUE, sep = "\t", as.is = TRUE)

head(animalia)
  nr     phylum   class       order        family    bodysize    age richness
1  1 Arthropoda Insecta Hymenoptera    Cynipoidae 0.000000750  77.05     2500
2  2 Arthropoda Insecta Hymenoptera Ichneumonidae 0.000005790 142.85    60000
3  3 Arthropoda Insecta Hymenoptera    Braconidae 0.000000660 142.85    50000
4  4 Arthropoda Insecta Hymenoptera        Apidae 0.000150317  63.60    20000
5  5 Arthropoda Insecta Hymenoptera      Vespidae 0.000124380 142.85     1000
6  6 Arthropoda Insecta Hymenoptera    Formicidae 0.000001650 118.50    15000
  N_E refs.M..N_L..age...N_E       X
1  -1                    1,3 1.00000
2  -1                    1,4 0.72968
3  -1                    1,3 0.72968
4  -1                    1,2 0.72968
5  -1                  1,3,5 0.72968
6  -1                    1,4 0.72968

Take a moment to get familiar with the data. Describe the dataset:

dim(animalia)
[1] 198  11
  • The dataframe consists of 198 family rows, each with various summary measures for a different animal family, including the average body size of species within that family ($bodysize), the crown age of the family ($age), and its approximate number of species ($richness).
Crown Age:

Age of the most recent common ancestor of all species in a clade

Stem Age:

Time since an ancestral lineage diverged from its sister clade

Source: Phylogenetic Comparative Methods(Harmon 2019)

The dataset consists of 5 very distinct phyla:

unique(animalia$phylum)
[1] "Arthropoda"    "Bryozoa"       "Mollusca"      "Echinodermata"
[5] "Chordata"     

Plot family mean body size against species richness:

ggplot(animalia, aes(x = bodysize, y = richness)) +
  geom_point() +
  labs(title = "Animal family richness vs average body size",
       x = "Average Body Size",
       y = "Species Richness")

We will now plot the relationship between log-transformed body size and log-transformed richness:

ggplot(animalia, aes(x = log10(bodysize), y = log10(richness))) +
  geom_point() +
  labs(title = "Animal family log richness vs log average body size",
       x = "Log10(Average Body Size)",
       y = "Log10(Species Richness)")

  1. Which plot is more informative (un-transformed or log-transformed) and why? Think beyond visual analysis but also biological significance.
  • Log-transformed

  • Un-transformed is squished to one side on the x-axis so cannot distinguish differences among really small-sized families. Log transformation spreads the values more evenly along the x-axis.

  • Log transformation also makes more biological sense: a 100g difference in body size has a profound ecological significance if we are comparing two animals that weigh less than 200g, but is trivial if we are comparing two animals that weigh more than 100kg.

  • Another way to think about this is that differences in log-space are proportional: so a 10% change to a 10kg organism and a 10% change to a 100kg organism is recorded as the same amount of change in log space

  1. On the basis of this plot of average body size against species richness across all animal families, what conclusion would you draw about the association between these two variables?
  • There is a negative relationship between the average body size of a family and its richness, though with large variation in species richness among families.

Taxonomic bias is a common issue in biodiversity studies. Taxonomic bias occurs when the number of species in a group is not proportional to the number of species in the real world. For example, there are more species of arthropods than chordates, but the number of species in a dataset may not reflect this.

Create a stacked geom_bar() plot to check for taxonomic bias in our dataset to the class level by plotting by phylum and filling by class:

ggplot(data = animalia, aes(x = phylum, fill = class) ) +
  geom_bar(position = "stack")

With two intuitive lines of code, we are able to generate a useful plot that helps us to see the bias in our dataset.

What bias do you observe?

  • Over-representation of chordates and specifically mammals (~5.4K species) versus arthropods (>1M species), mollusks (~100K species), echinodermata (10k), bryozoa (~5k)

Using the dplyr package to create a new column in our animalia df called is_chordate. Use the ifelse() function to label if from phylum (label as chordata) or not (label as nonchordata)

animalia <- animalia %>%
  mutate(is_chordate = ifelse(phylum == "Chordata", "chordata", "nonchordata"))

head(animalia)
  nr     phylum   class       order        family    bodysize    age richness
1  1 Arthropoda Insecta Hymenoptera    Cynipoidae 0.000000750  77.05     2500
2  2 Arthropoda Insecta Hymenoptera Ichneumonidae 0.000005790 142.85    60000
3  3 Arthropoda Insecta Hymenoptera    Braconidae 0.000000660 142.85    50000
4  4 Arthropoda Insecta Hymenoptera        Apidae 0.000150317  63.60    20000
5  5 Arthropoda Insecta Hymenoptera      Vespidae 0.000124380 142.85     1000
6  6 Arthropoda Insecta Hymenoptera    Formicidae 0.000001650 118.50    15000
  N_E refs.M..N_L..age...N_E       X is_chordate
1  -1                    1,3 1.00000 nonchordata
2  -1                    1,4 0.72968 nonchordata
3  -1                    1,3 0.72968 nonchordata
4  -1                    1,2 0.72968 nonchordata
5  -1                  1,3,5 0.72968 nonchordata
6  -1                    1,4 0.72968 nonchordata

Now repeat your log-transformed plot of body size and richness, but color by is_chordate

ggplot(animalia, aes(x = log10(bodysize), y = log10(richness), color = is_chordate)) +
  geom_point() +
  labs(title = "Animal family log richness vs log average body size",
       x = "Log10(Average Body Size)",
       y = "Log10(Species Richness)",
       color = "Phylum")

Does the plot with chordates and non-chordates distinguished change your interpretation of the association between body size and species richness? If so, how?

  • The negative relationship is now less clear; the chordates and non-chordates seem to cluster into two separate groups on the plot. The non-chordates all tend to have small family-level average body sizes and high family richness, and vice vera for chordates.

  • Also, the negative relationship between mean body size and species richness is much less clear (and almost disappears), especially in the Chordata (red points).

Subset for just mammals and plot log-transformed bodysize by log transformed richness using ggplot2.

Tip: you can combine a dplyr and ggplot() pipe together to avoid creating another new dataframe.

animalia %>%
  filter(class == 'Mammalia') %>%
  ggplot(aes(x = log10(bodysize), y = log10(richness))) +
    geom_point() +
    labs(title = "Log richness vs. bodysize of mammals")

Are there any noteworthy or consistent patterns of variation within this group?

  • The pattern is weaker: there is low richness and large average body sizes overall. The negative relationship between body size and species richness is less clear in Mammalia than in the earlier plot of all Animalia families.

To dive deeper into this question of the relationship between richness and bodysize, let’s look at individual species instead of family averages.

We will work with the panTHERIA dataset(Jones et al. 2009), a comprehensive data set of all known extant and recently extinct mammals, compiled from various literature sources.

Read in panTHERIA dataset provided to you "PanTHERIA_1-0_WR05_Aug2008.txt", assign to object pantheria, use the function dplyr::rename() to change the body mass column name to bodysize, and explore the dataset, especially regarding bodysize.

Some options to explore could include checking the range() of bodysize values, check for NA values with is.na(), check for non-numerical element is is.nan(), etc.

pantheria <- read.table("Lab_9_data/PanTHERIA_1-0_WR05_Aug2008.txt", header = TRUE, sep = "\t", as.is = TRUE) %>%
  rename(bodysize = X5.1_AdultBodyMass_g)

head(pantheria)
   MSW05_Order MSW05_Family MSW05_Genus MSW05_Species      MSW05_Binomial
1 Artiodactyla    Camelidae     Camelus   dromedarius Camelus dromedarius
2    Carnivora      Canidae       Canis       adustus       Canis adustus
3    Carnivora      Canidae       Canis        aureus        Canis aureus
4    Carnivora      Canidae       Canis       latrans       Canis latrans
5    Carnivora      Canidae       Canis         lupus         Canis lupus
6 Artiodactyla      Bovidae         Bos     frontalis       Bos frontalis
  X1.1_ActivityCycle  bodysize X8.1_AdultForearmLen_mm
1                  3 492714.47                    -999
2                  1  10392.49                    -999
3                  2   9658.70                    -999
4                  2  11989.10                    -999
5                  2  31756.51                    -999
6                  2 800143.05                    -999
  X13.1_AdultHeadBodyLen_mm X2.1_AgeatEyeOpening_d X3.1_AgeatFirstBirth_d
1                   -999.00                -999.00                1651.62
2                    745.32                -999.00                -999.00
3                    827.53                   7.50                -999.00
4                    872.39                  11.94                 365.00
5                   1055.00                  14.01                 547.50
6                   2700.00                -999.00                -999.00
  X18.1_BasalMetRate_mLO2hr X5.2_BasalMetRateMass_g X6.1_DietBreadth
1                   40293.0                  407000                3
2                    -999.0                    -999                6
3                    -999.0                    -999                6
4                    3699.0                   10450                1
5                   11254.2                   33100                1
6                    -999.0                    -999                3
  X7.1_DispersalAge_d X9.1_GestationLen_d X12.1_HabitatBreadth
1             -999.00              386.51                    1
2              329.99               65.00                    1
3             -999.00               61.24                    1
4              255.00               61.74                    1
5              180.00               63.50                    1
6             -999.00              273.75                 -999
  X22.1_HomeRange_km2 X22.2_HomeRange_Indiv_km2 X14.1_InterbirthInterval_d
1              196.32                   -999.00                     614.41
2                1.01                      1.01                    -999.00
3                2.95                      3.13                     365.00
4               18.88                     19.91                     365.00
5              159.86                     43.13                     365.00
6             -999.00                   -999.00                     403.02
  X15.1_LitterSize X16.1_LittersPerYear X17.1_MaxLongevity_m
1             0.98                    1                480.0
2             4.50                 -999                137.0
3             3.74                 -999                192.0
4             5.72                 -999                262.0
5             4.98                    2                354.0
6             1.22                    1                314.4
  X5.3_NeonateBodyMass_g X13.2_NeonateHeadBodyLen_mm
1               36751.19                        -999
2                -999.00                        -999
3                 211.82                        -999
4                 200.01                        -999
5                 412.31                        -999
6               22977.05                        -999
  X21.1_PopulationDensity_n.km2 X10.1_PopulationGrpSize
1                          0.98                      11
2                          0.74                    -999
3                          0.22                    -999
4                          0.25                    -999
5                          0.01                    -999
6                          0.54                    -999
  X23.1_SexualMaturityAge_d X10.2_SocialGrpSize X24.1_TeatNumber
1                   1947.94                  10             -999
2                    249.88                -999                8
3                    371.23                -999                8
4                    372.90                -999                8
5                    679.37                -999                9
6                    610.97                  40             -999
  X12.2_Terrestriality X6.2_TrophicLevel X25.1_WeaningAge_d
1                    1                 1             389.38
2                    1                 2              52.89
3                    1                 2              61.30
4                    1                 3              43.71
5                    1                 3              44.82
6                 -999                 1             186.67
  X5.4_WeaningBodyMass_g X13.3_WeaningHeadBodyLen_mm
1                   -999                        -999
2                   -999                        -999
3                   -999                        -999
4                   -999                        -999
5                   -999                        -999
6                   -999                        -999
                                               References
1     511;543;719;1274;1297;1594;1654;1822;1848;2655;3044
2                         542;543;730;1113;1297;1573;2655
3                         543;679;730;1113;1297;1573;2655
4                367;542;543;730;1113;1297;1573;1822;2655
5 367;542;543;730;1015;1052;1113;1297;1573;1594;2338;2655
6                511;543;730;899;1297;1525;1848;2655;2986
  X5.5_AdultBodyMass_g_EXT X16.2_LittersPerYear_EXT X5.6_NeonateBodyMass_g_EXT
1                     -999                   -999.0                       -999
2                     -999                   -999.0                       -999
3                     -999                      1.1                       -999
4                     -999                      1.1                       -999
5                     -999                   -999.0                       -999
6                     -999                   -999.0                       -999
  X5.7_WeaningBodyMass_g_EXT X26.1_GR_Area_km2 X26.2_GR_MaxLat_dd
1                       -999              -999            -999.00
2                       -999          10581413              16.72
3                       -999          25739527              47.00
4                       -999          17099094              71.39
5                       -999          50803440              83.27
6                       -999           2294332              28.01
  X26.3_GR_MinLat_dd X26.4_GR_MidRangeLat_dd X26.5_GR_MaxLong_dd
1            -999.00                 -999.00             -999.00
2             -28.73                   -6.00               43.55
3              -4.71                   21.14              108.54
4               8.02                   39.70              -67.07
5              11.48                   47.38              179.65
6               2.86                   15.43              108.68
  X26.6_GR_MinLong_dd X26.7_GR_MidRangeLong_dd X27.1_HuPopDen_Min_n.km2
1             -999.00                  -999.00                     -999
2              -17.53                    13.00                        0
3              -17.05                    45.74                        0
4             -168.12                  -117.60                        0
5             -171.84                     3.90                        0
6               73.55                    91.12                        1
  X27.2_HuPopDen_Mean_n.km2 X27.3_HuPopDen_5p_n.km2 X27.4_HuPopDen_Change
1                   -999.00                    -999               -999.00
2                     35.20                       1                  0.14
3                     79.29                       0                  0.10
4                     27.27                       0                  0.06
5                     37.87                       0                  0.04
6                    152.67                       8                  0.09
  X28.1_Precip_Mean_mm X28.2_Temp_Mean_01degC X30.1_AET_Mean_mm
1              -999.00                -999.00           -999.00
2                90.75                 236.51            922.90
3                44.61                 217.23            438.02
4                53.03                  58.18            503.02
5                34.79                   4.82            313.33
6               145.21                 229.71           1053.45
  X30.2_PET_Mean_mm
1           -999.00
2           1534.40
3           1358.98
4            728.37
5            561.11
6           1483.02
range(pantheria$bodysize)
[1]      -999 154321305
unique(is.na(pantheria$bodysize))
[1] FALSE
unique(is.nan(pantheria$bodysize))
[1] FALSE

What do you observe?

  • There are negative numbers in our dataset which does not make sense. Warning that these negative numbers disappeared with our log-transformed plot, so it is good that we did not rely on this plot alone.

  • It looks like the number “-999” was used a placeholder for entries that do not have a bodysize value instead of NA or a non-numeric value.

Let’s clean up our dataset by filtering out these values

pantheria <- pantheria %>%
  filter(bodysize != -999)
  
range(unique(pantheria$bodysize))
[1]         1.96 154321304.50

Plot a histogram of bodysize

ggplot(data = pantheria, aes(x = log10(bodysize))) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Does the body-size histogram for mammals offer evidence for a positive association between body size and species richness within this taxon? What features of the shape of the histogram might help us understand the body size/species richness association?

  • The histogram shows that there are many small mammal species. This is consistent with an association between high speciation rate and small body size.
  • The histogram is heavily skewed to the right (longer tail to the right), with a long upper tail (most species are small; there are a few really big species).
  • Howevever, interestingly, the very smallest mammals are not the most diverse. The first 2 size classes have lower species diversity than the next two size classes.

Does this trend hold for mammalian orders?

Plot a histogram of bodysize faceted by every order in our pantheria dataset. Set scales = "free" noting what happens to the y- and x-axes between plots when we change that parameter.

ggplot(data = pantheria, aes(x = log(bodysize))) +
  geom_histogram() +
  facet_wrap(~ MSW05_Order, scales = "free") 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

What do you observe?

  • Many orders are right-skewed in log space, with some exceptions like Artiodactyla, Diprotodontia, and Primates.

Side question: what is dangerous about setting our parameter to scales = "free"?

  • It can be deceiving because it does not allow us to readily compare values across plots

One hypothesis for the high amount of species with small body size is that small body size promotes more rapid diversification. If this is true, then we might expect to see a negative association if we plot mean body size of animal families against their rate of diversification.

A simple way to summarize the average diversification rate over the lifetime of a clade is to divide the log of extant species richness (N) by its estimated age (t).

For the animal dataset, we can plot \(\frac{log(N)}{t}\) against mean body size.

Calculate divrate For each family within the animalia df using the mutate() function.

animalia <- animalia %>%
  mutate(divrate = log(richness) / age)

head(animalia)
  nr     phylum   class       order        family    bodysize    age richness
1  1 Arthropoda Insecta Hymenoptera    Cynipoidae 0.000000750  77.05     2500
2  2 Arthropoda Insecta Hymenoptera Ichneumonidae 0.000005790 142.85    60000
3  3 Arthropoda Insecta Hymenoptera    Braconidae 0.000000660 142.85    50000
4  4 Arthropoda Insecta Hymenoptera        Apidae 0.000150317  63.60    20000
5  5 Arthropoda Insecta Hymenoptera      Vespidae 0.000124380 142.85     1000
6  6 Arthropoda Insecta Hymenoptera    Formicidae 0.000001650 118.50    15000
  N_E refs.M..N_L..age...N_E       X is_chordate    divrate
1  -1                    1,3 1.00000 nonchordata 0.10154505
2  -1                    1,4 0.72968 nonchordata 0.07701855
3  -1                    1,3 0.72968 nonchordata 0.07574224
4  -1                    1,2 0.72968 nonchordata 0.15571521
5  -1                  1,3,5 0.72968 nonchordata 0.04835670
6  -1                    1,4 0.72968 nonchordata 0.08114604

Let’s generate log-transformed plots body size vs. species richness and plot body size vs. diversification rate, coloring by Chordata vs. nonchordata. Assign these plots to p1 and p2.

(Note that we have already taken the log of richness to calculate divrate so we do not take the log of divrate)

p1 <- ggplot(animalia, aes(x = log10(bodysize), y = log10(richness), color = is_chordate)) +
  geom_point() +
  labs(title = "Animal family log richness vs log average body size",
       x = "Log10(Average Body Size)",
       y = "Log10(Species Richness)",
       color = "Phylum")

p2 <- ggplot(animalia, aes(x = log10(bodysize), y = divrate, color = is_chordate)) +
  geom_point() +
  labs(title = "Animal diversification vs log average body size",
       x = "Log10(Average Body Size)",
       y = "Diversification Rate",
       color = "Phylum")
Patchwork

The package patchwork (Pedersen 2022) allows you to combine multiple plots into a single plot using intuitive syntax:

Using patchwork display our plots p1 and p2 side by side for comparison

p1 / p2

On the basis of these plots, can you conclude that families of smaller species have diversified more rapidly? If not, what other explanations could there be for the difference in mean species richness between chordate and non-chordate families?

  • No, we cannot conclude that families of smaller species have diversified more rapidly. There is no strong relationship in the 2nd plot between body size and diversification rate.
  • Other explanations?
    • Small-bodied families are more diverse, but they also tend to be older so they have had more time to accumulate more species.
    • Perhaps Chordate families need more “room” because they are bigger, but also need a greater variety of habitat types than smaller animals. Smaller animals specialize on finer subdivisions of habitat, so a given area can support more niches for smaller species. This concept is referred to as the “fractal nature of habitat surfaces.” The speciation (or diversification) rates could be the same, but the standing diversity is just higher for the non-chordates.

Now that we have visualized a pattern relevant to the body size-diversification rate hypothesis, we will apply some simple statistical tests. There are various tests we could apply. Based on the plots we might conclude, for example, that we are justified in grouping families into chordates and non-chordates, and testing for a difference between these two groups in mean diversification rate of families.

First, create a df of just chordates and another of just non-chordates

chordates <- animalia %>%
  filter(is_chordate == "chordata")

nonchordates <- animalia %>%
  filter(is_chordate == "nonchordata")

Let’s plot mass vs. diversification rate and add a simple regression line using a linear model with geom_smooth().

ggplot(data = chordates, aes(x = log10(bodysize), y = divrate)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Chordates")
`geom_smooth()` using formula = 'y ~ x'

ggplot(data = nonchordates, aes(x = log10(bodysize), y = divrate)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  +
  labs(title = "Not chordates")
`geom_smooth()` using formula = 'y ~ x'

After visually investigating this trend in the plot, let’s check the output of the linear model (note that the order of independent/dependent variables is different in this function than the one above)

summary(lm(chordates$divrate ~ log10(chordates$bodysize)))

Call:
lm(formula = chordates$divrate ~ log10(chordates$bodysize))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.07952 -0.04190 -0.01640  0.03435  0.22943 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                0.079287   0.005226   15.17   <2e-16 ***
log10(chordates$bodysize) -0.001504   0.003132   -0.48    0.632    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05844 on 125 degrees of freedom
Multiple R-squared:  0.001842,  Adjusted R-squared:  -0.006143 
F-statistic: 0.2307 on 1 and 125 DF,  p-value: 0.6318
summary(lm(nonchordates$divrate ~ log10(nonchordates$bodysize)))

Call:
lm(formula = nonchordates$divrate ~ log10(nonchordates$bodysize))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.051727 -0.018285 -0.004638  0.009594  0.136293 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.040810   0.010817   3.773 0.000338 ***
log10(nonchordates$bodysize) -0.005188   0.002500  -2.075 0.041699 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03491 on 69 degrees of freedom
Multiple R-squared:  0.05875,   Adjusted R-squared:  0.0451 
F-statistic: 4.306 on 1 and 69 DF,  p-value: 0.0417
  • Residuals: The difference (distance) between the actual values and the regression line.

    • As a simple step, we can use these values to determine if there are outliers that we should consider eliminating if min or max values are much larger than the median.

    • We can also perform separate statistics on the residuals themselves as a best practice.

    • A lower “residual standard error (RSE)” overall indicates a better fit

  • Coefficients:

    • Intercept: The predicted value of the dependent variable when the independent variable is equal to zero

    • Slope: The estimated effect of the independent variable on the dependent variable

    • The standard error can be used to indicate the precision of these estimates

    • The “Pr(>|t|)” column gives us the p-value for the intercept and slope

  • R-squared:

    • A “multiple R-squared” value closer to 1 indicates a better fit.

    • The “adjusted R-squared” value can be negative because it is adjusted for the number of independent variables in the model.

With this information, does this test support the hypothesis that families of small mean body size have diversified faster than those of larger mean body size? What about for chordates vs. nonchordates?

  • Visually there is a small negative trend for body size vs. diversification rate for nonchordates.

  • The p-values and even spread of our outliers indicate that our results likely did not arise by chance.

  • However, the slope estimate is actually quite low (-0.005188) and the R2 is low, so there is a poor fit, so there is not a strong trend

  • However, the intercept appears to be a good predictor of the value of the dependent variable as the independent variable approaches zero.

Consider this: could it be that the relationship we see between richness vs. body size is because older families differ in average body size and species richness compared to younger families?

Explore this by plotting this relationship and generating a linear model summary.

ggplot(data = animalia, aes(x = log10(age), y = log10(richness))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
`geom_smooth()` using formula = 'y ~ x'

summary(lm(log10(animalia$age) ~ log10(animalia$richness)))

Call:
lm(formula = log10(animalia$age) ~ log10(animalia$richness))

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7478 -0.1733  0.0166  0.1644  0.9492 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               1.50847    0.03552  42.468   <2e-16 ***
log10(animalia$richness)  0.14589    0.01582   9.223   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2634 on 196 degrees of freedom
Multiple R-squared:  0.3027,    Adjusted R-squared:  0.2991 
F-statistic: 85.07 on 1 and 196 DF,  p-value: < 2.2e-16
ggplot(data = animalia, aes(x = log10(age), y = log10(bodysize))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
`geom_smooth()` using formula = 'y ~ x'

summary(lm(log10(animalia$age) ~ log10(animalia$bodysize)))

Call:
lm(formula = log10(animalia$age) ~ log10(animalia$bodysize))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90436 -0.15704 -0.00856  0.15726  0.73932 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)               1.689145   0.019547   86.41   <2e-16 ***
log10(animalia$bodysize) -0.075195   0.006706  -11.21   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2462 on 196 degrees of freedom
Multiple R-squared:  0.3908,    Adjusted R-squared:  0.3877 
F-statistic: 125.8 on 1 and 196 DF,  p-value: < 2.2e-16

What do you discover? Do the patterns make biological sense given what you know about how diversification rate is measured (number of species / clade age) and about the evolution of life on earth? Why or why not?

  • Based on the plots, it does indeed look like older families have higher species richness than younger families
    • Again, this could simply be because there was more time to accumulate species richness.
  • Older families also tend to have smaller body sizes than younger families
    • These patterns make sense in the context of evolutionary history. Life started out microscopically and body size generally increased over evolutionary time alongside complexity. Therefore the trends we observed in chordates vs. nonchordates may just be related to the age of these groups.

Similarly, we can test hypotheses for specific groups. Following the Social/Machiavellian Hypothesis, increased brain size and capacity observed in mammal species like Homo sapiens was selected for to optimize performance in larger social groups. This is one of many competing hypotheses. Weaning age is related to brain size as altricial species with extended ontogeny focus early life resources on brain development. Create a plot and generate a linear model summary to test this hypothesis.

primates <- pantheria %>%
  rename(group_size = "X10.1_PopulationGrpSize") %>%
  rename(weaning_age = "X25.1_WeaningAge_d") %>%
  filter(MSW05_Order == "Primates") %>%
  filter(group_size > 0, weaning_age > 0)

ggplot(data = primates, aes(x = log10(group_size), y = log10(weaning_age))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
`geom_smooth()` using formula = 'y ~ x'

summary(lm(log10(primates$group_size) ~ log10(primates$weaning_age)))

Call:
lm(formula = log10(primates$group_size) ~ log10(primates$weaning_age))

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5870 -0.2215  0.1008  0.2775  0.8745 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -0.8864     0.4675  -1.896 0.064244 .  
log10(primates$weaning_age)   0.8144     0.1915   4.253 0.000102 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4716 on 46 degrees of freedom
Multiple R-squared:  0.2822,    Adjusted R-squared:  0.2666 
F-statistic: 18.09 on 1 and 46 DF,  p-value: 0.0001024

Do you find support for this hypothesis?

  • We find some support for the Social/Machiavellian Hypothesis with a statistically significant, positive relationship between group size and weaning age.

  • The large range of residuals may imply outliers should be investigated to see if there is support for their removal. This may improve the fit following the R-squared value.


Introduction to Phylogenetic Diversity Analyses

In this section, based on the online workshop by Kembel(Kembel 2024), we will explore phylogeny in the context of grassland plant biodiversity in Alberta, Canada. We will examine their functional traits and phylogenetic relationships. This analysis is based on the data described in Kembel and Cahill (2011), focusing on the independent evolution of leaf and root traits among temperate grassland plant communities.

Load the grassland.community.csv dataset as the community matrix comm with header = TRUE and row.names = 1. Do the same for the metadata, loading it into the metadata df. Explore the dataset and its metadata.

comm <- read.csv("Lab_9_data/grassland.community.csv", header = TRUE, row.names = 1)

head(comm)
        Antennaria_parvifolia Artemisia_cana Artemisia_frigida
mix-O-1                    10             10                50
mix-O-2                     0             10                50
mix-O-3                    20             20                30
mix-O-4                     0              0                 0
mix-O-5                     0             10                 0
mix-O-6                     0              0                60
        Symphyotrichum_ericoides_var._ericoides Bouteloua_gracilis
mix-O-1                                      10                 70
mix-O-2                                      10                 90
mix-O-3                                      10                 60
mix-O-4                                       0                 90
mix-O-5                                       0                100
mix-O-6                                       0                 70
        Carex_filifolia Elymus_lanceolatus Erysimum_inconspicuum
mix-O-1              80                 10                    20
mix-O-2              30                 70                     0
mix-O-3              60                 90                     0
mix-O-4              90                  0                     0
mix-O-5             100                 10                     0
mix-O-6             100                 50                     0
        Heterotheca_villosa Koeleria_macrantha Penstemon_nitidus Phlox_hoodii
mix-O-1                  10                 90                10           60
mix-O-2                   0                 30                 0           30
mix-O-3                   0                 20                 0            0
mix-O-4                   0                 40                 0            0
mix-O-5                   0                  0                 0            0
mix-O-6                   0                 40                 0           50
        Poa_secunda Hesperostipa_comata_ssp._comata Taraxacum_officinale
mix-O-1          80                              90                   10
mix-O-2          70                              10                    0
mix-O-3         100                              30                   20
mix-O-4           0                             100                    0
mix-O-5          30                              80                    0
mix-O-6          50                             100                   30
        Thermopsis_rhombifolia Achillea_millefolium Allium_textile
mix-O-1                     30                    0              0
mix-O-2                     10                   10             10
mix-O-3                      0                   30              0
mix-O-4                      0                    0              0
mix-O-5                      0                    0              0
mix-O-6                      0                    0              0
        Symphyotrichum_falcatum_var._falcatum Astragalus_bisulcatus
mix-O-1                                     0                     0
mix-O-2                                    20                    10
mix-O-3                                     0                     0
mix-O-4                                     0                     0
mix-O-5                                     0                     0
mix-O-6                                     0                     0
        Krascheninnikovia_lanata Gutierrezia_sarothrae Hymenoxys_richardsonii
mix-O-1                        0                     0                      0
mix-O-2                       20                    10                     10
mix-O-3                        0                     0                      0
mix-O-4                        0                     0                      0
mix-O-5                       30                     0                      0
mix-O-6                       10                     0                      0
        Musineon_divaricatum Plantago_patagonica Sphaeralcea_coccinea
mix-O-1                    0                   0                    0
mix-O-2                   20                  50                   40
mix-O-3                   20                  30                   30
mix-O-4                    0                   0                   20
mix-O-5                    0                   0                   20
mix-O-6                    0                  20                   50
        Nassella_viridula Androsace_occidentalis Arnica_fulgens
mix-O-1                 0                      0              0
mix-O-2                20                      0              0
mix-O-3                10                     20             30
mix-O-4                 0                      0              0
mix-O-5                 0                      0              0
mix-O-6                 0                      0              0
        Penstemon_confertus Vicia_americana Zigadenus_venenosus
mix-O-1                   0               0                   0
mix-O-2                   0               0                   0
mix-O-3                  10              20                  50
mix-O-4                   0               0                   0
mix-O-5                   0               0                   0
mix-O-6                   0               0                   0
        Tragopogon_dubius Machaeranthera_pinnatifida_var._pinnatifida
mix-O-1                 0                                           0
mix-O-2                 0                                           0
mix-O-3                 0                                           0
mix-O-4                10                                           0
mix-O-5                10                                          10
mix-O-6                 0                                           0
        Erigeron_pumilus Linum_lewisii Amelanchier_alnifolia
mix-O-1                0             0                     0
mix-O-2                0             0                     0
mix-O-3                0             0                     0
mix-O-4                0             0                     0
mix-O-5                0             0                     0
mix-O-6               20             0                     0
        Artemisia_ludoviciana Symphyotrichum_laeve_var._laeve
mix-O-1                     0                               0
mix-O-2                     0                               0
mix-O-3                     0                               0
mix-O-4                     0                               0
mix-O-5                     0                               0
mix-O-6                     0                               0
        Astragalus_australis Carex_obtusata Cerastium_arvense Chenopodium_album
mix-O-1                    0              0                 0                 0
mix-O-2                    0              0                 0                 0
mix-O-3                    0              0                 0                 0
mix-O-4                    0              0                 0                 0
mix-O-5                    0              0                 0                 0
mix-O-6                    0              0                 0                 0
        Comandra_umbellata Descurainia_sophia Draba_nemorosa Elymus_repens
mix-O-1                  0                  0              0             0
mix-O-2                  0                  0              0             0
mix-O-3                  0                  0              0             0
mix-O-4                  0                  0              0             0
mix-O-5                  0                  0              0             0
mix-O-6                  0                  0              0             0
        Festuca_hallii Galium_circaezans_var._circaezans Heuchera_richardsonii
mix-O-1              0                                 0                     0
mix-O-2              0                                 0                     0
mix-O-3              0                                 0                     0
mix-O-4              0                                 0                     0
mix-O-5              0                                 0                     0
mix-O-6              0                                 0                     0
        Lactuca_tatarica_var._pulchella Lepidium_densiflorum Oxytropis_sericea
mix-O-1                               0                    0                 0
mix-O-2                               0                    0                 0
mix-O-3                               0                    0                 0
mix-O-4                               0                    0                 0
mix-O-5                               0                    0                 0
mix-O-6                               0                    0                 0
        Poa_pratensis Rosa_arkansana Hesperostipa_spartea
mix-O-1             0              0                    0
mix-O-2             0              0                    0
mix-O-3             0              0                    0
mix-O-4             0              0                    0
mix-O-5             0              0                    0
mix-O-6             0              0                    0
        Symphoricarpos_occidentalis Viola_adunca Pulsatilla_patens
mix-O-1                           0            0                 0
mix-O-2                           0            0                 0
mix-O-3                           0            0                 0
mix-O-4                           0            0                 0
mix-O-5                           0            0                 0
mix-O-6                           0            0                 0
        Symphyotrichum_ciliolatum Astragalus_agrestis Solidago_missouriensis
mix-O-1                         0                   0                      0
mix-O-2                         0                   0                      0
mix-O-3                         0                   0                      0
mix-O-4                         0                   0                      0
mix-O-5                         0                   0                      0
mix-O-6                         0                   0                      0
        Arnica_sororia Danthonia_intermedia Erigeron_glabellus
mix-O-1              0                    0                  0
mix-O-2              0                    0                  0
mix-O-3              0                    0                  0
mix-O-4              0                    0                  0
mix-O-5              0                    0                  0
mix-O-6              0                    0                  0
        Potentilla_arguta Oligoneuron_rigidum_var._rigidum Dalea_purpurea
mix-O-1                 0                                0              0
mix-O-2                 0                                0              0
mix-O-3                 0                                0              0
mix-O-4                 0                                0              0
mix-O-5                 0                                0              0
mix-O-6                 0                                0              0
        Antennaria_neglecta Penstemon_procerus Poa_cusickii Bromus_inermis
mix-O-1                   0                  0            0              0
mix-O-2                   0                  0            0              0
mix-O-3                   0                  0            0              0
mix-O-4                   0                  0            0              0
mix-O-5                   0                  0            0              0
mix-O-6                   0                  0            0              0
        Campanula_rotundifolia Lathyrus_venosus Elaeagnus_commutata
mix-O-1                      0                0                   0
mix-O-2                      0                0                   0
mix-O-3                      0                0                   0
mix-O-4                      0                0                   0
mix-O-5                      0                0                   0
mix-O-6                      0                0                   0
        Orthocarpus_luteus
mix-O-1                  0
mix-O-2                  0
mix-O-3                  0
mix-O-4                  0
mix-O-5                  0
mix-O-6                  0
range(as.vector(as.matrix(comm)), na.rm = TRUE)
[1]   0 100
metadata <- read.csv("Lab_9_data/plot.metadata.csv", header = TRUE, row.names = 1)

head(metadata)
           habitat    site slope aspect slope.position rel.moisture
mix-O-1 Mixedgrass Onefour     0    270            3.0            1
mix-O-2 Mixedgrass Onefour    20    130            1.5            2
mix-O-3 Mixedgrass Onefour     5     90            1.0            2
mix-O-4 Mixedgrass Onefour     5     40            2.0            1
mix-O-5 Mixedgrass Onefour     5    130            2.0            1
mix-O-6 Mixedgrass Onefour     1     90            3.0            1

Describe the dataset.

  • Every cell indicates the percentage coverage of a particular species within a sample.

It’s important to note that several multivariate analytical methods, like vegan, are influenced by the overall abundance present in a sample. Therefore, it may be advisable to transform these absolute abundance measurements into relative abundance values. This adjustment can be efficiently carried out using a specific function available in the vegan package:

comm <- decostand(comm, method = "total")
apply(comm, 1, sum)
 mix-O-1  mix-O-2  mix-O-3  mix-O-4  mix-O-5  mix-O-6  mix-O-7  fes-K-8 
       1        1        1        1        1        1        1        1 
 fes-K-9 fes-K-10 fes-K-11 fes-K-12 fes-K-13 fes-K-14 fes-K-15 fes-K-16 
       1        1        1        1        1        1        1        1 
fes-K-17 mix-H-18 mix-H-19 mix-H-20 mix-H-21 mix-H-22 mix-H-23 mix-H-24 
       1        1        1        1        1        1        1        1 
mix-H-25 mix-H-26 mix-H-27 
       1        1        1 
head(comm)
        Antennaria_parvifolia Artemisia_cana Artemisia_frigida
mix-O-1            0.01562500     0.01562500        0.07812500
mix-O-2            0.00000000     0.01587302        0.07936508
mix-O-3            0.02816901     0.02816901        0.04225352
mix-O-4            0.00000000     0.00000000        0.00000000
mix-O-5            0.00000000     0.02500000        0.00000000
mix-O-6            0.00000000     0.00000000        0.09230769
        Symphyotrichum_ericoides_var._ericoides Bouteloua_gracilis
mix-O-1                              0.01562500         0.10937500
mix-O-2                              0.01587302         0.14285714
mix-O-3                              0.01408451         0.08450704
mix-O-4                              0.00000000         0.25714286
mix-O-5                              0.00000000         0.25000000
mix-O-6                              0.00000000         0.10769231
        Carex_filifolia Elymus_lanceolatus Erysimum_inconspicuum
mix-O-1      0.12500000         0.01562500               0.03125
mix-O-2      0.04761905         0.11111111               0.00000
mix-O-3      0.08450704         0.12676056               0.00000
mix-O-4      0.25714286         0.00000000               0.00000
mix-O-5      0.25000000         0.02500000               0.00000
mix-O-6      0.15384615         0.07692308               0.00000
        Heterotheca_villosa Koeleria_macrantha Penstemon_nitidus Phlox_hoodii
mix-O-1            0.015625         0.14062500          0.015625   0.09375000
mix-O-2            0.000000         0.04761905          0.000000   0.04761905
mix-O-3            0.000000         0.02816901          0.000000   0.00000000
mix-O-4            0.000000         0.11428571          0.000000   0.00000000
mix-O-5            0.000000         0.00000000          0.000000   0.00000000
mix-O-6            0.000000         0.06153846          0.000000   0.07692308
        Poa_secunda Hesperostipa_comata_ssp._comata Taraxacum_officinale
mix-O-1  0.12500000                      0.14062500           0.01562500
mix-O-2  0.11111111                      0.01587302           0.00000000
mix-O-3  0.14084507                      0.04225352           0.02816901
mix-O-4  0.00000000                      0.28571429           0.00000000
mix-O-5  0.07500000                      0.20000000           0.00000000
mix-O-6  0.07692308                      0.15384615           0.04615385
        Thermopsis_rhombifolia Achillea_millefolium Allium_textile
mix-O-1             0.04687500           0.00000000     0.00000000
mix-O-2             0.01587302           0.01587302     0.01587302
mix-O-3             0.00000000           0.04225352     0.00000000
mix-O-4             0.00000000           0.00000000     0.00000000
mix-O-5             0.00000000           0.00000000     0.00000000
mix-O-6             0.00000000           0.00000000     0.00000000
        Symphyotrichum_falcatum_var._falcatum Astragalus_bisulcatus
mix-O-1                            0.00000000            0.00000000
mix-O-2                            0.03174603            0.01587302
mix-O-3                            0.00000000            0.00000000
mix-O-4                            0.00000000            0.00000000
mix-O-5                            0.00000000            0.00000000
mix-O-6                            0.00000000            0.00000000
        Krascheninnikovia_lanata Gutierrezia_sarothrae Hymenoxys_richardsonii
mix-O-1               0.00000000            0.00000000             0.00000000
mix-O-2               0.03174603            0.01587302             0.01587302
mix-O-3               0.00000000            0.00000000             0.00000000
mix-O-4               0.00000000            0.00000000             0.00000000
mix-O-5               0.07500000            0.00000000             0.00000000
mix-O-6               0.01538462            0.00000000             0.00000000
        Musineon_divaricatum Plantago_patagonica Sphaeralcea_coccinea
mix-O-1           0.00000000          0.00000000           0.00000000
mix-O-2           0.03174603          0.07936508           0.06349206
mix-O-3           0.02816901          0.04225352           0.04225352
mix-O-4           0.00000000          0.00000000           0.05714286
mix-O-5           0.00000000          0.00000000           0.05000000
mix-O-6           0.00000000          0.03076923           0.07692308
        Nassella_viridula Androsace_occidentalis Arnica_fulgens
mix-O-1        0.00000000             0.00000000     0.00000000
mix-O-2        0.03174603             0.00000000     0.00000000
mix-O-3        0.01408451             0.02816901     0.04225352
mix-O-4        0.00000000             0.00000000     0.00000000
mix-O-5        0.00000000             0.00000000     0.00000000
mix-O-6        0.00000000             0.00000000     0.00000000
        Penstemon_confertus Vicia_americana Zigadenus_venenosus
mix-O-1          0.00000000      0.00000000          0.00000000
mix-O-2          0.00000000      0.00000000          0.00000000
mix-O-3          0.01408451      0.02816901          0.07042254
mix-O-4          0.00000000      0.00000000          0.00000000
mix-O-5          0.00000000      0.00000000          0.00000000
mix-O-6          0.00000000      0.00000000          0.00000000
        Tragopogon_dubius Machaeranthera_pinnatifida_var._pinnatifida
mix-O-1        0.00000000                                       0.000
mix-O-2        0.00000000                                       0.000
mix-O-3        0.00000000                                       0.000
mix-O-4        0.02857143                                       0.000
mix-O-5        0.02500000                                       0.025
mix-O-6        0.00000000                                       0.000
        Erigeron_pumilus Linum_lewisii Amelanchier_alnifolia
mix-O-1       0.00000000             0                     0
mix-O-2       0.00000000             0                     0
mix-O-3       0.00000000             0                     0
mix-O-4       0.00000000             0                     0
mix-O-5       0.00000000             0                     0
mix-O-6       0.03076923             0                     0
        Artemisia_ludoviciana Symphyotrichum_laeve_var._laeve
mix-O-1                     0                               0
mix-O-2                     0                               0
mix-O-3                     0                               0
mix-O-4                     0                               0
mix-O-5                     0                               0
mix-O-6                     0                               0
        Astragalus_australis Carex_obtusata Cerastium_arvense Chenopodium_album
mix-O-1                    0              0                 0                 0
mix-O-2                    0              0                 0                 0
mix-O-3                    0              0                 0                 0
mix-O-4                    0              0                 0                 0
mix-O-5                    0              0                 0                 0
mix-O-6                    0              0                 0                 0
        Comandra_umbellata Descurainia_sophia Draba_nemorosa Elymus_repens
mix-O-1                  0                  0              0             0
mix-O-2                  0                  0              0             0
mix-O-3                  0                  0              0             0
mix-O-4                  0                  0              0             0
mix-O-5                  0                  0              0             0
mix-O-6                  0                  0              0             0
        Festuca_hallii Galium_circaezans_var._circaezans Heuchera_richardsonii
mix-O-1              0                                 0                     0
mix-O-2              0                                 0                     0
mix-O-3              0                                 0                     0
mix-O-4              0                                 0                     0
mix-O-5              0                                 0                     0
mix-O-6              0                                 0                     0
        Lactuca_tatarica_var._pulchella Lepidium_densiflorum Oxytropis_sericea
mix-O-1                               0                    0                 0
mix-O-2                               0                    0                 0
mix-O-3                               0                    0                 0
mix-O-4                               0                    0                 0
mix-O-5                               0                    0                 0
mix-O-6                               0                    0                 0
        Poa_pratensis Rosa_arkansana Hesperostipa_spartea
mix-O-1             0              0                    0
mix-O-2             0              0                    0
mix-O-3             0              0                    0
mix-O-4             0              0                    0
mix-O-5             0              0                    0
mix-O-6             0              0                    0
        Symphoricarpos_occidentalis Viola_adunca Pulsatilla_patens
mix-O-1                           0            0                 0
mix-O-2                           0            0                 0
mix-O-3                           0            0                 0
mix-O-4                           0            0                 0
mix-O-5                           0            0                 0
mix-O-6                           0            0                 0
        Symphyotrichum_ciliolatum Astragalus_agrestis Solidago_missouriensis
mix-O-1                         0                   0                      0
mix-O-2                         0                   0                      0
mix-O-3                         0                   0                      0
mix-O-4                         0                   0                      0
mix-O-5                         0                   0                      0
mix-O-6                         0                   0                      0
        Arnica_sororia Danthonia_intermedia Erigeron_glabellus
mix-O-1              0                    0                  0
mix-O-2              0                    0                  0
mix-O-3              0                    0                  0
mix-O-4              0                    0                  0
mix-O-5              0                    0                  0
mix-O-6              0                    0                  0
        Potentilla_arguta Oligoneuron_rigidum_var._rigidum Dalea_purpurea
mix-O-1                 0                                0              0
mix-O-2                 0                                0              0
mix-O-3                 0                                0              0
mix-O-4                 0                                0              0
mix-O-5                 0                                0              0
mix-O-6                 0                                0              0
        Antennaria_neglecta Penstemon_procerus Poa_cusickii Bromus_inermis
mix-O-1                   0                  0            0              0
mix-O-2                   0                  0            0              0
mix-O-3                   0                  0            0              0
mix-O-4                   0                  0            0              0
mix-O-5                   0                  0            0              0
mix-O-6                   0                  0            0              0
        Campanula_rotundifolia Lathyrus_venosus Elaeagnus_commutata
mix-O-1                      0                0                   0
mix-O-2                      0                0                   0
mix-O-3                      0                0                   0
mix-O-4                      0                0                   0
mix-O-5                      0                0                   0
mix-O-6                      0                0                   0
        Orthocarpus_luteus
mix-O-1                  0
mix-O-2                  0
mix-O-3                  0
mix-O-4                  0
mix-O-5                  0
mix-O-6                  0

Trait data include information about the leaf and root traits of each species, which are crucial for understanding functional diversity within the community. We’ll load a phylogeny for these species as well, which helps in understanding the evolutionary relationships and phylogenetic diversity within the community. Load in trait data as the df traits from the species.traits.csv file and the phylogenetic data as the df phy from the Newick file grassland.phylogeny.newick

Understanding Phylogenetic Dataset Formats

Phylogenetic datasets come in various formats, each with its specific structure and use case. Familiarizing yourself with these formats is useful for efficiently managing and analyzing phylogenetic data. Here are some of the most common types:

  • Newick: A compact way to represent phylogenetic trees using parentheses and commas. Despite its simplicity, Newick format captures the tree structure, including branch lengths. It doesn’t support metadata like taxa descriptions or geographical information.

  • NEXUS: A more versatile format than Newick, NEXUS files can contain multiple blocks of data including trees, sequences, sets of characters, and assumptions about the data. It’s widely used in software packages for evolutionary analysis because of its flexibility and the ability to include rich metadata.

.tre: Often just a file extension for trees saved in either Newick or NEXUS formats. By itself, “.tre” doesn’t specify a format but is commonly used to denote phylogenetic trees.

– The commonly used Newick or Nexus format files can be imported into R with the read.tree or read.nexus functions

  • Phylip: Originally designed for the PHYLIP software package, this format is primarily used for sequence data but can also represent phylogenetic trees. It has strict formatting rules for sequence names and alignment, which can be a limitation.

  • FASTA: While primarily known for representing sequences, FASTA files are sometimes used in phylogenetic analysis to supply the sequence data from which trees are inferred. FASTA files are simple, with each sequence prefaced by a ‘>’ symbol followed by an identifier line.

Each format has its advantages and is supported by different software tools for phylogenetic analysis.

traits <- read.csv("Lab_9_data/species.traits.csv", header = TRUE, row.names = 1)

head(traits)
                            SLA  LeafArea LeafThickness       SLV
Achillea_millefolium   140.2663  9.275390     0.4163333  59.56525
Allium_textile         137.7006  2.445361     0.9147222 125.69496
Amelanchier_alnifolia  156.1014 14.064856     0.2900000  45.45227
Androsace_occidentalis 257.2050  0.274745     0.2535000  84.22189
Antennaria_neglecta    171.0442  1.731990     0.2810000  48.14442
Antennaria_parvifolia  193.8718  0.317200     0.2466667  47.64151
                       LeafTissueDens       SRL       SRV RootTissueDens
Achillea_millefolium      0.018085428  74.14570  5.038776      0.2553510
Allium_textile            0.008137136 187.85485 14.013757      0.1049986
Amelanchier_alnifolia     0.022841232  20.87560  2.518939      0.5039683
Androsace_occidentalis    0.017706402 207.45582  3.291592      0.4071078
Antennaria_neglecta       0.020920233 124.73397  6.710526      0.1593750
Antennaria_parvifolia     0.021048340  44.93859  4.003997      0.2504167
                        RootDiam
Achillea_millefolium   0.3123600
Allium_textile         0.3107833
Amelanchier_alnifolia  0.3760667
Androsace_occidentalis 0.1148714
Antennaria_neglecta    0.2749500
Antennaria_parvifolia  0.3495500
phy <- read.tree("Lab_9_data/grassland.phylogeny.newick")

Check the class of our phy dataframe:

class(phy)
[1] "phylo"

Our phylogenetic data structure is encapsulated in a phylo object type. Comprehensive documentation on the phylo format is accessible on the ape package’s official site (ape homepage). Intrinsically, a phylo object acts as a specialized list within R, incorporating various attributes such as labels for tips and the lengths of branches. This structured format, as delineated by the ape package, enables R to adeptly perform summarization and generate visual representations of phylogenetic trees.

Let’s explore our df further

names(phy)
[1] "edge"        "edge.length" "Nnode"       "node.label"  "tip.label"  
[6] "root.edge"  

Here is a reminder what these terms refer to:

We can check the first few tips:

phy$tip.label[1:5]
[1] "Antennaria_neglecta"   "Antennaria_parvifolia" "Erigeron_glabellus"   
[4] "Erigeron_pumilus"      "Heterotheca_villosa"  

How many tips does our phylogeny have?

Ntip(phy)
[1] 76

Now let’s plot our phylogeny:

plot(phy, cex = 0.5)

Among other packages, the picante package is commonly used for phylogenetic biodiversity analyses. The function match.phylo.comm processes both a community matrix and a phylogenetic tree (phylo object), identifying species discrepancies between the two datasets. It then realigns both datasets to ensure species order consistency and mutual presence, yielding modified versions of each dataset that share identical species compositions.

combined <- match.phylo.comm(phy, comm)

phy <- combined$phy
comm <- combined$comm

It’s also essential to verify that the arrangement of our community dataset aligns with that of our metadata.

all.equal(rownames(comm), rownames(metadata))
[1] TRUE

If they didn’t match, we could have used our metadata to match the orders:


metadata <- metadata[rownames(comm), ]

Now visualize species richness (alpha diversity) between metadata habitats as a boxplot. The habitat names are in the metadata df while the counts are in our comm community data matrix. Hint: Do you remember the species richness function from vegan?

ggplot(metadata, aes(x = habitat, y = specnumber(comm))) +
  geom_boxplot() +
  labs(x = "Habitat", y = "Species Richness", title = "Species richness by habitat")

What observations do you make?

  • There is higher alpha diversity in fescue habitats

Phylogenetic Diversity

One of the first measures of phylogenetic relatedness within ecological communities was the phylogenetic diversity (PD) index. Faith’s PD(Faith 1992) is defined as the total branch length of a tree including all local community species, optionally also including the root node. The pd() function returns Faith’s PD and species richness (SR).

comm.pd <- pd(comm, phy)
head(comm.pd)
               PD SR
mix-O-1 1072.3697 16
mix-O-2 1475.4767 22
mix-O-3 1406.1708 21
mix-O-4  564.5899  6
mix-O-5  783.4028 10
mix-O-6 1028.5796 13

Visualize these results with a boxplot

ggplot(metadata, aes(x = habitat, y = comm.pd$PD)) +
  geom_boxplot() +
  labs(x = "Habitat", y = "Faith's Phylogenetic Diversity (PD)")

Hierarchical Clustering

Like we did in our biodiversity metrics lab, let’s create a distance matrix comm.bc.dist using the vegdist() function. Since the Bray-Curtis distance incorporates species abundances, we should use abundance information when calculating our distances from our comm community data matrix.

comm.bc.dist <- vegdist(comm, method = "bray")

Similar to a phylogeny tree structure, we can plot these results as a dendrogram. We will then cluster together communities that are similar using an agglomerative hierarchical clustering algorithm:

comm.bc.clust <- hclust(comm.bc.dist, method = "average")
plot(comm.bc.clust, main = "Hierarchical Clustering of Communities")

What do you observe?

  • It looks like mixedgrass and fescue habitats contain different plant communities
  • The two main clusters separate fescue samples from the other samples

We will use these distance matrices we produced later

Phylogenetic Beta-Diversity

To assess community dissimilarity in a phylogenetic context, we employ metrics such as UniFrac and PhyloSor for phylogenetic beta-diversity, alongside comdist and comdistnt for quantifying mean pairwise distance (MPD) and mean nearest taxon distance (MNTD) between communities. These metrics illuminate the evolutionary and functional relationships driving community assembly across landscapes.

phy.dist <- cophenetic(phy)

comm.mntd.dist <- comdistnt(comm, phy.dist, abundance.weighted = TRUE)
mantel(comm.bc.dist, comm.mntd.dist)

Mantel statistic based on Pearson's product-moment correlation 

Call:
mantel(xdis = comm.bc.dist, ydis = comm.mntd.dist) 

Mantel statistic r: 0.8597 
      Significance: 0.001 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0788 0.1080 0.1284 0.1602 
Permutation: free
Number of permutations: 999
trait.dist <- as.matrix(dist(scale(traits), method = "euclidean"))

comm.mntd.traits.dist <- comdistnt(comm, trait.dist, abundance.weighted = TRUE)
mantel(comm.bc.dist, comm.mntd.traits.dist)

Mantel statistic based on Pearson's product-moment correlation 

Call:
mantel(xdis = comm.bc.dist, ydis = comm.mntd.traits.dist) 

Mantel statistic r: 0.9499 
      Significance: 0.001 

Upper quantiles of permutations (null model):
  90%   95% 97.5%   99% 
0.082 0.112 0.152 0.199 
Permutation: free
Number of permutations: 999

What do you observe?

  • The phylogenetic MNTD and community composition show a strong positive correlation (Mantel statistic r = 0.8597), indicating that communities with similar species compositions tend to be more closely related phylogenetically. This strong association suggests that phylogenetic relatedness is a key factor in community assembly.

  • The significant correlations (p < 0.001) in both tests underline the importance of evolutionary histories and functional adaptations in the assembly of ecological communities. These patterns reflect the influence of both phylogenetic lineage and functional traits on the ecological strategies and environmental adaptations of species within communities.

  • The correlation between functional trait MNTD and community composition is even stronger (Mantel statistic r = 0.9499), emphasizing the crucial role of functional traits in structuring communities. This implies that species within similar communities not only share a closer phylogenetic relationship but also possess similar functional traits.

These findings highlight the complex interplay between phylogenetic relatedness and functional trait similarities in influencing community structure and suggest a convergence in ecological strategies among closely related and functionally similar species. Big picture, traits in this case are a good proxy for phylogeny and vice versa.

Phylogeny/Trait-Based Ordinations

Remember that ordination methods, such as Non-metric Multidimensional Scaling (NMDS) that we used in the biodiversity metrics lab, provide a powerful tool for visualizing complex patterns of similarity and dissimilarity among communities. These methods can be applied to phylogenetic metrics as well. By ordinating communities, we can discern underlying gradients of environmental variation, biotic interactions, or historical contingency shaping community structure.

Like in the biodiversity metrics lab, we will use the metaMDS() function from the vegan package

comm.mntd.mds <- metaMDS(comm.mntd.dist)
Run 0 stress 0.116569 
Run 1 stress 0.116569 
... Procrustes: rmse 5.247208e-06  max resid 1.302183e-05 
... Similar to previous best
Run 2 stress 0.116569 
... Procrustes: rmse 5.51446e-06  max resid 1.326615e-05 
... Similar to previous best
Run 3 stress 0.133858 
Run 4 stress 0.1355939 
Run 5 stress 0.1166597 
... Procrustes: rmse 0.0532475  max resid 0.2560234 
Run 6 stress 0.1171039 
Run 7 stress 0.116569 
... Procrustes: rmse 4.690297e-06  max resid 1.212965e-05 
... Similar to previous best
Run 8 stress 0.1189597 
Run 9 stress 0.1166597 
... Procrustes: rmse 0.05324795  max resid 0.2560256 
Run 10 stress 0.1431935 
Run 11 stress 0.1290987 
Run 12 stress 0.1278436 
Run 13 stress 0.116569 
... Procrustes: rmse 4.525123e-06  max resid 1.578564e-05 
... Similar to previous best
Run 14 stress 0.1171553 
Run 15 stress 0.1269754 
Run 16 stress 0.116569 
... New best solution
... Procrustes: rmse 2.182259e-06  max resid 4.392028e-06 
... Similar to previous best
Run 17 stress 0.1168237 
... Procrustes: rmse 0.05400601  max resid 0.2564189 
Run 18 stress 0.1278301 
Run 19 stress 0.1278301 
Run 20 stress 0.1269811 
*** Best solution repeated 1 times
ggplot(as.data.frame(scores(comm.mntd.mds)), aes(NMDS1, NMDS2, color = metadata$habitat)) +
  geom_point() +
  labs(title = "NMDS Ordination based on Phylogenetic Distances")

What do you observe?

Visualizing trait evolution

Because we have phylogenetic data, we can also visualize evolution of traits from our dataset. Let’s visualize the evolution of leaf area:

plot(phy, direction = "up", show.tip.label = FALSE, show.node.label = TRUE, cex = 0.7)
tiplabels(pch = 19,  cex = 3 * (traits[, "LeafArea"]/max(traits[, "LeafArea"])))

What do you observe?

The sections presented above offer a very introductory exploration into the integration of phylogenetic perspectives within biodiversity studies. By incorporating phylogenetic diversity, functional trait analyses, and beta-diversity metrics, we begin to appreciate the complexity and depth of ecological relationships and evolutionary histories that underpin community assembly and species distribution. These integrations not only enrich our understanding of biodiversity patterns but also enhance the precision of conservation strategies, highlighting areas of evolutionary significance and functional diversity critical for ecosystem resilience and stability.


References

Etienne, Rampal S, Sara N de Visser, Thijs Janzen, Jeanine L Olsen, Han Olff, and James Rosindell. 2012. “Can Clade Age Alone Explain the Relationship Between Body Size and Diversity?” Interface Focus 2 (2): 170–79.
Faith, Daniel P. 1992. “Conservation Evaluation and Phylogenetic Diversity.” Biological Conservation 61 (1): 1–10.
Harmon, Luke. 2019. “Phylogenetic Comparative Methods: Learning from Trees.” https://lukejharmon.github.io/pcm/pdf/phylogeneticComparativeMethods.pdf.
Jones, Kate E, Jon Bielby, Marcel Cardillo, Susanne A Fritz, Justin O’Dell, C David L Orme, Kamran Safi, et al. 2009. “PanTHERIA: A Species-Level Database of Life History, Ecology, and Geography of Extant and Recently Extinct Mammals: Ecological Archives E090-184.” Ecology 90 (9): 2648–48.
Kembel, Steven. 2024. R for Biodiversity Analysis Workshop.” Université du Québec à Montréal (UQÀM); https://kembellab.ca/r-workshop/.
Mittelbach, Gary G, and Brian J McGill. 2019. Community Ecology. Oxford University Press.
Pedersen, Thomas Lin. 2022. “Patchwork: The Composer of Plots.” https://CRAN.R-project.org/package=patchwork.
Stevens, Martin Henry Hoffman. 2009. A Primer of Ecology with r. Springer.