Models

Models are an abstraction of reality. This means that models exclude much more information than they include. That is a limitation we must keep in mind when drawing conclusions from models, but this is also a very desirable attribute of models.

Imagine you were on a hiking trail and wanted to find the quickest way back to the trailhead. You see a sign-post with a map posted, so you approach it. If that map contained all of the information about every aspect of that area, that would be very unhelpful. Instead the map is designed as an abstraction, presenting only the information the users desire to know about the real world.

In general across fields, models are designed to provide at least one of three utilities to their users. The ability to:

Predict
Explain
Generalize

Which of these three utilities is the trail map designed to provide?

1) Exponential/Density-Independent Population Growth

1a) Discrete Exponential Population Growth

We will begin our labwork by exploring density-independent population growth. Density-independent population growth occurs when the growth rate of a population size is not affected by the number of conspecifics in that population.

Density-independent population growth is relatively simple to model because it can be represented by a simple geometric series and generally involves fewer parameters than other models of population.

However, it is also biologically useful to understand because density-independent population growth does occur in nature.

When in the natural world might you observe density-independent population growth?

When a population:

Is below carrying capacity
Has access to unlimited resources, like in a lab setting
Is experiencing release from anthropogenic pressure
Is experiencing predator release:
- Via removal of the predators/parasites from the population’s range
- Enemy release and the founder effect resulting from the population invading another area free from natural predators/parasites
Etc.

Modeling: The Trouble with Tribbles

Let’s model a scenario of density-independent population growth together. In Star Trek the hairy, featureless species called tribbles seem to have an extraordinary population growth rate. They are said to be “born pregnant.” It seems as if their population growth might be density-independent, but let us make a model to see if we think that is true.

*Star Trek: The Original Series episode, “The Trouble with Tribbles”, Fair use, Wikimedia*

We decide on a transect and monitor the tribbles on the ship. Between generations we observe the following:

Day	Number of individuals
1967-12-29	17
1967-12-30	51
1967-12-31	153
1968-01-01	459
1968-01-02	1377

Create vectors from number of tribbles (n) per day (days):

days = c(1:5)
n = c(17, 51, 153, 459, 1377)

Let’s plot the number of tribbles (n) per day (days) using ggplot2(Wickham 2016):

library(ggplot2)

Warning: package 'ggplot2' was built under R version 4.3.2

tribble_df <- data.frame(
  Days = days,
  N = n
)

ggplot(tribble_df, aes(x = Days, y = N)) +
  geom_step() +  # This creates the stepped line which shows discrete growth
  geom_point() +  # This adds points
  labs(x = "Days", y = "Number of Tribbles") # Labels for axes

Predicting population size

How can we predict how large we expect the population to be tomorrow? We need to calculate following:

\[ \begin{equation*}\frac{N_{t+1}}{N_{t}}\end{equation*} \]

ratio = n[2:5]/n[1:4]
ratio

[1] 3 3 3 3

Now we can make a generalization, the population seems to be increasing by how many percent per day?

\[ 200\% \] \[ 3n\ =\ n\ +\ 2n \]

The engineers have estimated that the “Aeroponics Bay” can only provide enough oxygen to support 111,537 tribbles. We need to predict how many days until this population size is reached if this growth rate continues.

Using n[5] calculate how many tribbles you predict there will be on day 6 (n_day_6)

n[5]*3

[1] 4131

Using n[5]*3, calculate how many tribbles you predict there will be on day 7

(n[5]*3)*3

[1] 12393

Do you see a trend? We can generalize this trend in the following way:

\[ \begin{array}{l}N_{6} \ =\ 3 N_{5}\\N_{7} \ =\ 3\ ( 3 N_{5})\\N_{8} \ =\ 3\ ( 3\ ( 3 N_{5}))\\\\= >\\\\N_{t} \ =\ 3^{t} N_{0}\end{array} \]

As a general term, we can introduce \(\lambda\) as the finite rate of increase. Note that \(\lambda\) has no unit.

\[ N_{t} \ =\ N_{0} \lambda ^{t} \]

Calculate how many more days you predict until the population exceeds 111,537 tribbles. You can use the following format:

Nt.s <- sapply(N0, function(n) n * lambda^t)

N0 <- 1377
lambda <- 3
t <- 1:5

Nt.s <- sapply(N0, function(n) n * lambda^t)
Nt.s

       [,1]
[1,]   4131
[2,]  12393
[3,]  37179
[4,] 111537
[5,] 334611

apply() Family Shorthand Notation

apply(x, function(y) y) is referred to as “shorthand notation” or “shorthand syntax” because we are using a placeholder function without even naming it, so it is referred to as an “anonymous function,” and then using shorthand to define what the function does. Meanwhile x is the argument and y is the parameter.

In result, how many more days until you predict the ship will run out of oxygen?

Only 4 days due to the compounding nature of density-independent growth.

The logarithmic scale is useful when analyzing rates. In this example, the logarithmic scale provides a clearer way to visualize this steady trend that may not have been immediately apparent in our earlier plot.

tribble_df_discrete <- data.frame(
  Days = t,
  N = Nt.s
)

ggplot(tribble_df_discrete , aes(x = Days, y = N)) +
  geom_step() + 
  geom_point() +  
  labs(x = "Days", y = "Population Size")

ggplot(tribble_df_discrete , aes(x = Days, y = N)) +
  geom_step() + 
  geom_point() +  
  labs(x = "Days", y = "Population Size") +
  scale_y_log10()  # Log transform the y-axis

Effects of initial population size

From our previous analysis, it looks like the ship is in danger. How much of an effect would it have to cull the current population?

Use the following values to determine how much would the population need to be culled today to extend oxygen reserves another full day:

N0.s <- c(N0, 918, 459, 230)

N0.s <- c(N0, 918, 459, 230)
t <- 1:5

Nt.s <- sapply(N0.s, function(n) n * lambda^t)
Nt.s

       [,1]   [,2]   [,3]  [,4]
[1,]   4131   2754   1377   690
[2,]  12393   8262   4131  2070
[3,]  37179  24786  12393  6210
[4,] 111537  74358  37179 18630
[5,] 334611 223074 111537 55890

Initial population size can have a large effect on populations undergoing density-independent growth!

Effects of differing per capita growth rates

So far, we have modeled this scenario with a per capita growth rate of 3. What do you think would happen if the per capita growth rate approached 1? What if it went below 1?

Now, let’s check your guess by plotting the scenarios where \(\lambda\) = c(0.75, 1, 1.5) for N0 over the next 5 days.

lambdas <- c(0.75, 1, 1.5)

Nt.s <- sapply(lambdas, function(lambda) N0 * lambda^t)

Nt.df <- data.frame( # We need to convert to a dataframe to work with `ggplot2`
  Days = rep(t, times = length(lambdas)),
  N = as.vector(Nt.s),
  Lambda = rep(lambdas, each = length(t))
)


ggplot(Nt.df, aes(x = Days, y = N, group = Lambda, color = factor(Lambda))) +
  geom_step() +
  geom_point() +
  geom_hline(yintercept = N0, linetype = "dashed") +  # Horizontal line at initial N value
  labs(x = "Days", y = "N", color = "Lambda")

Observe the plot above. How do the different values of \(\lambda\) affect the population size

\[ \lambda \ > \ 1\ \Longrightarrow \ Positive\ population\ growth\\ \lambda \ =\ 1\ \Longrightarrow \ Population\ size\ constant\\ \lambda \ < \ 1\ \Longrightarrow \ Population\ decline \]

1b) Continuous Exponential Population Growth

In nature, discrete logistic population growth is not as common as continuous logistic population growth. An example of discrete logistic population growth is the growth of a population of annual plants, whereas an example of continuous logistic population growth is the growth of a human population where reproduction occurs continuously.

In general as a convention, \(\lambda\) is used for discrete population growth and \(r\) is used for continuous population growth.

	Exponential/Density Independent	Logistic/Density Dependent
Discrete	Population grows by a constant proportion λ at each time step (e.g. year). \[ N^{t} \lambda \]	Population grows by a constant proportion each time step, but the growth rate decreases as the population size approaches the carrying capacity K. \[ \lambda N(1-\frac{N_{t}}{K}) \]
Continuous	Population grows continuously at an instantaneous rate r at each time step (e.g. year). \[ Ne^{rt} \]	Population grows continuously (not at discrete time steps), but the growth rate decreases as the population size approaches the carrying capacity K. \[ rN(1-\frac{N}{K}) \]

Discrete

Population grows by a constant proportion λ at each time step (e.g. year).

\[ N^{t} \lambda \]

Population grows by a constant proportion each time step, but the growth rate decreases as the population size approaches the carrying capacity K.

\[ \lambda N(1-\frac{N_{t}}{K}) \]

Continuous

Population grows continuously at an instantaneous rate r at each time step (e.g. year).

\[ Ne^{rt} \]

Population grows continuously (not at discrete time steps), but the growth rate decreases as the population size approaches the carrying capacity K.

\[ rN(1-\frac{N}{K}) \]

Let’s start with plotting discrete exponential population growth again in the same way we did above

N0 <- 1377
lambda <- 3
t <- 1:5

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)


df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")

What happens when we make the spacing between days smaller? Try it with `t <- seq(1, 5, by = 0.25)`

N0 <- 1377
lambda <- 3
t <- seq(1, 5, by = 0.25)

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)


df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")

How about even smaller?

N0 <- 1377
lambda <- 3
t <- seq(1, 5, by = 0.01)

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)


df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")

We are approaching a continuous growth pattern.

We can model continuous exponential population growth directly with the formula:

\[ Ne^{rt} \]

Re-write our code with this equation along with \(r=log(\lambda)\) and geom_line() instead of geom_step().

# Define initial parameters
N0 <- 1377
lambda <- 3
r <- log(lambda)
t <- seq(1, 5, by = 0.01)

# Create equation for continuous population growth
Nt_continuous <- N0 * exp(r * t)

df_continuous <- data.frame(Days = t, Population = Nt_continuous)

ggplot(df_continuous, aes(x = Days, y = Population)) +
  geom_line() +
  labs(title = "Population Size", x = "Days", y = "Population Size")

Compare this to the discrete population growth plot above with small intervals between timesteps. What do you notice?

These curves are very similar!

We found earlier that \(\lambda = 1\) is the threshold between positive and negative population growth. What is the equivalent threshold for \(r\)?

lambda <- 1
r <- log(lambda)
r

[1] 0

\[ \begin{array}{l} No\ Change:\ \lambda \ =\ 1\ ,\ r\ =\ 0\\ Population\ Growth:\ \lambda \ >\ 1\ ,\ r\ >\ 0\\ Population\ Decline:\ \lambda \ < \ 1\ ,\ r\ < \ 0 \end{array} \]

2) Logistic/Density-Dependent Population Growth

2a) Continuous Logistic Population Growth

As we collect more data on the tribble population, we find that it does not match our original predictions following density-independent growth calculations. Instead, let’s explore if there may be something limiting population growth.

Why might this be?

Limited space
Limited resources (food, water, etc.)
Limited necessary habitat
Etc.
Perhaps limited by their own abundance: density-dependent

Recall from lecture, that we can represent density-dependent growth as

\[ \frac{dN}{dt} \ =\ rN\left( 1\ -\ \frac{N}{K}\right) \]

\(r\) is defined as \(r=ln(\lambda)\)

\(K\) is the carrying capacity of the population. Do you remember from lecture at which values of \(K\) is the population considered stable?

\[ \begin{array}{l}r\ =\ 0\\K\ =\ N\end{array} \]

There can be multiple definitions of stability for a population including:

Resilience - Long term rate of return towards equilibrium
Resistance - Tendency to resist change in population size
Reactivity - Rate of change immediately following disturbance
Permanence - Degree to which a population size does not experience boundary equilibria

Today, we will focus on resilience, the ability of a population size to return to an equilibrium when a disturbance occurs. We can monitor the strength of this attraction back to equilibrium by measuring the rate of attraction. This point of attraction, or the attractor, is the carrying capacity.

There is another equilibrium point at \(N = 0\). When the ship was devoid of all tribbles, the population was at a stable size of 0. However, this is considered the unstable equilibrium point, or repeller, because as soon as this population size was perturbed by the introduction of tribbles, the population size was rapidly repelled away from the unstable equilibrium point and attracted towards the stable equilibrium point of the population carrying capacity.

Now let’s add a carrying capacity to our model. We can do this by adding a term to the equation that represents the carrying capacity, \(K\) using the following equation:

\[ rN(1-\frac{N}{K}) \]

Add a carrying capacity of 1e4 to our model. Based on the plot above what would you expect to happen at that value?

# Define initial parameters
N0 <- 1377  # Initial population
r <- log(3)  # Intrinsic growth rate
K <- 20000  # Carrying capacity
t <- seq(1, 5, by = 0.25)  # Time steps

# Initialize a vector to store population size at each time step
Nt_logistic <- numeric(length(t))
Nt_logistic[1] <- N0  # Set initial population

# Calculate population size at each subsequent time step
for (i in 2:length(t)) {
  growth_factor = 1 - (Nt_logistic[i - 1] / K)
  Nt_logistic[i] = Nt_logistic[i - 1] + (r * Nt_logistic[i - 1] * growth_factor)
}

# Create a data frame for plotting
df_logistic <- data.frame(Days = t, Population = Nt_logistic)

# Plot with ggplot2
ggplot(df_logistic, aes(x = Days, y = Population)) +
  geom_line() +
  labs(title = "Logistic Population Growth", x = "Days", y = "Population Size")

What is occurring with these results? How has the shape of our curve changed? What does this shape imply?

We will talk about modeling the population growth of multiple species at once to see how interactions between species affect population growth in the following weeks.

R’s ordinary differential equation (ODE) solver deSolve() can be used to analyze population growth. ODEs are commonly used to calculate rates of change in order to model dynamic systems and we will use them quite a bit throughout the semester.

As a final example, the figure on the left(Stevens 2009) is an example of a simple continuous logistical growth of a population converging at \(K\). The plot on the right(Stevens 2009) shows a simulation of populations converging towards \(K\) at different rates depending on their traits. Regardless of \(N\) or \(r\), the population sizes eventually converge at \(K\).

#install.packages("deSolve")

library(deSolve)

Warning: package 'deSolve' was built under R version 4.3.2

clogistic <- function(times, y, parms) {
  n <- y[1]
  r <- parms[1]
  alpha <- parms[2]
  dN.dt <- r * n * (1 - alpha * n)
  return(list(c(dN.dt)))
}

#We create vectors for the parameters and the initial densities for all of the populations in the model. We also specify the time.
prms <- c(r = 1, alpha = 0.01)
init.N <- c(1)
t.s <- seq(0.1, 10, by = 0.1)

# We load the deSolve library, and run ode. The output is a matrix that includes the time steps and N.
require(deSolve)
out <- ode(y = init.N, times = t.s, clogistic, parms = prms)
plot(out[, 1], out[, 2], type = "l", xlab = "Time", ylab = "N")

outmat <- matrix(NA, nrow = length(t.s), ncol = 20)
for (j in 1:20) outmat[, j] <- {
  y <- runif(n = 1, min = 0, max = 120)
  prms <- c(r = runif(1, 0.01, 2), alpha = 0.01)
  ode(y, times = t.s, clogistic, prms)[, 2]
}


matplot(t.s, outmat, type = "l", col = 1, ylab = "All Populations")

Introduction to Simulations

Among other uses, simulations can give us an alternative approach to prediction. We continue to monitor our transects on the ship and store our data in the tribbles.rds data file.

`.Rds` and `.Rda` files in R

.rds and .rda (or .Rdata) files are used to store data natively in R.

The main advantage of .rda files is that they can contain a variety of data in a single file (objects, plots, functions, packages, etc.), however when loading a .rda file you run the risk of over-writing existing objects, plots, etc. with the same name.

.rds files can only contain a single data structure. A benefit of this file type is that loading a .rds file will not overwrite data structures with the same name.

First load and view the data. We will use the head() function to just view the first columns of the data set to get an idea of their contents.

tribbles <- readRDS(file = "tribbles.rds")
head(tribbles)

   Count Days
37    17    1
38    51    2
39   153    3
40   459    4
41  1377    5
1   3400    6

It is always good to visualize our data to better understand it. Plot the data (Days x Count):

ggplot(data = tribbles, aes(x = Days, y = Count)) +
  geom_point() +
  labs(x = "Days", y = "Count")

Now we will calculate the observed change in population counts per day as \(R_{t}\). \(R_{t}\) is similar to \(\lambda\), however \(R_{t}\) represents the ratio between two particular time points.

\[ R_{t} \ =\ \frac{N_{t\ +\ 1}}{N_{0}} \]

In our example, this can be calculated by:

# The character - means exclude in R subsetting, which is distinct from other languages where - means starting from the end

R_t <- tribbles$Count[-1] / tribbles$Count[-length(tribbles$Count)]
R_t

 [1] 3.0000000 3.0000000 3.0000000 3.0000000 2.4691358 1.1764706 1.0500000
 [8] 1.2857143 0.9074074 1.4489796 0.9295775 0.8484848 1.2857143 1.0694444
[15] 0.4805195 0.8648649 0.8125000 0.8461538 1.4090909 0.9032258 1.4642857
[22] 0.8780488 1.0277778 1.1891892 0.7727273 1.3823529 0.9574468 0.9111111
[29] 0.7804878 1.2500000 3.0000000 0.8000000 0.9687500 0.7204301 1.2537313
[36] 1.0714286 0.4111111 0.9189189 1.0000000 1.2647059

Again, let’s visualize the results to get an idea of the data. Plot days by \(R_{t}\).

# Create a new data frame for ggplot excluding the last day
tribbles_rt <- data.frame(
  R_t = R_t,
  Days = tribbles$Days[-length(tribbles$Days)]
)

ggplot(tribbles_rt, aes(x = Days, y = R_t)) +
  geom_point() +
  geom_hline(yintercept = 1, linetype = "dashed") +  # Add a dashed line at R_t = 1
  labs(x = "Days", y = "R_t")

Why did we include the horizontal line at \(R_{t}\) = 1? What does being above, below, and along his line indicate?

Like \(\lambda\) above, \(R_{t}\) = 1 indicates the intermediate threshold between population decline and positive population growth.

In any analyses it is always important to state our assumptions. In this simulation, we will assume that any of the observed values for \(R_{t}\) are equally likely to occur. Because of this assumption we can resample with replacement from our dataset to simulate predictions of what might happen to population size in the future. This is like putting each value in a hat, randomly drawing a value and recording it, then replacing it back in the hand with the possibility of drawing it again. Also note that everything a computer does is technically pseudorandom because the randomness is programmed.

Bootstrapping

Bootstrapping is the process of resampling without replacement, where n matches the n of your sample.

We can set the number of draws by the number of days. (Note that in this case we will draw one more value than the number indicated because we have starting day at day 0). Then we can randomly draw using the sample() function.

days <- 50

R_t.s.sim<- sample(x = R_t, size = days, replace = TRUE)

R_t.s.sim

 [1] 0.8461538 0.9111111 0.7204301 0.9687500 1.1891892 0.8648649 1.4489796
 [8] 1.2857143 2.4691358 0.9074074 1.0714286 1.4489796 0.9189189 1.0000000
[15] 1.4642857 0.8648649 0.8780488 0.8125000 0.7804878 3.0000000 1.2857143
[22] 1.0714286 0.4111111 0.9111111 1.2537313 0.8648649 0.9032258 0.8780488
[29] 1.2537313 1.2500000 1.4090909 1.2647059 0.9074074 1.3823529 1.2647059
[36] 0.8000000 3.0000000 0.9189189 1.0277778 1.4489796 3.0000000 0.7204301
[43] 1.1764706 1.0694444 1.0694444 3.0000000 1.2647059 1.1891892 1.4090909
[50] 2.4691358

Then we can loop through the drawn values to calculate predicted population size.

Note that by default, apply() family functions are not applied iteratively, therefore they try to apply the function to the entire object at once. In this instance, we need to include a <<- term in assigning outputs, this assigns the output to the parent scope which allows the function to operate iteratively.

In the case of applying functions iteratively, for loops are often simpler because they are iterative by default.

# Create empty object of appropriate size

data.sim <- numeric(days + 1)
data.sim

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0

# Start with last value from the real data
data.sim[1] <- tribbles$Count[tribbles$Days == max(tribbles$Days)]

# Because we want to use previous values iteratively, a `for` loop is the perfect fit
for (t in 1:days) data.sim[t + 1] <- {
   data.sim[t] * R_t.s.sim[t]
}

# Create a data frame for ggplot
simulation_df <- data.frame(
  Days = 0:days,  # Creating a sequence from 0 to the total number of days
  Simulated_Population = data.sim 
)

ggplot(simulation_df, aes(x = Days, y = Simulated_Population)) +
  geom_point() +
  labs(x = "Days", y = "Simulated Population Size")

This particular simulation just gives us one simulated possibility. However, there are many equally likely simulated possibilities. A single trajectory is very unlikely on its own because it would require a particular sequence of \(R_{t}\) to be drawn. However multiple simulations can give us a better idea of the range of possibilities.

Let’s create a matrix of 10 simulations. Therefore, we will have 10 simulations x 50 days.

sims <- 10

M.sim <- matrix(sample(R_t, sims * days, replace = TRUE),
                nrow = days, ncol = sims)

dim(M.sim)

[1] 50 10

Now that we have a matrix of randomly drawn \(R_{t}\) sequences, just as we did above we can predict population sizes under these differing scenarios.

# Start with last value from the real data
data.sim[1] <- tribbles$Count[tribbles$Days == max(tribbles$Days)]

# Once again, in this case a `for` loop is a simpler solution
matrix.sim <- sapply(1:sims, function(i) {
  for (t in 1:days) data.sim[t + 1] <- data.sim[t] * M.sim[t, i]
  data.sim
})

# This can also be accomplished using mapply()
# matrix.sim <- mapply(function(i, t) {
#   data.sim[1] <- tribbles$Count[tribbles$Days == max(tribbles$Days)]
#   for (j in 1:t) data.sim[j + 1] <- data.sim[j] * M.sim[j, i]
#   data.sim
# }, 1:sims, days)
# matrix.sim <- t(matrix(unlist(matrix.sim), ncol = days + 1, byrow = TRUE))

simulation_df <- data.frame(
  Days = rep(0:days, sims),
  Simulated_Population = as.vector(matrix.sim),
  Simulation = rep(1:sims, each = length(0:days))
)


ggplot(simulation_df, aes(x = Days, y = Simulated_Population, group = Simulation, color = as.factor(Simulation))) +
  geom_point() +
  scale_y_log10() +  
  labs(x = "Days", y = "Simulated Population Size", color = "Simulation")

This simulation allows us to see different population abundance trajectories, predicting what trajectories may look like given the range of growth rates already observed in this population.

References

Roser, Max, and Hannah Ritchie. 2023. “How Has World Population Growth Changed over Time?” Our World in Data.

Stevens, Martin Henry Hoffman. 2009. A Primer of Ecology with r. Springer.

Wickham, Hadley. 2016. “Ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.