Functions are actions on an input, and they generally produce an output. We have already been using them above, like `class()`, `matrix()`, `list()`, and `subset()`.
R is an object-oriented language because everything in R is an object.
By definition in programming, an object is an instance of a class. Classes have certain attributes/properties and specific functions, which are referred to as methods. When an object is created in your code, that object inherits the attributes of the class you assigned it to. These attributes also serve as built-in parameters for the associated methods.
There are also subclasses (child classes) and superclasses (parent classes). If the object you create is a member of a subclass, it will inherit all attributes of its class as well as its parent classes.
Object-oriented programming is desirable over procedural programming because it is an efficient, non-redundant way to organize and assign attributes to data. Through object-oriented programming you do not have to assign attributes over and over again each time you initialize an object.
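The ideas above can be sketched concretely with R's S3 system, the simplest of R's object-oriented systems. This is a minimal illustration (the class names `child_class` and `parent_class` and the generic `describe()` are made up for this example):

```r
# An object can carry a vector of classes: subclass first, then parent class
obj <- list(name = "example", values = 1:10)
class(obj) <- c("child_class", "parent_class")

# A generic function dispatches on the object's class
describe <- function(x) UseMethod("describe")
describe.parent_class <- function(x) cat("parent method\n")

# obj inherits the parent's method because no child-specific method exists
describe(obj)  # prints "parent method"
```

Because `describe.child_class()` was never defined, R falls back to the parent class's method: the subclass inherits behavior without us re-writing it, which is exactly the non-redundancy argument made above.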
```r
# Create a vector that contains numbers 1 to 10 (`1:10`)
vec <- c(1:10)

# Apply `summary()` to the vector
summary(vec)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.50    5.50    7.75   10.00 
```

```r
# Convert the same `vec` into a dataframe
df <- as.data.frame(vec)

# Apply `summary()` to the dataframe
summary(df) # Treats each column separately, including column names
```

```
      vec       
 Min.   : 1.00  
 1st Qu.: 3.25  
 Median : 5.50  
 Mean   : 5.50  
 3rd Qu.: 7.75  
 Max.   :10.00  
```
Whenever possible in your programming career, it is generally better to write code following object-oriented principles rather than procedural ones.
Functions must be defined before they can be called. In R, the definition of a function allows you to:
Name the function
List the required inputs as parameters
Define what the function does
The basic format for writing a function in R is:

```r
Function_Name <- function(parameter1, parameter2) {
  c(parameter1, parameter2)
}
```

When we defined the function, nothing seemed to happen. That is because we must call the function in order to run it. In order to call the function, we must provide the required parameters as arguments.

```r
Function_Name("argument1", "argument2")
```

```
[1] "argument1" "argument2"
```
Now it is your turn! Write a function called `Calculate_Mean()` that calculates the mean of the vector `c(1,2,3,4)` using the `sum()` and `length()` functions within your function definition.

Remember, if you want to check what a function does in R, you can use the `help()` function or the `?` operator. For example, `help(length)` or `?length`.
```r
vector_1 <- c(1,2,3,4)

Calculate_Mean <- function(vector) {
  sum(vector) / length(vector)
}

Calculate_Mean(vector_1)
```

```
[1] 2.5
```

```r
# Check our result against the built-in `mean()`
mean(vector_1)
```

```
[1] 2.5
```
Functions can also contain default values for parameters, following the format:

```r
Function_Name <- function(parameter1, parameter2 = "default") {
  c(parameter1, parameter2)
}

Function_Name("argument1")
```

```
[1] "argument1" "default"
```

These default values can be overwritten by the provided arguments:

```r
Function_Name("argument1", "argument2")
```

```
[1] "argument1" "argument2"
```
Now write a function called `Divide_by_Two()` that divides the vector `c(1,2,3,4)` by two using a default parameter value of 2.

```r
Divide_by_Two <- function(vector, denominator = 2) {
  vector / denominator
}

Divide_by_Two(vector_1)
```

```
[1] 0.5 1.0 1.5 2.0
```

```r
Divide_by_Two(vector_1, 3)
```

```
[1] 0.3333333 0.6666667 1.0000000 1.3333333
```
Note what happens by default when we assign the result of a function to a new object inside of a function:

```r
vector_1 <- c(1,2,3,4)

Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <- vector / denominator
}

Divide_by_Two(vector_1)
vector_4
```

```
Error: object 'vector_4' not found
```
The new object `vector_4` is not found outside of the function. This happens because objects, datasets, and even other functions created inside of a function are kept within that function's own environment.
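You can also check this programmatically: `exists()` reports whether a name is visible from the current environment. Here is a self-contained sketch (re-defining the function from above):

```r
Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <- vector / denominator  # assigned only inside the function
}

Divide_by_Two(c(1, 2, 3, 4))

exists("vector_4")  # FALSE: vector_4 lived only in the function's environment
```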
The base environment is referred to as the "global environment." The contents of the global environment are displayed in the upper-right pane of RStudio. We can check that `vector_4` is not in the global environment by confirming that it does not appear in the environment pane. So how do we pass this object from inside the function into the global environment?

R gives you a few options and tries to make this easy for you. Let's start by clearing the global environment with `rm(list=ls())`.
```r
rm(list=ls())
```
Now you can see that the environment pane is empty.
The first option is to assign the function's output to an object in the global environment when you call it:

```r
vector_1 <- c(1,2,3,4)

Divide_by_Two <- function(vector, denominator = 2) {
  vector / denominator
}

vector_4 <- Divide_by_Two(vector_1)
vector_4
```

```
[1] 0.5 1.0 1.5 2.0
```
Another option is the superassignment operator `<<-`. When an assignment is made with `<<-` inside of a function, the object is assigned globally.
```r
rm(list=ls())

vector_1 <- c(1,2,3,4)

Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <<- vector / denominator
}

Divide_by_Two(vector_1)
vector_4
```

```
[1] 0.5 1.0 1.5 2.0
```
Be careful, though: every call silently overwrites the global object. Calling the function repeatedly on `vector_4` keeps halving it:

```r
Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <<- vector / denominator
}

vector_4 <- Divide_by_Two(vector_4)
vector_4
```

```
[1] 0.25 0.50 0.75 1.00
```

```r
vector_4 <- Divide_by_Two(vector_4)
vector_4
```

```
[1] 0.125 0.250 0.375 0.500
```

```r
vector_4 <- Divide_by_Two(vector_4)
vector_4
```

```
[1] 0.0625 0.1250 0.1875 0.2500
```
The most explicit option is the `return()` statement.

```r
rm(list=ls())

vector_1 <- c(1,2,3,4)

Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <- vector / denominator
  return(vector_4)
}

vector_4 <- Divide_by_Two(vector_1)
vector_4
```

```
[1] 0.5 1.0 1.5 2.0
```
Finally, note that R returns the value of the last evaluated expression even without an explicit `return()` call, as another attempt of R to make returns more automatic:

```r
Divide_by_Two <- function(vector, denominator = 2) {
  vector_4 <- vector / denominator
  vector_4
}
```
Models are an abstraction of reality. This means that models exclude much more information than they include. That is a limitation we must keep in mind when drawing conclusions from models, but this is also a very desirable attribute of models.
Imagine you were on a hiking trail and wanted to find the quickest way back to the trailhead. You see a signpost with a map posted, so you approach it. If that map contained all of the information about every aspect of the area, it would be very unhelpful. Instead, the map is designed as an abstraction, presenting only the information its users need to know about the real world.
In general across fields, models are designed to provide at least one of three utilities to their users. The ability to:
Predict
Explain
Generalize
Which of these three utilities is the trail map designed to provide?
As you recall, this course focuses heavily on community ecology. In lab, we will build gradually to the community level. Today we will start by analyzing interactions within a population.
A population is a group of conspecifics (one species) that is isolated from other populations of the same species.
A community is interacting populations of different species in an ecosystem.
We will begin our labwork by exploring density-independent population growth. Density-independent population growth occurs when a population's growth rate is not affected by the number of conspecifics in that population: no individual affects the fitness of another member of the same population.
Density-independent population growth is relatively simple to model because it can be represented by a simple geometric series and generally involves fewer parameters than other models of population growth.
However, it is also biologically useful to understand because density-independent population growth can occur in nature.
When a population:
Is below carrying capacity
Has access to unlimited resources, like in a lab setting
Is experiencing release from anthropogenic pressure
Is experiencing predator release:
Via removal of the predators/parasites from the population’s range
Enemy release and the founder effect resulting from the population invading another area free from natural predators/parasites
Etc.
Let’s model a scenario of density-independent population growth together. In Star Trek the hairy, featureless species called tribbles seem to have an extraordinary population growth rate. They are said to be “born pregnant.” It seems as if their population growth might be density-independent, but let us make a model to see if we think that is true.
We decide on a transect and monitor the tribbles on the ship. Between generations we observe the following:
Day | Number of individuals |
---|---|
1967-12-29 | 17 |
1967-12-30 | 51 |
1967-12-31 | 153 |
1968-01-01 | 459 |
1968-01-02 | 1377 |
Create vectors for the number of tribbles (`n`) per day (`days`):

```r
days = c(1:5)
n = c(17, 51, 153, 459, 1377)
```
Let's plot the number of tribbles (`n`) per day (`days`) using `ggplot2` (Wickham 2016). Note that in order to use `ggplot2`, we must have a dataframe data structure as our input. Use `geom_step()` and `geom_point()` for the geoms.
```r
library(ggplot2)

tribble_df <- data.frame(
  Days = days,
  N = n
)

ggplot(data = tribble_df, aes(x = Days, y = N)) +
  geom_step() + # This creates the stepped line which shows discrete growth
  geom_point() + # This adds points
  labs(x = "Days", y = "Number of Tribbles") # Labels for axes
```
How can we predict how large we expect the population to be tomorrow? We need to calculate the following:
\[ \begin{equation*}\frac{N_{t+1}}{N_{t}}\end{equation*} \]
```r
ratio = n[2:5]/n[1:4]
ratio
```

```
[1] 3 3 3 3
```

The population triples between each day: it grows by \(200\%\), since \(3n = n + 2n\).
The engineers have estimated that the “Aeroponics Bay” can only provide enough oxygen to support 111,537 tribbles. We need to predict how many days until this population size is reached if this growth rate continues.
Using `n[5]`, calculate how many tribbles you predict there will be on day 6 (`n_day_6`):

```r
n[5]*3
```

```
[1] 4131
```

Using `n[5]*3`, calculate how many tribbles you predict there will be on day 7:

```r
(n[5]*3)*3
```

```
[1] 12393
```
Do you see a trend? We can generalize this trend in the following way:
\[ \begin{array}{l}N_{6} \ =\ 3 N_{5}\\N_{7} \ =\ 3\ ( 3 N_{5})\\N_{8} \ =\ 3\ ( 3\ ( 3 N_{5}))\\\\= >\\\\N_{t} \ =\ 3^{t} N_{0}\end{array} \]
As a general term, we can introduce \(\lambda\) as the finite rate of increase. Note that \(\lambda\) has no unit.
\[ N_{t} \ =\ N_{0} \lambda ^{t} \]
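As a quick check of this formula, we can plug in the tribble counts observed above (here \(N_0 = 17\), the first day's count, and \(t\) counts days elapsed since then):

```r
N0 <- 17        # count on the first day
lambda <- 3     # finite rate of increase
t <- 0:4        # days elapsed since the first count

N0 * lambda^t   # reproduces the observed series: 17 51 153 459 1377
```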
`apply()` Loops
Because R is built around vectors as its most basic unit, R is very well suited to applying functions across the rows or columns of a data structure. To help with readability, the apply family of functions can be used; they are similar to `for` loops, which are also a useful tool.
```r
matrix_1 <- cbind(c(6,5,4), c(3,2,1))

average_columns <- function(M) {
  c(sum(M[,1])/length(M[,1]), sum(M[,2])/length(M[,2]))
}

# These work as well:
# colMeans(M)
# colSums(M)/nrow(M)

average_columns(matrix_1)
```

```
[1] 5 2
```
apply()
Now let's do the same process with the `apply()` function and the built-in `mean()` function instead of our own function. The `apply()` function applies a given function over the rows or columns of multi-dimensional data structures only, e.g. matrices, arrays, and dataframes. Depending on the shape of the result (and the `simplify` argument), it returns a vector, matrix, or list.

Following the help file, here is the structure: `apply(X, MARGIN, FUN, ..., simplify = TRUE)`, where `MARGIN = 1` applies over rows, `MARGIN = 2` applies over columns, and `FUN` is any function we want to apply.
```r
apply(matrix_1, MARGIN = 2, mean)
```

```
[1] 5 2
```

```r
apply(matrix_1, MARGIN = 2, sqrt)
```

```
         [,1]     [,2]
[1,] 2.449490 1.732051
[2,] 2.236068 1.414214
[3,] 2.000000 1.000000
```

```r
apply(vector_1, MARGIN = 1, sqrt)
```

```
Error in apply(vector_1, MARGIN = 1, sqrt): dim(X) must have a positive length
```
`lapply()`

The `lapply()` function performs in a similar way; however, it can also be used on 1-dimensional data structures, like lists and vectors, where `apply()` would throw an error. Because it does not rely on rows and columns, it operates on elements instead. The output is always a list, even if the input is multi-dimensional: for example, it treats a matrix as a flat list of its elements (in column-major order), as the next example shows.

```r
lapply(vector_1, sqrt)
```

```
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

[[4]]
[1] 2
```
Try `lapply()` with `matrix_1` as an input:

```r
lapply(matrix_1, sqrt)
```

```
[[1]]
[1] 2.44949

[[2]]
[1] 2.236068

[[3]]
[1] 2

[[4]]
[1] 1.732051

[[5]]
[1] 1.414214

[[6]]
[1] 1
```
`sapply()`

`sapply()` is similar to `lapply()` in that it can accept a 1-dimensional data structure as input. However, it does not always return a list: it simplifies the result to a vector (or matrix) when possible.

```r
sapply(vector_1, sqrt)
```

```
[1] 1.000000 1.414214 1.732051 2.000000
```
Here is a summary of what we have covered so far.
There are other apply family functions like mapply() and tapply(), however we will not cover those at this time.
It's worth mentioning that there are other alternatives to the apply family of functions, including `map()` from the purrr package (Wickham and Henry 2023) as well as `rowwise()`, `colwise()`, and `across()` from the dplyr package (Wickham et al. 2022).
`apply()` Family Shorthand Notation

A call like `sapply(x, function(y) y)` is referred to as "shorthand notation" or "shorthand syntax." Because we are using a function without even naming it, it is referred to as an "anonymous function," and we can use shorthand to define what the function does. Meanwhile, `x` is the argument and `y` is the parameter.
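For instance, these calls are equivalent ways of squaring each element of a vector (a sketch; the backslash shorthand `\(y)` requires R >= 4.1):

```r
x <- c(1, 2, 3)

square <- function(y) y^2     # a named function
sapply(x, square)             # 1 4 9

sapply(x, function(y) y^2)    # anonymous function ("shorthand notation")

sapply(x, \(y) y^2)           # lambda shorthand, R >= 4.1
```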
Let’s go back to managing our tribble crisis. Recall we found we were able to represent density-independent population growth with the following equation:
\[ N_{t} \ =\ N_{0} \lambda ^{t} \]
Calculate how many more days you predict until the population exceeds 111,537 tribbles. You can use the `sapply()` shorthand format that you just learned about:

```r
Nt.s <- sapply(N0, function(n) n * lambda^t)
```
```r
N0 <- 1377
lambda <- 3
t <- 1:5

Nt.s <- sapply(N0, function(n) n * lambda^t)
Nt.s
```

```
       [,1]
[1,]   4131
[2,]  12393
[3,]  37179
[4,] 111537
[5,] 334611
```
Only 4 days due to the compounding nature of density-independent growth.
The logarithmic scale is useful when analyzing rates. In this example, the logarithmic scale provides a clearer way to visualize this steady trend that may not have been immediately apparent in our earlier plot.
```r
tribble_df_discrete <- data.frame(
  Days = t,
  N = Nt.s
)

# Plot without the logarithmic scale
ggplot(tribble_df_discrete, aes(x = Days, y = N)) +
  geom_step() +
  geom_point() +
  labs(x = "Days", y = "Population Size")

# Now plot with a logarithmic scale by adding `scale_y_log10()` to the end of your ggplot pipe
ggplot(tribble_df_discrete, aes(x = Days, y = N)) +
  geom_step() +
  geom_point() +
  labs(x = "Days", y = "Population Size") +
  scale_y_log10() # Log transform the y-axis
```
From our previous analysis, it looks like the ship is in danger. How much of an effect would it have to cull the current population?
Use the following values to determine how much the population would need to be culled today to extend oxygen reserves another full day:

```r
N0.s <- c(N0, 918, 459, 230)
t <- 1:5

Nt.s <- sapply(N0.s, function(n) n * lambda^t)
Nt.s
```

```
       [,1]   [,2]   [,3]  [,4]
[1,]   4131   2754   1377   690
[2,]  12393   8262   4131  2070
[3,]  37179  24786  12393  6210
[4,] 111537  74358  37179 18630
[5,] 334611 223074 111537 55890
```
Initial population size can have a large effect on populations undergoing density-independent growth!
So far, we have modeled this scenario with a per capita growth rate of 3. What do you think would happen if the per capita growth rate approached 1? What if it went below 1?
Plot the population size for lambdas of `c(0.75, 1, 1.5)`, starting from `N0`, over the next 5 days.

```r
lambdas <- c(0.75, 1, 1.5)

Nt.s <- sapply(lambdas, function(lambda) N0 * lambda^t)

Nt.df <- data.frame( # We need to convert to a dataframe to work with `ggplot2`
  Days = rep(t, times = length(lambdas)),
  N = as.vector(Nt.s),
  Lambda = rep(lambdas, each = length(t))
)

ggplot(Nt.df, aes(x = Days, y = N, group = Lambda, color = factor(Lambda))) +
  geom_step() +
  geom_point() +
  geom_hline(yintercept = N0, linetype = "dashed") + # Horizontal line at N0
  labs(x = "Days", y = "N", color = "Lambda")
```
\[ \begin{array}{l} \lambda > 1 \Longrightarrow \text{Positive population growth} \\ \lambda = 1 \Longrightarrow \text{Population size constant} \\ \lambda < 1 \Longrightarrow \text{Population decline} \end{array} \]
In nature, discrete logistic population growth is not as common as continuous logistic population growth. An example of discrete logistic population growth is the growth of a population of annual plants, whereas an example of continuous logistic population growth is the growth of a human population where reproduction occurs continuously.
In general as a convention, \(\lambda\) is used for discrete population growth and \(r\) is used for continuous population growth.
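The two rates are directly convertible, since \(r = \ln(\lambda)\) and therefore \(\lambda = e^{r}\). A quick check in R, using the tribbles' \(\lambda = 3\) (remember that R's `log()` is the natural logarithm by default):

```r
lambda <- 3
r <- log(lambda)   # natural log

exp(r)             # converting back recovers lambda: 3

# One time step of continuous growth at rate r matches one discrete step of lambda
N0 <- 1377
all.equal(N0 * exp(r * 1), N0 * lambda^1)   # TRUE
```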
 | Exponential/Density Independent | Logistic/Density Dependent |
---|---|---|
Discrete | Population grows by a constant proportion \(\lambda\) at each time step (e.g. year): \(N\lambda^{t}\) | Population grows by a constant proportion each time step, but the growth rate decreases as the population size approaches the carrying capacity \(K\): \(\lambda N(1-\frac{N_{t}}{K})\) |
Continuous | Population grows continuously at an instantaneous rate \(r\): \(Ne^{rt}\) | Population grows continuously (not at discrete time steps), but the growth rate decreases as the population size approaches the carrying capacity \(K\): \(rN(1-\frac{N}{K})\) |
Let's start by plotting discrete exponential population growth again, in the same way we did above:

```r
N0 <- 1377
lambda <- 3
t <- 1:5

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)

df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")
```
What happens when we make the spacing between days smaller? Try it with `t <- seq(1, 5, by = 0.25)`:

```r
N0 <- 1377
lambda <- 3
t <- seq(1, 5, by = 0.25)

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)

df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")
```
How about even smaller?

```r
N0 <- 1377
lambda <- 3
t <- seq(1, 5, by = 0.01)

# Create equation for discrete population growth
Nt_discrete <- sapply(N0, function(n) n * lambda^t)

df_discrete <- data.frame(Days = t, Population = Nt_discrete)

ggplot(df_discrete, aes(x = Days, y = Population)) +
  geom_step() +
  geom_point() +
  labs(title = "Discrete Population Growth", x = "Days", y = "Population Size")
```
We are approaching a continuous growth pattern.
We can model continuous exponential population growth directly with the formula:
\[ Ne^{rt} \]
Re-write our code with this equation, using \(r=\log(\lambda)\) and `geom_line()` instead of `geom_step()`.
```r
# Define initial parameters
N0 <- 1377
lambda <- 3
r <- log(lambda)
t <- seq(1, 5, by = 0.01)

# Create equation for continuous population growth
Nt_continuous <- N0 * exp(r * t)

df_continuous <- data.frame(Days = t, Population = Nt_continuous)

ggplot(df_continuous, aes(x = Days, y = Population)) +
  geom_line() +
  labs(title = "Population Size", x = "Days", y = "Population Size")
```
Compare this to the discrete population growth plot above with small intervals between timesteps. What do you notice?
These curves are very similar!
We found earlier that \(\lambda = 1\) is the threshold between positive and negative population growth. What is the equivalent threshold for \(r\)?

```r
lambda <- 1
r <- log(lambda)
r
```

```
[1] 0
```
\[ \begin{array}{l} \text{No change:}\ \lambda = 1,\ r = 0 \\ \text{Population growth:}\ \lambda > 1,\ r > 0 \\ \text{Population decline:}\ \lambda < 1,\ r < 0 \end{array} \]
As we collect more data on the tribble population, we find that it does not match our original predictions following density-independent growth calculations. Instead, let’s explore if there may be something limiting population growth.
Limited space
Limited resources (food, water, etc.)
Limited necessary habitat
Etc.
Perhaps limited by their own abundance: density-dependent
Recall from lecture that we can represent density-dependent growth as
\[ \frac{dN}{dt} \ =\ rN\left( 1\ -\ \frac{N}{K}\right) \]
\(r\) is defined as \(r=ln(\lambda)\)
\(K\) is the carrying capacity of the population. Do you remember from lecture at which values of \(N\) the population is considered stable?
\[ \begin{array}{l}r\ =\ 0\\K\ =\ N\end{array} \]
There can be multiple definitions of stability for a population including:
Resilience - Long term rate of return towards equilibrium
Resistance - Tendency to resist change in population size
Reactivity - Rate of change immediately following disturbance
Permanence - Degree to which a population size does not experience boundary equilibria
Today, we will focus on resilience, the ability of a population size to return to an equilibrium when a disturbance occurs. We can monitor the strength of this attraction back to equilibrium by measuring the rate of attraction. This point of attraction, or the attractor, is the carrying capacity.
There is another equilibrium point at \(N = 0\). When the ship was devoid of all tribbles, the population was at a stable size of 0. However, this is considered the unstable equilibrium point, or repeller, because as soon as this population size was perturbed by the introduction of tribbles, the population size was rapidly repelled away from the unstable equilibrium point and potentially attracted towards the stable equilibrium point of the population carrying capacity.
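We can sketch this attraction numerically: start a logistic population away from \(K\), iterate a discretized logistic update, and watch the population size return toward the carrying capacity. The parameter values below are arbitrary choices for illustration:

```r
r <- 1         # intrinsic growth rate (arbitrary choice)
K <- 100       # carrying capacity (arbitrary choice)
N <- K * 0.5   # perturb the population to half of K

# Iterate a discretized logistic update: N is attracted back toward K
for (i in 1:50) {
  N <- N + r * N * (1 - N / K)
}
N   # very close to K = 100
```

Starting instead at exactly `N <- 0` leaves the population at zero forever, but any tiny positive perturbation grows away from it: that is the repeller described above.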
On the logistic growth curve figure:
A: No growth; population near zero.
B: Population growing slowly.
C: Maximum growth rate.
D: Stable population (equilibrium).
E: Population decline (resources exceeded).
Now let’s add a carrying capacity to our model. We can do this by adding a term to the equation that represents the carrying capacity, \(K\). Here is the equation from lecture:
\[ rN(1-\frac{N}{K}) \]
For our purposes in lab, this similar equation (logistic growth equation) will make it easier for us to convert this concept into code:
\[ N_{t} = \frac{K}{1 + \left(\frac{K - N_{0}}{N_{0}}\right) e^{-rt}} \]
Add a carrying capacity of 90000 at day +4 to our model. Based on the plot above what would you expect to happen at that value?
```r
# Define initial parameters
N0 <- 1377
lambda <- 3
r <- log(lambda)
K <- 90000
t <- 1:10

# Write logistic function
Nt_logistic <- sapply(t, function(t) K / (1 + ((K - N0) / N0) * exp(-r * t)))

# Create dataframe
df_logistic <- data.frame(Days = t, Population = Nt_logistic)

# Plot
ggplot(df_logistic, aes(x = Days, y = Population)) +
  geom_line() +
  geom_point() +
  labs(title = "Logistic Population Growth", x = "Days", y = "Population Size")
```
What is occurring with these results? How has the shape of our curve changed? What does this shape imply?
We will talk about modeling the population growth of multiple species at once to see how interactions between species affect population growth in the following weeks.
R's deSolve package, an ordinary differential equation (ODE) solver, can be used to analyze population growth. ODEs are commonly used to calculate rates of change in order to model dynamic systems, and we will use them quite a bit throughout the semester.
As a final example, the figure on the left (Stevens 2009) is an example of simple continuous logistic growth of a population converging at \(K\). The plot on the right (Stevens 2009) shows a simulation of populations converging towards \(K\) at different rates depending on their traits. Regardless of \(N\) or \(r\), the population sizes eventually converge at \(K\).
```r
#install.packages("deSolve")
library(deSolve)

clogistic <- function(times, y, parms) {
  n <- y[1]
  r <- parms[1]
  alpha <- parms[2]
  dN.dt <- r * n * (1 - alpha * n)
  return(list(c(dN.dt)))
}

# We create vectors for the parameters and the initial densities for all of
# the populations in the model. We also specify the time.
prms <- c(r = 1, alpha = 0.01)
init.N <- c(1)
t.s <- seq(0.1, 10, by = 0.1)

# We load the deSolve library, and run ode. The output is a matrix that
# includes the time steps and N.
require(deSolve)
out <- ode(y = init.N, times = t.s, clogistic, parms = prms)
plot(out[, 1], out[, 2], type = "l", xlab = "Time", ylab = "N")

outmat <- matrix(NA, nrow = length(t.s), ncol = 20)
for (j in 1:20) outmat[, j] <- {
  y <- runif(n = 1, min = 0, max = 120)
  prms <- c(r = runif(1, 0.01, 2), alpha = 0.01)
  ode(y, times = t.s, clogistic, prms)[, 2]
}
matplot(t.s, outmat, type = "l", col = 1, ylab = "All Populations")
```