10 Using MESS API mode in RStudio to explore hypotheses
10.1 Key questions
- How do I run MESS from inside RStudio?
- How do I get simulation results in a format that is useful?
- How do I visualize simulation results to gain insight?
- How do I use MESS to explore/generate theoretical predictions?
10.2 Lesson objectives
After this lesson, learners should be able to…
- Run MESS simulations in R
- Summarize and plot simulation results
- Formulate hypotheses for how tweaking parameters will affect model outputs
- Use MESS to test these hypotheses in silico
- Synthesize the outcomes
10.3 Planned exercises
- Import MESS in R and run simulations in API mode
- Load, inspect, and plot simulation results
- Group brainstorming of params to tweak
- Breakout to test hypothesized model behavior given different parameter values
- Regroup to debrief/synthesize results
10.3.1 Import MESS in R and run simulations in API mode
The MESS CLI is a simple and powerful way to run simulations, but the iBioGen
package also provies an API mode (in both R and python) which lets you dive under the hood of the CLI mode a bit. You have all the power of the CLI mode, yet more flexibility. In the source pane of RStudio open a new RScript and import the iBioGen
package.
library(reticulate)
## Import the iBioGen package and give it an alias so we can refer to it as MESS
<- import("iBioGen", as="MESS") MESS
The first step in API mode is to create a Core
object. The Core
object is the fundamental unit of the simulation, encompassing a standard Island Biogeography configuration with a very large Metacommunity
and a much smaller LocalCommunity
which is connected to it by colonization. In creating a Core
object the only thing you’re required to pass in is a name, so lets go with “api-simdata”, to differentiate from the earlier simdata
in CLI mode.
= MESS$Core("api-simdata") core
We can view the parameters of this new core
object with the get_params()
method:
## verbose=TRUE will format the parameters in a nice way when printed
$get_params(verbose=TRUE) core
------- iBioGen params file (v.0.0.9)-------------------------------------------
api-simdata ## [0] [simulation_name]: The name of this simulation scenario
./default_iBioGen ## [1] [project_dir]: Where to save files
1 ## [2] [birth_rate]: Speciation rate
taxa ## [3] [meta_stop_criterion]: Whether to stop metacommunity on ntaxa or time
20 ## [4] [ntaxa]: Number of taxa to simulate if stop is `ntaxa`
4 ## [5] [time]: Amount of time to simulate if stop is `time`
abundance ## [6] [process]: Whether to evolve `abundance` or growth `rate` via BM
True ## [7] [ClaDS]: Whether to allow speciation rate change a la ClaDS
50000 ## [8] [abundance_mean]: Ancestral abundance at time 0
0.1 ## [9] [abundance_sigma]: Rate at which abundance changes if process is `abundance`
0 ## [10] [growth_rate_mean]: Ancestral population growth rate at time 0.
0.01 ## [11] [growth_rate_sigma]: Rate at which growth rate changes if process is `rate`
0.1 ## [12] [ClaDS_sigma]: Rate at which speciation rate changes if ClaDS is True
0.9 ## [13] [ClaDS_alpha]: Rate shift if ClaDS is True
500 ## [14] [sequence_length]: Length of the genomic region simulated, in base pairs
1e-08 ## [15] [mutation_rate]: Mutation rate per base per generation
10 ## [16] [sample_size]: Number of samples to draw for calculating genetic diversity
None ## [17] [abundance_scaling]: Scaling abundance to Ne (None, log, ln or a ratio)
1000 ## [18] [J]: Number of individuals in the local community
0.005 ## [19] [colrate]: Rate of colonization into the local community (% / birth event)
neutral ## [20] [assembly_model]: Selecting neutral or non-neutral assembly processes
2 ## [21] [ecological_strength]: Impact of competition or filtering on fitness
1 ## [22] [generation_time]: Generation time of local community taxa (in years)
time ## [23] [local_stop_criterion]: Stop local community on time or equilibrium
100 ## [24] [local_stop_time]: Local community simulation duration (in generations)
100 ## [25] [tau_max]: Duration of anagenetic speciation (in generations)
10 ## [26] [gene_flow_effect]: Damping effect of gene flow (in generations) on tau_max
1000 ## [27] [Ne_scaling]: Scaling individuals to demes in local community False ## [28] [rm_duplicates]: Deduplicate seqs before calculating local community pi
These should all look somewhat familiar by now.
Changing parameters of Core
objects can be accomplished with the set_param()
method:
# Set a parameter to a fixed value
$set_param("J", 5000)
core# Set a parameter to a prior range to be sampled uniformly
$set_param("colrate", "0.01-0.05") core
Finally, we can generate simulations by calling the simulate()
method and specifying the number of simulations to perform:
## Lets do 100 simulations so there's a _chance_ we'll see some pattern in
## in the data
$simulate(nsims=100) core
Generating 100.0 simulation(s). [####################] 100% Finished 100 simulations in 0:00:03|
If you are still in the default_iBioGen
directory in your files pane, then you will see the new api-simdata-SIMOUT.csv
file pop up there:
Now you might see a glimmer of how the API mode might allow you much more power in manipulating parameters and running simulations, but there is more!
10.3.2 Load, inspect, and plot simulation results
Recall that the output of MESS simulations is a file which contains parameters and data generated with one simulation per line. In the raw form, the simulated data are bundled up in a somewhat obscure format, with reasonable looking information in most columns, but obscure information in three columns: metadata
, metatree
, and localdata
. The MESS API mode provides a method to unpack and summarize this obscure data into a way that is more meaningful and human readable (and also useful for plotting), and this is the load_sims()
method.
# Load simulation results
= core$load_sims()
results # Results is a 3 element list, each element is a data.frame that contains
# one row per simulation
# results[[1]] - params and sumstats
# results[[2]] - Raw data per simulation (abunds, pis, traits, etc)
# results[[3]] - global phylogeny per simulation
Lets focus for the moment on results[[1]]
, the data.frame of params and sumstats.
1]] results[[
birth_rate meta_stop_criterion ntaxa time process ClaDS abundance_mean abundance_sigma
1 1 taxa 20 4 abundance TRUE 50000 0.1
2 1 taxa 20 4 abundance TRUE 50000 0.1
3 1 taxa 20 4 abundance TRUE 50000 0.1
4 1 taxa 20 4 abundance TRUE 50000 0.1
5 1 taxa 20 4 abundance TRUE 50000 0.1
6 1 taxa 20 4 abundance TRUE 50000 0.1
7 1 taxa 20 4 abundance TRUE 50000 0.1
8 1 taxa 20 4 abundance TRUE 50000 0.1
9 1 taxa 20 4 abundance TRUE 50000 0.1
10 1 taxa 20 4 abundance TRUE 50000 0.1
growth_rate_mean growth_rate_sigma ClaDS_sigma ClaDS_alpha sequence_length mutation_rate sample_size
1 0 0.01 0.1 0.9 500 1e-08 10
2 0 0.01 0.1 0.9 500 1e-08 10
3 0 0.01 0.1 0.9 500 1e-08 10
4 0 0.01 0.1 0.9 500 1e-08 10
5 0 0.01 0.1 0.9 500 1e-08 10
6 0 0.01 0.1 0.9 500 1e-08 10
7 0 0.01 0.1 0.9 500 1e-08 10
8 0 0.01 0.1 0.9 500 1e-08 10
9 0 0.01 0.1 0.9 500 1e-08 10
10 0 0.01 0.1 0.9 500 1e-08 10
abundance_scaling J colrate assembly_model ecological_strength generation_time
1 None 5000 0.02816428 neutral 2 1
2 None 5000 0.04468862 neutral 2 1
3 None 5000 0.01764501 neutral 2 1
4 None 5000 0.01780714 neutral 2 1
5 None 5000 0.04374193 neutral 2 1
6 None 5000 0.03645229 neutral 2 1
7 None 5000 0.01470986 neutral 2 1
8 None 5000 0.01802192 neutral 2 1
9 None 5000 0.01327007 neutral 2 1
10 None 5000 0.03985648 neutral 2 1
local_stop_criterion local_stop_time tau_max gene_flow_effect Ne_scaling rm_duplicates
1 time 100 100 10 1000 FALSE
2 time 100 100 10 1000 FALSE
3 time 100 100 10 1000 FALSE
4 time 100 100 10 1000 FALSE
5 time 100 100 10 1000 FALSE
6 time 100 100 10 1000 FALSE
7 time 100 100 10 1000 FALSE
8 time 100 100 10 1000 FALSE
9 time 100 100 10 1000 FALSE
10 time 100 100 10 1000 FALSE
meta_obs_ntaxa meta_obs_time local_obs_time(gen) local_obs_time(eq) meta_turnover_rate ClaDS_m
1 20 7.238911 100 0.3724 0.20000000 0.9045113
2 20 3.304861 100 0.6238 0.00000000 0.9045113
3 20 3.618550 100 0.7324 0.04761905 0.9045113
4 20 2.464254 100 0.4310 0.00000000 0.9045113
5 20 4.223441 100 0.8360 0.00000000 0.9045113
6 20 2.421120 100 0.8176 0.00000000 0.9045113
7 20 5.467468 100 0.6822 0.00000000 0.9045113
8 20 2.940106 100 0.4306 0.00000000 0.9045113
9 20 7.681525 100 0.7238 0.00000000 0.9045113
10 20 5.754293 100 0.9566 0.00000000 0.9045113
local_S abund_h1 abund_h2 abund_h3 pi_h1 pi_h2 pi_h3 trait_h1 trait_h2 trait_h3
1 15 3.920161 2.333055 1.990338 5.600035 3.921122 3.358520 6.739850 4.842076 4.062254
2 16 7.128577 5.040473 4.096290 8.526984 7.054373 6.291502 7.802848 6.135959 5.476494
3 16 9.350507 7.267024 6.180439 7.649085 5.653503 4.763161 10.192978 8.298181 7.381126
4 20 4.587044 2.794328 2.309903 6.338057 5.330343 4.886343 6.694479 5.049368 4.660101
5 20 9.058059 6.625556 5.695691 7.261966 4.943116 4.148738 13.298536 9.102956 7.439101
6 18 4.943500 3.162048 2.632319 8.009983 6.529084 5.872421 8.282393 5.839957 4.894036
7 19 9.566252 6.654908 5.250049 9.211804 7.333543 6.414324 10.890139 8.424614 7.399630
8 15 4.710206 2.813355 2.309257 8.199976 6.782519 6.118322 9.273341 6.583071 5.696876
9 18 7.161965 5.070151 4.344310 5.869568 4.004179 3.236785 10.296570 8.314281 6.957238 10 17 6.138167 4.205529 3.658744 8.177872 6.203475 5.101742 11.851781 7.690572 5.713765
Most of these things should look familiar (if not understandable), but there are several new columns which are very important. You might also notice at first that the colrate
parameters per simulation do indeed vary, while all others are fixed, as we specified a prior range on this paramter.
The new thing to notice is that all the data (previously encapsulate in a human unreadable format) has been transformed into a much more user friendly summarization including local_S
(indicating local species richness), and 9 columns of the form *_h*
, which represent the first 3 Hill numbers on abundance (abund
), nucleotide diversity (pi
), and trait value distributions (trait
).
Since we manipulated the colrate
parameter in these simulations we can visualize the relationship between colrate
and a few of the resulting summary statistics, like local_S
, abund_h1
, and pi_h1
using the built-in R plotting function plot()
.
## For convenience, lets create a new variable for the results we're interested in
<- results[[1]]
sims
## Now plot local_S, abund_h1, and pi_h1 as a function of colrate
plot(sims$colrate, sims$local_S)
plot(sims$colrate, sims$abund_h1)
plot(sims$colrate, sims$pi_h1)
Here we can see a weak positive trend in all three summary statistics, with increasing colrate
correlating with an increase in each of these data axes.
It seems that local species richness is saturating, with uniformly high S
which increasingly approaches 20 as colrate
increases. What this means is that as colrate
increases, the local community increasingly contains a total sample of the metacommunity, and there are no more unique lineages to colonize. Think back to what we learned yesterday about the neutral theory parameters. What parameter might we manipulate and how in order to relax this constraint and prevent the local community from saturating in this way?
10.3.3 Group brainstorming of params to tweak
So far we have seen just a few MESS model parameters in action: J
, colrate
, and ntaxa
. Looking back at the params there are many other parameters that will change the behavior of the model. For the purpose of this workshop we will only consider parameters 14-27, which focus on the LocalCommunity
processes.
500 ## [14] [sequence_length]: Length of the genomic region simulated, in base pairs
1e-08 ## [15] [mutation_rate]: Mutation rate per base per generation
10 ## [16] [sample_size]: Number of samples to draw for calculating genetic diversity
None ## [17] [abundance_scaling]: Scaling abundance to Ne (None, log, ln or a ratio)
1000 ## [18] [J]: Number of individuals in the local community
0.005 ## [19] [colrate]: Rate of colonization into the local community (% / birth event)
neutral ## [20] [assembly_model]: Selecting neutral or non-neutral assembly processes
2 ## [21] [ecological_strength]: Impact of competition or filtering on fitness
1 ## [22] [generation_time]: Generation time of local community taxa (in years)
time ## [23] [local_stop_criterion]: Stop local community on time or equilibrium
100 ## [24] [local_stop_time]: Local community simulation duration (in generations)
100 ## [25] [tau_max]: Duration of anagenetic speciation (in generations)
10 ## [26] [gene_flow_effect]: Damping effect of gene flow (in generations) on tau_max
1000 ## [27] [Ne_scaling]: Scaling individuals to demes in local community False ## [28] [rm_duplicates]: Deduplicate seqs before calculating local community pi
Of particular interest are the assembly_model
and ecological_strength
parameters. assembly_model
can take three values: neutral
(default), competition
, and filtering
, which specify the neutral and non-neutral community assembly processes within MESS. ecological_strength
determines the ‘strength’ of the non-neutral assembly dynamics, and can take any value > 0. ecological_strength
(sometimes es
for short) values have the following effects:
# es ~ 0 = More neutral
# es ~ 1 = Noticeable non-neutrality # es >> 1 = Strong non-neutrality
10.3.4 Breakout to test hypothesized model behavior given different parameter values
Take some time (perhaps in small groups) to study the behavior of a few model parameters. It would probably benefit to make use of the power of API mode to create new Core
objects with sensible names that indicate the parameters you wish to manipulate. This will allow you to retain and compare simulation results across runs with different parameter combinations. For example:
## Here we give meaningful variable names to each Core object, so we can keep
## track of them
= MESS$Core("strong_competition")
str_comp $set_param("assembly_model", "competition")
str_comp$set_param("ecological_strength", 10)
str_comp$simulate(nsims=10)
str_comp= str_comp$load_sims()
str_comp_results ## Now you can summarize or visualize the results for these simulations
Take the time remaining in this lesson to construct one or two models to examine some of the hypothesized model behaviors we discussed earlier. Be prepared to discuss the results of your simulation experiments with the group.
If you have some interesting results, you can add a slide or two to the MESS-SimulationExperiments-Results google slides presentation for us to discuss.
10.3.5 Come together to debrief/synthesize results
Ok! Let’s wrap up this lesson and discuss some of the results from the simulation experiments. What did you discover?
10.4 Key points
- The
iBioGen
package can run MESS models in either CLI or API mode, which provides a rich and flexible programatic method for specifying and generating simulations. - Changing MESS parameters changes the predicted multi-dimensional biodiversity pattterns.
- Some MESS model parameters have predictable effects on patterns of data in the
LocalCommunity
and some have more diffuse or multi-dimensional effects.