6 Inferring the parameters of UNTB from outcomes
6.1 Key questions
- How do we use the outcomes of a process model to infer the underlying processes (parameter values)?
- How can we leverage machine learning (random forest) to infer parameter values from the outcomes of a neutral model?
- What are the limiting factors in inferring process from outcome in a process model?
6.2 Learning objectives
- Understand the outcome-to-parameter inference model.
- Fit and evaluate a random forest model to our UNTB data.
- Identify the limitations of this approach (particularly issues of model identifiability) and brainstorm possible solutions.
6.3 Lesson outline
Video lecture: Inferring the parameters of neutral theory from model outcomes
6.3.1 (Lecture/discussion) Going from outcome to parameters
Key question:
- How could we use a process model to better understand actual data?
Key points:
- As a starting point, we can see if we can recover generative parameters when we know how they were produced.
6.3.2 (Lecture/discussion) Random forest regression
Key points:
- To estimate the parameters that generated some outcome data, we fit a model of the general form `parameter ~ results`. This isn't necessarily a linear or otherwise tidy relationship, so we'll use machine learning.
- Random forest regression can handle many predictors and capture nonlinear relationships.
- We will begin by using a random forest to try to recover the known parameter values of our simulations.
6.3.3 (Lecture) Fitting a random forest to our neutral data
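A minimal sketch of this fitting step, with a hypothetical stand-in for the neutral simulations: here a parameter `theta` generates two made-up outcome summaries (richness-like and evenness-like statistics), and a random forest is fit in the `parameter ~ results` direction on simulations with known parameters, then evaluated on held-out simulations. The generative functions and variable names are illustrative assumptions, not the actual UNTB model.

```python
# Sketch: recover a generative parameter from simulated outcomes.
# The "neutral model" here is a hypothetical stand-in: two nonlinear
# summary statistics computed from a known parameter theta, plus noise.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_sims = 2000

# "True" parameter values used to generate each simulated community
theta = rng.uniform(1, 50, n_sims)

# Hypothetical outcome summaries (richness- and evenness-like statistics)
richness = 10 * np.log(theta) + rng.normal(0, 1, n_sims)
evenness = 1 / (1 + theta / 10) + rng.normal(0, 0.02, n_sims)
X = np.column_stack([richness, evenness])

# parameter ~ results: fit on known sims, evaluate on held-out sims
X_tr, X_te, y_tr, y_te = train_test_split(X, theta, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, rf.predict(X_te))
print(round(r2, 2))
```

With real neutral-model output, the outcome summaries would instead be statistics computed from simulated species-abundance distributions.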
6.3.4 (Discussion) What problems arise in the random forest?
Key points:
- Multiple sets of parameters can lead to the same outcomes, making it difficult to infer parameters from outcomes.
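A toy illustration of this many-to-one problem, using a made-up outcome function (the product of two hypothetical parameters) rather than the neutral model itself:

```python
# Non-identifiability in miniature: distinct parameter pairs map to the
# same outcome under this made-up outcome function, so the outcome alone
# cannot be inverted back to a unique parameter set.
param_sets = [(2.0, 12.0), (3.0, 8.0), (6.0, 4.0)]
outcomes = [a * b for a, b in param_sets]
print(outcomes)  # every pair yields the same outcome: [24.0, 24.0, 24.0]
```

No regression method, random forest included, can distinguish these parameter sets from this outcome.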
6.3.5 (Discussion) How might we address these challenges?
Key points:
- More data dimensions can break a many-to-one mapping.
- It remains critical that the underlying process be appropriate.
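The first key point can be sketched with another hypothetical setup: recovering a parameter `a` from the single summary `a + b` is ambiguous (many pairs share a sum), but adding a second summary `a - b` pins `a` down exactly as `(f1 + f2) / 2`, and the forest's cross-validated score improves accordingly. The outcome functions here are illustrative assumptions.

```python
# Sketch: an extra outcome dimension can break a many-to-one mapping.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 2000
a = rng.uniform(0, 10, n)  # parameter we want to recover
b = rng.uniform(0, 10, n)  # nuisance parameter
f1 = a + b                 # ambiguous summary: many (a, b) share a sum
f2 = a - b                 # second summary; together they identify a

rf = RandomForestRegressor(n_estimators=100, random_state=0)
one_dim = cross_val_score(rf, f1.reshape(-1, 1), a, cv=5).mean()
two_dim = cross_val_score(rf, np.column_stack([f1, f2]), a, cv=5).mean()
print(round(one_dim, 2), round(two_dim, 2))  # two_dim is much higher
```

For the neutral simulations, the analogue is computing additional summary statistics from each simulated community so that different parameter sets no longer collapse onto the same outcomes.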