Here we are going to set uninformative and informative priors for each of the parameters of the model. Please note that model estimation is based on Markov-chain Monte Carlo (MCMC), so the analyses described here might take some time to run.
The ‘ratematrix’ package estimates the root value and the evolutionary rate matrix (R) for a multivariate Brownian-motion model (mvBM) with single or multiple rate regimes fitted to the same phylogenetic tree. The mvBM model (and also the single trait BM model) have the behavior that trait values estimated for the nodes approach the mean of the tip data as we traverse the nodes of the tree from the tips to the root. This happens because BM models have a fixed mean and the variance increases proportional to time. As a result, we known a priori that the estimated root is not going to be outside the range of the observed tip values and will be likely close to the mean of the tip data. Such information about the statistical behavior of the model, even if not directly related to biology, can help us design prior distributions to the Markov-chain Monte Carlo (MCMC) analysis.
It is important to note that this behavior is expected for the estimated root value regardless of the true root value. The BM model, as well as many other phylogenetic comparative models of trait evolution, requires information at the nodes of the phylogeny for a more accurate estimate of the root value. Such information can come from fossil data, for example.
Load required packages:
## Loading required package: ape
## Loading required package: phytools
## Loading required package: maps
Simulate some data. Here we are using a phylogeny of 100 tips and two traits. However, it is straightforward to extend this analysis to more traits.
phy <- sim.bdtree(b=1, d=0, stop="taxa", n=100)
R <- rbind(c(0.5, 0.2),
c(0.2, 0.5) )
dat <- simRatematrix(tree=phy, vcv=R, anc=c(5,10))
We will run two separated analysis using this data. One with a uniform prior for the root values and another with a normally distributed prior (informative prior) centered far from the true parameter value for the model. The objective of this exercise is to show how to set your own custom prior distribution and how the model will behave if this prior is misguided.
First we need to set the prior. Although a uniform prior can be
easily set using built-in options of the function
ratematrixMCMC
, here we ‘manually’ define the prior as a
worked example using the function makePrior()
.
## Create a matrix the bounds for the uniform prior.
## Here will use the max and min of the tip data as bounds.
par.mu <- rbind( ceiling( range(dat[,1]) ), ceiling( range(dat[,2]) ) )
## Using a lognormal prior on the standard deviations, with log(mean)=0 and log(sd)=1.5.
par.sd <- c(0, 1.5)
## Prior on the covariance matrix is set to the default:
## marginally uniform on the covariances.
unif.prior <- makePrior(r = 2, p = 1, den.mu = "unif", par.mu = par.mu, den.sd = "lnorm"
, par.sd = par.sd)
After defining our prior we can take samples from it and evaluate the distribution by plotting the samples.
samples <- samplePrior(n = 1000, prior = unif.prior)
## Plot the prior samples between -10 and 10.
plotRatematrix(samples, set.xlim = c(-10,10))
## Plotting a single regime.
Now we are ready to run the MCMC analysis using the prior we created.
Here I slightly increased the frequency that the chain is sampling the
root values. Note that both the sampling frequencies for each parameter
of the model (prop
) and the width of the proposals
(w_sd
and w_mu
) are parameters that can (and
should!!) be tweaked by the user to maximize the efficiency of the
sampler.
The ratematrixMCMC
function will write files to the
working directory. The output is a handle object that stores the
information of the MCMC run. The handle object (here named
handle.unif.root
) will be required by other functions.
handle.unif.root <- ratematrixMCMC(data=dat, phy=phy, prior=unif.prior
, start="prior_sample", gen=200000
, outname="uniform_root_prior", dir=tempdir() )
The next code block will read the MCMC output from the files and make a fast convergence analysis.
## post.unif.root <- readMCMC(handle=handle.unif.root)
## Check the posterior object:
## post.unif.root
## checkConvergence(post.unif.root)
This convergence analysis is based only on a single chain. The output
table with TRUE/FALSE values are the result of the Heidel test (see
package coda
for more information). If provided with
multiple chains, checkConvergence
will compute the Gelman
and Rubin (1992) potential reduction factor index, which is a much more
robust test for convergence. The function logAnalyzer
will
make traceplots for the run.
Now we can plot the prior used for this analysis. Note that the prior on the root values is flat within the specified range. Of course, the histogram is based on samples from the prior distribution, so the result is not completely flat.
## Plotting the prior distribution.
## IMPORTANT NOTE: Pay attention to the scale of the rates when comparing the posterior and prior distribution plots!!.
In the same way, it is simple to make a quick check on the prior distribution for the evolutionary rate matrix (R)
## Plotting the prior distribution.
## IMPORTANT NOTE: Pay attention to the scale of the rates when comparing the posterior and prior distribution plots!!.
## Plotting a single regime.
Here we plot the posterior distribution for the vector of root values. Vertical red lines show the true parameter value used to simulate the data.
This plot makes it clear that the posterior distribution includes the true parameter value within its 95% highest posterior density (HPD) interval. This result is expected, since we simulated data under the model.
What if the prior on the root value is informative? The two most
common reasons to implement informative priors are; a) use of external
information such as trait values for the nodes (e.g., fossil data) or
knowledge of the underlying genetic architecture of the traits under
study, or b) to bond parameter values to a reasonable region of the
parameter space when faced with difficulty in convergence. The first
case is the most clear use of prior distributions. That is to inform our
analysis based on extra information about the problem. The second case
is more technical. The main point here is that when some of the
parameters of a model are hard to be estimated or when there are not
enough information in the data (maybe both!), the likelihood surface for
the model might become flat on some regions and/or show some ridges.
When the sampler gets stuck in some of these ridges the parameter values
can become unrealistic (i.e., super large or small numbers with limited
biological interpretation). In this case, even a weak prior distribution
favoring reasonable parameter values can help the chain reach
convergence. A useful analogy for this problem is the use of bounded
maximum likelihood searches (see option “L-BFGS-B” in function
optim
). We suggest checking work by Uyeda and Harmon (A
Novel Bayesian Method for Inferring and Interpreting the Dynamics of
Adaptive Landscapes from Phylogenetic Comparative Data, 2014, Systematic
Biology) for an interesting example on the use of informative
priors.
Here we are going to intentionally set a prior centered on values away from the true parameter values. This is an extreme example that will show the impact of the prior distribution on the posterior distribution. Then we will explore the effect that this has on the analysis results.
First we generate the prior distribution with steps similar to what we did before. Then we run analysis and gather results.
## Generate a matrix with the parameters for a normal distribution for the root values.
## Root_1: mean 0 and sd 1 and Root_2: mean 15 and sd 2.
par.mu <- rbind( c(0, 1), c(15, 2) )
## Using a lognormal prior on the standard deviations, with log(mean)=0 and log(sd)=1.5.
par.sd <- c(0, 1.5)
## Prior on the covariance matrix is set to the default:
## marginally uniform on the covariances.
norm.prior <- makePrior(r = 2, p = 1, den.mu = "norm", par.mu = par.mu, den.sd = "lnorm"
, par.sd = par.sd)
handle.norm.root <- ratematrixMCMC(data=dat, phy=phy, prior=norm.prior
, start="prior_sample", gen=200000
, outname="normal_root_prior", prop=c(0.1,0.45,0.45)
, dir=tempdir() )
## $heidel
## root matrix
## stest TRUE TRUE
## htest TRUE TRUE
##
## $ess
## root_1 root_2 matrix_cel_1 matrix_cel_2 matrix_cel_3 matrix_cel_4
## 165.2597 103.7021 393.1583 430.8257 430.8257 455.0331
The Heidel test suggests the single chain reached convergence. Now we can make plots of the prior and posterior distributions.
## Plotting the prior distribution.
## IMPORTANT NOTE: Pay attention to the scale of the rates when comparing the posterior and prior distribution plots!!.
Note how the true parameter value for root of trait_1 (red line) is
away from the prior distributions whereas the root value for trait_2 is.
Now compare the plot of the prior distribution with the plot for the
posterior. While doing this pay attention to the x axes of the two
plots, since the range is different. Both plotting functions have
parameters to control the range of the x axis (see
set.xlim
).
Also note that the true value for trait_1 is not within a high density of the prior whereas the true value for trait_2 is:
## Check the summary and quartiles for the prior distributions.
## Prior for trait 1:
summary( rnorm(1000, 0, 1) )
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.952361 -0.711554 -0.002966 0.007190 0.685949 3.220592
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.217 13.699 15.062 15.025 16.362 20.243
The true value of trait_1 is far from the prior distribution. For trait_2, the true value is within the first quartile of the prior. In this example the posterior distribution was updated in the direction of the true parameter value of the model, but the effect of the prior is strong and biased the results.
This was an extreme example to show the effect of an informative prior centered away from the “true” parameter values for the model. On the other hand, such an effect can improve the biological interpretations of results if a prior is used to input external information about the data or help avoid unreasonable parameter values.
While the vector of root values represents the starting point for the multivariate Brownian-motion process walking along the branches of the tree from the root to the tips, the evolutionary rate matrix (R) is the variance-covariance matrix of such process. Here we will explore setting priors for both the rates of evolution for each trait and their evolutionary covariances, which together compose the R matrix.
Results of studies on character correlations within or among populations and knowledge about the underlying genetic architecture of traits are examples of external information that can be used to set informative prior distributions on the structure of correlation among traits.
Because of the sampler scheme implemented in ratematrix
,
it is possible to set independent prior distributions to the structure
of evolutionary correlation among traits and the vector of evolutionary
rates for each trait. In this part of the tutorial we will set a
uninformative prior and compare the results with an informative prior.
For simplicity, we will use the same data set and trees of the previous
section.
First we will set a uninformative prior for the evolutionary correlation among traits. This analysis is identical to the first analysis conducted in this tutorial. So we will use the same results.
The uninformative prior on the evolutionary covariances is a marginally uniform prior. In other words, the correlation among each two traits in the data has a uniform distribution when integrated over the uncertainty in all other elements of the R matrix. We will make some plots to demonstrate this concept.
First plot the prior for the entire R matrix, which is a variance-covariance matrix.
## Plotting the prior distribution.
## IMPORTANT NOTE: Pay attention to the scale of the rates when comparing the posterior and prior distribution plots!!.
## Plotting a single regime.
Now we can decompose this distribution of variance-covariance
matrices in a distribution of correlation matrices and another
distribution of standard variations for each trait. For this, we will
use the function samplePrior
to take samples from the prior
distribution on the correlation structure and standard deviations
separately. One can also use this function to sample the
variance-covariance matrices instead of their different components.
ss <- samplePrior(n=1000, prior=unif.prior, sample.sd = TRUE, rebuild.R = FALSE)
corr <- sapply(ss$matrix, function(x) cov2cor(x)[1,2] )
hist( corr, main="Correlation between traits")
hist( ss$sd[,1], main="Standard deviation - trait 1")
hist( ss$sd[,2], main="Standard deviation - trait 2")
Note from these plots that the prior on the correlation is uniformly distributed between -1 and 1. On the other hand, the standard deviations show a log-normal distribution, just as we set it to be. It is difficult to see from the plot of the whole R matrix that the correlations are uniformly distributed, but this is made clearer by transforming the covariance into a correlation.
We already performed the analysis using this data and prior. So we will just plot the posterior distribution for the R matrix. Here note that the red lines show the R matrix used to generate the data.
plotRatematrix(chain=post.unif.root, set.leg = c("trait_1", "trait_2")
, point.matrix = list(R), point.color = "red", point.wd = 1.5)
## Plotting a single regime.
Note that the posterior distribution in this case is congruent with the true parameter value used to generated the data.
Here we will repeat the exercise of using a informative prior on the analysis, but now focusing on the structure of evolutionary correlation among traits.
Note that I used a log-normal prior on the standard deviations. Such prior puts more weight toward smaller values for the rates of evolution of the traits. This follows the notion that values for this parameter are usually not very large (and never negative). On the other hand, the marginally uniform prior applied to the correlation matrix sets a uniform prior on the correlation, so negative, positive and no correlation have the same weight (i.e., prior probability).
Maybe we think that this is not a reasonable prior distribution and that strong correlations, either negative or positive, should have less prior probability than weak correlations. For this we will set the prior to be centered on 0 correlation and with most of the density within the -0.5 and 0.5 interval.
Note that here we need to set the option ‘unif.corr’ to FALSE in order for the function to use the custom input values for the parameters ‘Sigma’ and ‘nu’.
## Create a matrix with the bounds for the uniform prior.
## Here will use the max and min of the tip data as bounds.
par.mu <- rbind( ceiling( range(dat[,1]) ), ceiling( range(dat[,2]) ) )
## Using a lognormal prior on the standard deviations, with log(mean)=0 and log(sd)=1.5.
par.sd <- c(0,1.5)
## Sigma is a scale matrix. This is equivalent to the mean of a inverse-Wishart.
## An identity matrix will center the distribution on 0 correlation.
Sigma <- rbind( c(1, 0),
c(0, 1) )
## nu is the equivalent of the variance of this distribution. But here large values mean
## most of the samples will be close to the Sigma matrix whereas small values
## (min = number of traits +1) will result in a more spread distribution.
nu <- 50 ## This will constrain the distribution around the Sigma matrix.
corr.prior <- makePrior(r = 2, p = 1, den.mu = "unif", par.mu = par.mu, den.sd = "lnorm"
, par.sd = par.sd, unif.corr = FALSE, Sigma = Sigma, nu = nu)
We can take samples of this prior and make independent plots for the distribution of correlations and standard deviations.
ss.corr <- samplePrior(n = 1000, prior=corr.prior, sample.sd = TRUE, rebuild.R = FALSE)
corr.p <- sapply(ss.corr$matrix, function(x) cov2cor(x)[1,2] )
hist( corr.p, main="Correlation between traits")
hist( ss.corr$sd[,1], main="Standard deviation - trait 1")
hist( ss.corr$sd[,2], main="Standard deviation - trait 2")
Notice how the distribution of standard deviations has not changed
even after modifying the prior on the correlation. This allows for great
flexibility for setting prior distributions when using the
ratematrix
package. Note that we successfully set the prior
on the correlations as intended (nice!).
The chosen prior is very informative and is does not span the true parameter value for the simulation. The true correlation value used in the simulations is:
## [1] 0.4
After sampling from the prior distribution using
samplePrior
we can apply the function
plotRatematrix
to make a plot of the distribution of
R matrices. The function plotratematrix
will rebuild the variance-covariance matrices from the joint
distribution of correlation matrices and standard deviations prior to
making the plot. This is possible because samplePrior
produces output in the same format as the readMCMC
function.
## Plotting a single regime.
We can compare this prior distribution with the prior distribution used in the previous analysis. It is possible to see how the covariance term is much more tightly distributed around 0.
Now we can set and run the analysis using this prior, then check convergence and plot the posterior distribution.
## handle.corr.zero <- ratematrixMCMC(data=dat, phy=phy, prior=corr.prior
## , start="prior_sample", gen=200000
## , outname="corr_zero_prior", prop=c(0.1,0.45,0.45)
## , dir=tempdir() )
## post.corr.zero <- readMCMC(handle=handle.corr.zero)
checkConvergence(post.corr.zero)
## $heidel
## root matrix
## stest TRUE TRUE
## htest TRUE TRUE
##
## $ess
## root_1 root_2 matrix_cel_1 matrix_cel_2 matrix_cel_3 matrix_cel_4
## 141.6506 112.1025 692.3728 734.7567 734.7567 689.4026
plotRatematrix(post.corr.zero, point.matrix = list(R), point.color = "red", point.wd = 1.5
, show.zero = TRUE)
## Plotting a single regime.
The result is similar to when we set an informative prior to the vector of root values. The posterior distribution for the evolutionary covariance among traits is not centered in the true parameter value whereas it was in the previous analysis with the marginally uniform prior. The bias is also predictable; the posterior distribution has shifted towards the zero value.
It is important to note that in empirical studies there is no such
thing as “true parameter values”. While it is straightforward to show
the impact of the prior distribution using simulations, it is much
harder with empirical data. For this reason, we recommend that users
actively explore different prior distributions. Also, a Maximum
Likelihood Estimate (MLE) can be performed with packages such as
mvMORPH
and phytools
. The MLE result should be
compared with the Bayesian posterior distribution. If the MLE is not
congruent with the MCMC analysis, some questions need to be answered
before moving forward; does the MLE reached a global solution? Is the
prior distribution away from the MLE? Have the MCMC really
converged?
“And a lean, silent figure slowly fades into the gathering darkness, aware at last that in this world, with great power there must also come – great responsibility!” (Stan Lee and Steve Ditko, 1962. Amazing Fantasy #15)