| Title: | Probabilistic Phylogenetic Principal Component Analysis |
|---|---|
| Description: | Estimates probabilistic phylogenetic Principal Component Analysis (PCA) and non-phylogenetic probabilistic PCA. Provides methods to implement alternative models of trait evolution including Brownian motion (BM), Ornstein-Uhlenbeck (OU), Early Burst (EB), and Pagel's lambda. Also provides flexible biplot functions. |
| Authors: | Daniel Caetano [aut, cre] (ORCID: <https://orcid.org/0000-0002-2803-4064>) |
| Maintainer: | Daniel Caetano <[email protected]> |
| License: | GPL (>= 2.0) |
| Version: | 1.0.2 |
| Built: | 2026-05-07 08:46:39 UTC |
| Source: | https://github.com/caetanods/do3pca |
Function generates the expected distribution of variance across the principal components and compares it with the observed vector of explained variance (if provided).
BrokenStick(n_dim, var_vec = NULL, tol = 0, plot = TRUE)BrokenStick(n_dim, var_vec = NULL, tol = 0, plot = TRUE)
n_dim |
the number of variables of the data (required). |
var_vec |
the vector of explained variance of the PCA (optional). |
tol |
a tolerance value that is used to check if the observed explained variance is larger than the expected under the Broken Stick model. |
plot |
if a plot should be produced. |
Function will return the vector of explained variance for each PC. Will also return the number of PCs to keep and optionally a plot of the results if 'var_vec' is provided.
Function to make biplots for any kind of PCA. It accepts the outputs from standard PCA (princomp and prcomp). It also works with the "phylProbPCA" and "ProbPCA" functions. It provides more options to the plot than the standard "stats::biplot".
doBiplot(x, choices = 1L:2L, scale = 1, pc.biplot = FALSE, col, ...)doBiplot(x, choices = 1L:2L, scale = 1, pc.biplot = FALSE, col, ...)
x |
output from PCA analysis. |
choices |
numeric vector of length 2. Use to choose which of the PC axes to plot. Default plots first and second axes: "choices = c(1,2)". |
scale |
numeric value between 0 and 1. Same as in "stats::biplot.princomp". See ?biplot.princomp for more information. |
pc.biplot |
logical. If TRUE it will produce a "principal component biplot" (sensu Gabriel, 1971). Same as in "stats::biplot.princomp". See ?biplot.princomp for more information. |
col |
character vector of length 3 with the colors of the biplot. First color is used for the score points (or sample sames), second color for arrows and variable names, and third color for the right and top-side ticks (plot axes). |
... |
extra parameters for the function. Same as "stats::biplot". |
Function has the same options as "stats::biplot", with the addition of the following arguments. "plot_dimnames" controls is the names of the samples (species) will be plotted. "add_points" controls if the score points will be plotted. "add_margin" is a numeric value that expands the area of the plot. You can use this to make sure the names of variables and samples (species) fit the plot.
makes a biplot of the PCA results.
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Using probabilistic phylogenetic PCA phylppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = phylppca, add_margin = 0.3) ## Using standard phylogenetic PCA phylpca <- phytools::phyl.pca(tree = phy, Y = dt) doBiplot(x = phylpca, add_margin = 0.3) ## Using probabilistic PCA ppca <- ProbPCA(x = dt) doBiplot(x = ppca, add_margin = 0.3) ## Using standard PCA pca1 <- princomp(x = dt) doBiplot(x = pca1, add_margin = 0.1) ## Using standard PCA pca2 <- prcomp(x = dt) doBiplot(x = pca2, add_margin = 0.1)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Using probabilistic phylogenetic PCA phylppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = phylppca, add_margin = 0.3) ## Using standard phylogenetic PCA phylpca <- phytools::phyl.pca(tree = phy, Y = dt) doBiplot(x = phylpca, add_margin = 0.3) ## Using probabilistic PCA ppca <- ProbPCA(x = dt) doBiplot(x = ppca, add_margin = 0.3) ## Using standard PCA pca1 <- princomp(x = dt) doBiplot(x = pca1, add_margin = 0.1) ## Using standard PCA pca2 <- prcomp(x = dt) doBiplot(x = pca2, add_margin = 0.1)
Likelihood function for the standard probabilistic PCA or the phylogenetic probabilistic PCA. Note that one just need to change between the covariance of the data and the R matrix (multivariate Brownian Motion covariance matrix). The likelihood function only uses the eigenvalues of the covariance matrix (see parameter g).
loglik(L, sigma, N, d, g)loglik(L, sigma, N, d, g)
L |
vector with length equal to the number of retained dimensions. |
sigma |
variance. This is the average variance lost per discarded dimension. |
N |
sample size. Number of observations. |
d |
integer. Number of dimensions in the data. |
g |
eigenvalues of the covariance matrix. |
returns the -1 * log-likelihood of the data.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
Zhang, A., Chan, K.L., Kwok, J.T., and D-Y Yeung. 2004. Proceedings of the AAAI Conference on Artificial Intelligence 19 Book One, Volume. https://aaai.org/papers/00372-aaai04-060-bayesian-inference-on-principal-component-analysis-using-reversible-jump-markov-chain-monte-carlo/
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) L_start <- rgamma(n = 2, shape = 2, rate = 1) sigma_start <- rgamma(n = 1, shape = 2, rate = 1) R <- do3PCA:::fastR(x = dt, phy = phy) g <- eigen(R)$values loglik(L = L_start, sigma = sigma_start, N = nrow(dt), d = 2, g = g)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) L_start <- rgamma(n = 2, shape = 2, rate = 1) sigma_start <- rgamma(n = 1, shape = 2, rate = 1) R <- do3PCA:::fastR(x = dt, phy = phy) g <- eigen(R)$values loglik(L = L_start, sigma = sigma_start, N = nrow(dt), d = 2, g = g)
Function to perform probabilistic phylogenetic PCA using Bayesian Markov-Chain Monte Carlo.
MCMCphylProbPCA( phy, x, ret_dim = 2, scale = TRUE, fixed_R = NULL, gamma_shape = 1, gamma_rate = 2, gen = 10^6, step_scale = 0.08, burn = 0.2, post_score_samples = 100 )MCMCphylProbPCA( phy, x, ret_dim = 2, scale = TRUE, fixed_R = NULL, gamma_shape = 1, gamma_rate = 2, gen = 10^6, step_scale = 0.08, burn = 0.2, post_score_samples = 100 )
phy |
phylogeny in "phylo" format. |
x |
a matrix with traits in columns and species values in rows. Rownames must match the tip labels of phylogeny. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
fixed_R |
an evolutionary rate matrix to be used for the computation. |
gamma_shape |
The shape parameter for the gamma prior. |
gamma_rate |
The rate parameter for the gamma prior. |
gen |
The number generations for the MCMC. |
step_scale |
The scaler for the mulplier proposal of the MCMC. |
burn |
The proportion of the burn-in to the excluded before computing the average posterior parameter value. |
post_score_samples |
The number of posterior samples used to compute the posterior of scores. |
This function uses the "mcmc" R package to conduct a Metropolis-Hastings search. The output "mcmc" can be used for further analyses of the MCMC chain using the "mcmc" package.
returns a list with "mean_scores", "percent_var_pcs", "var_loadings", and "stats". Also return "mcmc" which is the output from the "mcmc" R package.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
Zhang, A., Chan, K.L., Kwok, J.T., and D-Y Yeung. 2004. Proceedings of the AAAI Conference on Artificial Intelligence 19 Book One, Volume. https://aaai.org/papers/00372-aaai04-060-bayesian-inference-on-principal-component-analysis-using-reversible-jump-markov-chain-monte-carlo/
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Do a VERY small chain to test: mcmc3PCA <- MCMCphylProbPCA(phy = phy, x = dt, gen = 500)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Do a VERY small chain to test: mcmc3PCA <- MCMCphylProbPCA(phy = phy, x = dt, gen = 500)
Function to perform probabilistic PCA using Bayesian Markov-Chain Monte Carlo.
MCMCProbPCA( x, ret_dim = 2, scale = TRUE, gamma_shape = 1, gamma_rate = 2, gen = 10^6, step_scale = 0.08, burn = 0.2 )MCMCProbPCA( x, ret_dim = 2, scale = TRUE, gamma_shape = 1, gamma_rate = 2, gen = 10^6, step_scale = 0.08, burn = 0.2 )
x |
a matrix with traits in columns and species values in rows. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
gamma_shape |
The shape parameter for the gamma prior. |
gamma_rate |
The rate parameter for the gamma prior. |
gen |
The number generations for the MCMC. |
step_scale |
The scaler for the mulplier proposal of the MCMC. |
burn |
The proportion of the burn-in to the excluded before computing the average posterior parameter value. |
This function uses the "mcmc" R package to conduct a Metropolis-Hastings search. The output "mcmc" can be used for further analyses of the MCMC chain using the "mcmc" package.
returns a list with "mean_scores", "percent_var_pcs", "var_loadings", and "stats". Also return "mcmc" which is the output from the "mcmc" R package.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
Zhang, A., Chan, K.L., Kwok, J.T., and D-Y Yeung. 2004. Proceedings of the AAAI Conference on Artificial Intelligence 19 Book One, Volume. https://aaai.org/papers/00372-aaai04-060-bayesian-inference-on-principal-component-analysis-using-reversible-jump-markov-chain-monte-carlo/
dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Do a VERY small chain to test: mcmcPPCA <- MCMCProbPCA(x = dt, gen = 100)dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ## Do a VERY small chain to test: mcmcPPCA <- MCMCProbPCA(x = dt, gen = 100)
Automatic choice of number of principal components
modelChoice(x = NULL, phy = NULL, cov = NULL, N = NULL)modelChoice(x = NULL, phy = NULL, cov = NULL, N = NULL)
x |
data. Columns are variables/features. Rows are observations/species. |
phy |
phylogenetic tree. Number of tips must match number of rows in x. |
cov |
a covariance matrix. Either the covariance of the data or an evolutionary ratematrix (R). |
N |
number of observations/species in the tree. |
a list with the best fitting number of principal components (k) and a vector with the log probability of the data given the number of principal components (k).
Function to perform probabilistic phylogenetic PCA. Allows for fit of alternative models of trait evolution using branch length transformation.
phylProbPCA(phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE)phylProbPCA(phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE)
phy |
phylogeny in "phylo" format. |
x |
a matrix with traits in columns and species values in rows. Rownames must match the tip labels of phylogeny. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
model |
choice of model of trait evolution. One of "BM", "lambda", "OU", or "EB". |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
quiet |
if function should suppress output to the console while running |
The function can be used to estimate the probabilistic phylogenetic PCA (3PCA) using distinct models of trait evolution. Models are implemented using branch length transformation. Model fitting happens in two steps. First the maximum likelihood of the evolutionary covariance matrix (R) and the parameter of the model is estimated. Then the 3PCA model is estimated using the phylogenetic tree with branch lengths transformed following the MLE for the parameter of each trait evolution model.
The function returns a list with the following elements. scores: the scores of the principal components; e_values: eigenvalues; e_vectors: eigenvectors or the projection; model_fit: information about the trait evolution model; loadings: the loadings of the PCs; varnames: the names of the variables; sig: the MLE of the error; mle.W: the MLE of the W matrix; Function also returns AIC, AICc, and BIC for the model.
returns a list of class "phylPPCA". See "Details" for more information.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)
Function to perform probabilistic phylogenetic PCA. Allows for fit of alternative models of trait evolution using branch length transformation.
phylProbPCAJoint( phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE )phylProbPCAJoint( phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE )
phy |
phylogeny in "phylo" format. |
x |
a matrix with traits in columns and species values in rows. Rownames must match the tip labels of phylogeny. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
model |
choice of model of trait evolution. One of "BM", "lambda", "OU", or "EB". |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
quiet |
if function should suppress output to the console while running |
The function can be used to estimate the probabilistic phylogenetic PCA (3PCA) using distinct models of trait evolution. Models are implemented using branch length transformation. Model fitting happens in two steps. First the maximum likelihood of the evolutionary covariance matrix (R) and the parameter of the model is estimated. Then the 3PCA model is estimated using the phylogenetic tree with branch lengths transformed following the MLE for the parameter of each trait evolution model.
The function returns a list with the following elements. scores: the scores of the principal components; e_values: eigenvalues; e_vectors: eigenvectors or the projection; model_fit: information about the trait evolution model; loadings: the loadings of the PCs; varnames: the names of the variables; sig: the MLE of the error; mle.W: the MLE of the W matrix; Function also returns AIC, AICc, and BIC for the model.
returns a list of class "phylPPCA". See "Details" for more information.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- phylProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)
Function to perform probabilistic PCA of independent contrasts. Allows for fit of alternative models of trait evolution using branch length transformation.
picProbPCA(phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE)picProbPCA(phy, x, ret_dim = 2, model = "BM", scale = TRUE, quiet = FALSE)
phy |
phylogeny in "phylo" format. |
x |
a matrix with traits in columns and species values in rows. Rownames must match the tip labels of phylogeny. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
model |
choice of model of trait evolution. One of "BM", "lambda", "OU", or "EB". |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
quiet |
if function should suppress output to the console while running |
The function can be used to estimate PIC-PCA using distinct models of trait evolution. Models are implemented using branch length transformation. Model fitting happens in two steps. First the maximum likelihood of the evolutionary covariance matrix (R) and the parameter of the model is estimated. Then the PIC-PCA model is estimated using the phylogenetic tree with branch lengths transformed following the MLE for the parameter of each trait evolution model.
The function returns a list with the following elements. scores: the scores of the principal components; e_values: eigenvalues; e_vectors: eigenvectors or the projection; model_fit: information about the trait evolution model; loadings: the loadings of the PCs; varnames: the names of the variables; sig: the MLE of the error; mle.W: the MLE of the W matrix; Function also returns AIC, AICc, and BIC for the model.
returns a list of class "phylPPCA". See "Details" for more information.
Revell, L. J. 2009. Size-Correction and Principal Components for Interspecific Comparative Studies. Evolution 63:3258–3268. doi: 10.1111/j.1558-5646.2009.00804.x
Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi: 10.7717/peerj.16505
Garland Jr, T., Harvey, P.H. and Ives, A.R., 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Systematic biology, 41(1), pp.18-32.
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- picProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)phy <- ratematrix::anoles$phy[[1]] dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- picProbPCA(phy = phy, x = dt, ret_dim = 2) doBiplot(x = ppca, add_margin = 0.3)
Function to perform (non-phylogenetic) probabilistic PCA.
ProbPCA(x, ret_dim = 2, scale = TRUE)ProbPCA(x, ret_dim = 2, scale = TRUE)
x |
a matrix with traits in columns and observations in rows. |
ret_dim |
number of dimensions (PC axes) to be kept by the model. |
scale |
if data should be rescaled to have zero mean and unit variance using the function scale(). Default is TRUE. |
Optimization is performed using nloptr.
returns a list.
Tipping, M. E., and C. M. Bishop. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology 61(3):611–622. doi: 10.1111/1467-9868.00196
dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- ProbPCA(x = dt, ret_dim = 2)dt <- as.matrix( ratematrix::anoles$data[,1:3] ) ppca <- ProbPCA(x = dt, ret_dim = 2)
Function will compute the parameter values and the AIC for a model of trait evolution. Models are estimated using a single parameter to model all traits in the multivariate data (x). The models are fit with the product of the likelihood of each independent trait fitted to a tree with the branch lengths transformed under the model M and a single model parameter p. The model parameter p represents an average of the process across all traits. The parameters for the models are: 1) lambda for Pagel's lambda, 2) alpha for OU (with theta as the root value), 3) b for the EB, and 4) sigma for Brownian motion. Please note that fitting BM does not require transformation of branch lengths.
traitModelAIC(x, phy, model)traitModelAIC(x, phy, model)
x |
a dataset with traits in columns and species on the rows. Please make sure that dataset matches the phylogeny. |
phy |
a phylogenetic tree with the same species as in x. |
model |
the name of a model of trait evolution. The options are "BM", "lambda", "OU", or "EB". |
a list with the elements: model (the model name), lik (the log-likelihood), AIC (the Akaike Information Criterion), log_par (the log of the parameter value), nlopt_out (the output object from nloptr optimizer). If the model is "BM", then the output has R (the evolutionary R matrix).