The do3PCA package is able to do standard (non-phylogenetic) probabilistic PCA, probabilistic phylogenetic PCA (3PCA), and probabilistic PCA of phylogenetic independent contrasts (PIC-PCA). Here we generate a dataset using simulation tools and show how to perform probabilistic PCA analyses, interpret the results, and create plots.
We will perform regular PCA and probabilistic PCA and compare the
results using the iris data. First the regular PCA:
data(iris)
iris_mat <- iris[,1:4]
pca <- prcomp(iris_mat)
# Use the Broken Stick function to select the number of principal components to retain:
BrokenStick(n_dim = 4, var_vec = pca$sdev)## $broken_stick
## [1] 0.5208333 0.2708333 0.1458333 0.0625000
##
## $n_pc_to_keep
## [1] 1
Now let’s repeat the analysis with probabilistic PCA. The difference
is that we need to set the number of dimensions to keep because it is
part of the model. So here we will perform model selection using a
probabilistic approach as implemented in the modelChoice
function.
## 3
## 3
## 1 2 3
## 360.0199 417.7342 439.5163
Note that the likelihood method selected 3 principal components to
keep whereas the Broken Stick selected only a single PC. This is not an
error! These two approaches are both valid, but they try to do different
things. Let’s see what the Broken Stick method return when we use the
probabilistic PCA. For that we need to estimate the model with the
largest number of PCs, which is the total number of dimensions of the
data -1. For that we will use the function ProbPCA that
does non-phylogenetic probabilistic PCA:
full_model <- ProbPCA(x = as.matrix(iris_mat), ret_dim = 3)
BrokenStick(n_dim = 3, var_vec = full_model$percent_var_pcs)## $broken_stick
## [1] 0.6111111 0.2777778 0.1111111
##
## $n_pc_to_keep
## [1] 1
We find that the Broken Stick approach also recommends a single principal component with the results of the probabilistic PCA.
Now let’s plot the results of the probabilistic PCA using a biPlot:
The data and labels can be non-optimal, so we can adjust things. We can also change the colors of the plot elements:
doBiplot(x = full_model, xlim = c(-0.3, 0.3), ylim = c(-0.3, 0.3), col = c("lightblue", "darkgreen", "pink"))Here we will use data from the ratematrix package to
perform a quick phylogenetic PCA. The steps are very similar to the one
we did with regular PCA. Here we will also compare the scores of the
do3PCA package with the results using phytools
and the non-probabilistic phylogenetic PCA.
library( ratematrix )
data("anoles")
anole_dt <- anoles$data[,1:3]
anole_phy <- mergeSimmap(anoles$phy[[1]], drop.regimes = TRUE)First let’s do a standard phylogenetic PCA:
## Loading required package: ape
## Loading required package: maps
phypca <- phyl.pca(tree = anole_phy, Y = anole_dt)
# Check the best number of PCs using the Broken Stick function:
BrokenStick(n_dim = 3, var_vec = diag(phypca$Eval))## $broken_stick
## [1] 0.6111111 0.2777778 0.1111111
##
## $n_pc_to_keep
## [1] 1
# Again only a single dimension.
# We can plot the scores of the PCA using the do3PCA plotting function:
doBiplot(x = phypca)Now let’s repeat this analysis using the 3PCA approach and the
phylProbPCA function. We will set the option
scale = FALSE because phytools does not scale
the data prior to conducting a PCA and do3PCA package
does.
## Error in `phylProbPCA()`:
## ! x needs to be a matrix. Cannot be a data.frame.
Oh! We got an error! That is because the x parameter for
the do3PCA package needs to be a matrix type object. Let’s
use as.matrix to fix this:
pca3 <- phylProbPCA(phy = anole_phy, x = as.matrix(anole_dt), scale = FALSE, ret_dim = 2)
doBiplot(x = pca3)The results are nearly identical!
However, note that PCA rotates the data to fit the principal axes.
The direction of this rotation (e.g., if to the right or left of PC1) is
arbitrary and can change depending on the implementation. Do some
research about princomp and prcomp functions
and you will learn that they can return inverted PC axes. Both are
correct results.
Here we show that the scores are very similar:
## [,1] [,2]
## ahli 0.2854764 0.2091089
## alayoni 0.7666921 0.3264510
## alfaroi 0.8501569 -0.2343907
## aliniger 0.2554546 0.1911126
## allisoni -0.5639831 0.1245916
## allogus 0.2275666 0.1199435
## Plot the 3PCA scores with x axis inverted
plot(x = pca3$scores[,1], y = pca3$scores[,2])
## Add the phytools pPCA as red + symbols
points(x = phypca$S[,1], y = phypca$S[,2], pch = 3, col = "red")Doing a PIC-PCA is very similar to a 3PCA. We just need to use the
function picProbPCA intead of the phylProbPCA
function. The input objects and options are the same.
pic_pca <- picProbPCA(phy = anole_phy, x = as.matrix(anole_dt), scale = FALSE, ret_dim = 2)
doBiplot(x = pic_pca)