Simulation and Inference

This is a simple example of simulation, inference, and prediction with the aphylo R package.

Setup

library(aphylo)
## Loading required package: ape
# Parameter estimates
psi  <- c(.05, .025)
mu_d <- c(.2, .1)
mu_s <- c(.1, .05)
Pi   <- .5

Simulation

set.seed(223)
x <- raphylo(n = 200, psi = psi, mu_d = mu_d, mu_s = mu_s, Pi = Pi)
plot(x)

The simulation function generates an aphylo class object which is simply a wrapper containing:

  • a 0/1 integer matrix (annotations),
  • a phylo tree (from the ape package), and
  • some information about the tree and annotations.

If needed, we can export the data as follows:

# Edgelist describing parent->offspring relations
write.csv(x$tree, file = "tree.tree", row.names = FALSE)

# Tip annotations
ann <- with(x, rbind(tip.annotation, node.annotation))
write.csv(ann, file = "annotations.csv", row.names = FALSE)

# Event types
events <- with(x, cbind(c(tip.type*NA, node.type)))
rownames(events) <- 1:nrow(events)
write.csv(events, file = "events.csv", row.names = FALSE)

Inference

To fit the data, we can use MCMC as follows:

ans <- aphylo_mcmc(x ~ psi + mu_d + mu_s + Pi)
## Warning: While using multiple chains, a single initial point has been passed
## via `initial`: c(0.1, 0.05, 0.9, 0.5, 0.1, 0.05, 0.5). The values will be
## recycled. Ideally you would want to start each chain from different locations.
## Convergence has been reached with 10000 steps. Gelman-Rubin's R: 1.0251. (500 final count of samples).
ans
## 
## ESTIMATION OF ANNOTATED PHYLOGENETIC TREE
## 
##  Call: aphylo_mcmc(model = x ~ psi + mu_d + mu_s + Pi)
##  LogLik: -108.8457
##  Method used: mcmc (10000 steps)
##  # of Leafs: 200
##  # of Functions 1
##  # of Trees: 1
## 
##          Estimate  Std. Err.
##  psi0    0.0664    0.0517
##  psi1    0.0593    0.0368
##  mu_d0   0.2254    0.1273
##  mu_d1   0.1326    0.0985
##  mu_s0   0.1300    0.0447
##  mu_s1   0.0594    0.0310
##  Pi      0.3547    0.2484

For goodness-of-fit analysis, we have a couple of tools. We can compare the predicted values with the observed values:

plot(ans)

We can also take a look at the surface of the posterior function

plot_logLik(ans)

And we can also take a look at the prediction scores

ps <- prediction_score(ans)
ps       # Printing
## Prediction score (H0: Observed = Random)
## 
##  N obs.      : 399
##  alpha(0, 1) : 0.41, 0.59
##  Observed    : 0.71 ***
##  Random      : 0.52 
##  P(<t)       : 0.0000
## --------------------------------------------------------------------------------
## Values scaled to range between 0 and 1, 1 being best.
## 
## Significance levels: *** p < .01, ** p < .05, * p < .10
## AUC 0.85.
## MAE 0.29.
plot(ps) # and plotting