ClaDS manual
The ClaDS
module implements the Data Augmentation inference method for the ClaDS model, that allows estimating branch specific speciation rates on a reconstructed phylogeny.
Loading a tree
You can import a phylogeny to the environment using the load_tree
function. Currently supported extensions include .tre
and .nex
.
my_tree = load_tree(tree_path)
Running ClaDS
The parameter inference is ran with the function infer_ClaDS
output = infer_ClaDS(my_tree)
The keyword argument print_state
can be used to print the state of the run every print_state
iteration.
output = infer_ClaDS(my_tree, print_state = 100)
You can save the result with
using JLD2
@save the_path_you_want_to_save_the_result output
and load it back to a julia session with
using JLD2
@load the_path_you_want_to_save_the_result output
Incomplete sampling
By default, the function considers that the clade was perfectly sampled, i.e. that all the species alive at present time are included in the phylogeny. If it is not the case, the sampling fraction can be specified through the keyword argument f
. f
can be a Float
, in which case the sampling fraction is taken as homogeneous on the whole phylogeny.
output = infer_ClaDS(my_tree, f = 0.94)
Alternatively, different sampling fractions can be specified for different subclades. To do so, f
should be passed as a Array{Float64}
of length n
, where n
is the number of tip in the phylogeny. f[i]
is the sampling fraction of the subclade that contains tip i
. If the Tree
object has tip labels (which can be accessed using tip_labels(my_tree)
, the sampling fractions in f
are in the same order as the tip labels, and f[i]
is the sampling fraction of the subclade that contains the tip with label tip_labels(my_tree)[i]
.
In the following example, the left subtree of my_tree
is assigned the sampling fraction 0.3
and its right subtree the sampling fraction 0.8
.
#=
create a vector of size n_tips(my_tree) where the first
n_tips(my_tree.offsprings[1]) elements are equal to 0.3
and the rest to 0.8
=#
f = [ i < n_tips(my_tree.offsprings[1]) ? 0.3 : 0.8 for i in 1:n_tips(my_tree)]
output = infer_ClaDS(my_tree, f = f)
Result
The result is a CladsOutput
object, that contains the following fields:
tree
: theTree
object on which the inference was performed.chains
: the resulting mcmc chains.rtt_chains
: the mcmc chains of the mean rate through time.σ_map
,α_map
,ε_map
,λ0_map
: the MAP estimates of the model's parameters.λi_map
,λtip_map
: the MAP estimates of the branch-specific and present rates.time_points
: Time points at which the number of lineages through time and rate through time are computed. The number of time points can be specified using the keyword argumentltt_steps
.DTT_mean
: Estimate of the number of lineages through time.RTT_map
: Estimate of the mean rate through time.enhanced_tree
: Sample from the complete phylogeny distribution. Their number can be specified through the keyword argumentn_trees
.gelm
: Evaluation of the gelman statistics.
If the tree has tip labels, the tip rate for species sp_name
can be extracted using the function tip_rate
:
tip_rate(output, sp_name)
The result can be saved as a .Rdata
object with the function save_ClaDS_in_R
, so it can be manipulated in R.
Plot the branch specific rates
It can be plotted using the plot_CladsOutput
function. By default, this function plots the reconstructed phylogeny painted with the inferred branch-specific speciation rates, but other methods are available.
plot_CladsOutput(output)
Options for plotting the tree can be passed as a String
to the options
keyword argument. All the options of the R
function plot.phylo
from the ape
package are available.
plot_CladsOutput(output, options = "type = 'fan'")
plot_CladsOutput(output, options = "lwd = 1, direction = 'leftwards'")
Diversity through time plot
Using the keyword argument method = "DTT"
, the function plots the estimate of the number of lineages through time.
plot_CladsOutput(output, method = "DTT")
On this plot, we have:
- black line: the LTT plot (number of lineages through time in the reconstructed phylogeny)
- thin blue lines: individual MCMC iterations
- thick blue lines: the $95\%$ confidence interval
- dotted green line: the point estimates
Mean rate through time plot
Using the keyword argument method = "RTT"
, the function plots the estimate of the mean speciation rate through time.
plot_CladsOutput(output, method = "RTT")
Similarly to the diversity through time plot, we have here:
- thin blue lines: individual MCMC iterations
- thick blue lines: the $95\%$ confidence interval
- dotted green line: the point estimates
Marginal posterior densities
With the keyword argument method = "density"
, the functions plot the marginal posterior density of a given model parameter or a summary statistics. Similarly, method = "chain"
allows plotting the mcmc chains for this parameter.
plot_CladsOutput(output, method = "density")
plot_CladsOutput(output, method = "chain")
What parameter to plot is specified through the keyword id_par
, for both method = "density"
and method = "chain"
.
plot_CladsOutput(output, method = "density", id_par = "sigma")
plot_CladsOutput(output, method = "density", id_par = "σ")
plot_CladsOutput(output, method = "density", id_par = "alpha")
plot_CladsOutput(output, method = "density", id_par = "epsilon")
plot_CladsOutput(output, method = "density", id_par = "lambda0")
The branch specific speciation rate of branch i
is accessed with id_par = lambda_i
or id_par = λ_i
plot_CladsOutput(output, method = "density", id_par = "lambda_5")
plot_CladsOutput(output, method = "density", id_par = "λ_2")
The tip rate of tip i
is accessed with id_par = lambda_tip_i
or id_par = λtip_i
. Alternatively, if the tree has tip labels, the rate can be accessed using the species name with id_par = lambda_tip_spname
.
plot_CladsOutput(output, method = "density", id_par = "lambdatip_2")
plot_CladsOutput(output, method = "density", id_par = "λtip_3")
sp_name = tip_labels(my_tree)[10]
id_par = "λtip_$(sp_name)"
plot_CladsOutput(output, method = "density", id_par = id_par)
Finaly, the rate through time and diversity through time can be accessed with id_par = rate_time
and id_par = div_time
, where time
is an integer between 1
and length(output.time_points)
.
plot_CladsOutput(output, method = "density", id_par = "rate_12")
plot_CladsOutput(output, method = "density", id_par = "div_33")