The multifurcating skyline plot

Abstract

A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The " skyline plot " and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalised skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Lambda-coalescents (specifically, Beta(2-$α$, $α$)-coalescents) to develop the " multifurcating skyline plot " , which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially-sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2$α$, $α$)-coalescent process can be estimated accurately. Lastly we apply the multifurcating skyline plot to a molecular clock phylogeny of 1,610 Ebola virus sequences from the 2014-2016 West African outbreak. We artificially collapse short branches in this empirical phylogeny in order to mimic different levels of multifurcation and show that variance in the reproductive success of the pathogen through time can be estimated by combining the skyline plot with epidemiological case count data.

Publication
Virus Evolution