Measuring Fit of Sequence Data to Phylogenetic Model: Gain of Power using Marginal Tests
Peter J. Waddell, Rissa Ota, and David Penny

TL;DR
This paper evaluates how well phylogenetic models fit sequence data and shows that marginalized tests detect significant deviations from model assumptions, calling into question the reliability of many current phylogenetic inferences.
Contribution
It introduces marginalized likelihood ratio tests that improve power to detect model misfit in phylogenetics, highlighting widespread deviations from model assumptions.
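A marginalized test works on a summary of the alignment rather than on the full table of site patterns. One such summary is the pairwise frequency (F) matrix mentioned in the abstract: a 4 × 4 count of which nucleotide in one sequence sits opposite which nucleotide in another. For t sequences this gives t(t−1)/2 matrices of 16 cells each, far less sparse than the 4^t possible full site patterns, which is one plausible reason such summaries can be tested with more power. The Python sketch below builds these matrices; the names `pairwise_f_matrix` and `all_f_matrices` are hypothetical, and skipping gaps and ambiguity codes is an assumption, not the paper's documented treatment.

```python
# Hypothetical helpers (not from the paper): collapse an alignment into
# pairwise 4 x 4 frequency (F) matrices.
from itertools import combinations

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def pairwise_f_matrix(seq_x: str, seq_y: str) -> list[list[int]]:
    """Count site patterns (x_i, y_i) for two aligned sequences.

    Rows index the base in seq_x, columns the base in seq_y. Sites with a
    gap or ambiguity code in either sequence are skipped (an assumption;
    other treatments are possible).
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    f = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in INDEX and b in INDEX:
            f[INDEX[a]][INDEX[b]] += 1
    return f


def all_f_matrices(alignment: dict[str, str]) -> dict[tuple[str, str], list[list[int]]]:
    """One F matrix per unordered pair of named sequences: t*(t-1)/2 in total."""
    return {
        (a, b): pairwise_f_matrix(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
```

For example, `pairwise_f_matrix("ACGTT", "ACGTC")` has a single off-diagonal count, at row T, column C.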
Findings
The most general test (G statistic against the multinomial model) does not reject data-model fit (p ~ 0.5)
Marginalized tests on pairwise frequency (F) matrices strongly reject common phylogenetic models (p < 0.001)
Sequences are not stationary in nucleotide composition
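One marginal check that bears on stationarity is a test of symmetry of each F matrix. Under a stationary, homogeneous, time-reversible Markov model the expected F matrix is symmetric, so significant asymmetry is evidence against that class of models, although it does not by itself say which assumption fails. The Python sketch below implements a G (log-likelihood ratio) version of Bowker's symmetry test, with an exact closed-form chi-square tail for integer degrees of freedom. The function names are hypothetical, and this is an illustrative choice, not necessarily the exact test the authors used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    y = x / 2.0
    if df % 2 == 0:
        # Q(k, y) = exp(-y) * sum_{i<k} y^i / i!
        k = df // 2
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(k))
    # Q(k + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1..k} y^(i-1/2) / Gamma(i+1/2)
    k = (df - 1) // 2
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1)
    )
    return tail


def symmetry_g_test(f: list[list[int]]) -> tuple[float, int, float]:
    """G version of Bowker's symmetry test for a square count matrix f.

    Under H0 (f symmetric) the fitted off-diagonal counts are
    (f[i][j] + f[j][i]) / 2; the diagonal is fitted exactly. Pairs with
    f[i][j] + f[j][i] == 0 carry no information and are dropped from df.
    Returns (G, df, p_value).
    """
    n = len(f)
    g = 0.0
    df = 0
    for i in range(n):
        for j in range(i + 1, n):
            total = f[i][j] + f[j][i]
            if total == 0:
                continue
            df += 1
            expected = total / 2.0
            for obs in (f[i][j], f[j][i]):
                if obs > 0:  # the term 0 * ln(0) is taken as 0
                    g += 2.0 * obs * math.log(obs / expected)
    p = chi2_sf(g, df) if df > 0 else 1.0
    return g, df, p
```

Applied to every F matrix from an alignment, this yields t(t−1)/2 non-independent tests, so some correction for multiple testing, or a single summary statistic with a simulated null distribution, would be needed before drawing an overall conclusion.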
Abstract
Testing the fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Analyses that omit such tests discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (1978) once argued that evolutionary biology was unscientific because its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with marginalized tests applied to an alignment of placental mammal coding sequences. The most general test does not reject the fit of data to model (p ~ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (p < 0.001) reject the most general phylogenetic…
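For reference, the general statistic described in the abstract can be written as a log-likelihood ratio between the unconstrained multinomial model of site-pattern counts and the fitted tree model. The notation is ours, not the paper's: n_k is the number of alignment columns showing site pattern k, N is the alignment length, and p̂_k is the probability of pattern k under the tree model with fitted parameters.

```latex
G^2 = 2\left[\sum_{k:\,n_k>0} n_k \ln\frac{n_k}{N} \;-\; \sum_{k:\,n_k>0} n_k \ln \hat{p}_k\right]
    = 2\sum_{k:\,n_k>0} n_k \ln\frac{n_k}{N\,\hat{p}_k},
\qquad N = \sum_k n_k .
```

With t sequences there are 4^t possible patterns, most of them unobserved in a typical alignment, so the asymptotic χ² reference distribution is unreliable for this statistic; a parametric bootstrap is a common way to obtain its null distribution.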
Taxonomy
Topics: Genomics and Phylogenetic Studies · Evolution and Paleontology Studies · Genetic diversity and population structure
