A new distance between DNA sequences
Viswanath C. Narayanan

TL;DR
This paper introduces a versatile new distance metric for DNA sequences applicable to various evolutionary models, demonstrating improved accuracy in phylogenetic tree recovery through simulation experiments.
Contribution
It presents a novel, general distance metric for DNA sequences that works across multiple evolutionary models, including those with variable rates.
Findings
Outperforms existing metrics in phylogenetic tree reconstruction under classical models.
Performs equally well or better under models with varying substitution rates.
Applicable to a wide range of evolutionary Markov models.
Abstract
We propose a new distance metric for DNA sequences, which can be defined on any evolutionary Markov model with infinitesimal generator matrix Q. That is, the new metric can be defined under existing models such as the Jukes-Cantor (JC) model, the Kimura-2-parameter model, the F84 model, the GTR model, etc. Since our metric does not depend on the form of the generator matrix Q, it can be defined for very general models, including those with varying nucleotide substitution rates among lineages. This makes our metric widely applicable. Our simulation experiments show that the new metric, when defined under classical models such as the JC, F84 and Kimura-2-parameter models, performs better than the existing metrics for those models in recovering phylogenetic trees from sequence data. Our simulation experiments also show that the new metric, under a model that allows varying nucleotide substitution rates among…
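For context on the baselines named in the abstract, the following is a minimal sketch of the classical Jukes-Cantor and Kimura-2-parameter distances between two aligned sequences. It is not the new metric proposed in the paper, whose definition does not appear in this summary. The function names (`site_proportions`, `jc_distance`, `k2p_distance`) and the choice to skip gapped or ambiguous sites are assumptions made for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_proportions(seq_a, seq_b):
    """Return (transition, transversion) proportions over comparable aligned sites.

    Hypothetical helper: sites where either sequence has a gap or an
    ambiguous base are skipped (an assumption, not taken from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        # A transition stays within purines or within pyrimidines.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jc_distance(seq_a, seq_b):
    """Classical Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    ts, tv = site_proportions(seq_a, seq_b)
    arg = 1.0 - 4.0 * (ts + tv) / 3.0
    if arg <= 0.0:
        return math.inf  # saturated: the estimate is undefined
    return -0.75 * math.log(arg)


def k2p_distance(seq_a, seq_b):
    """Classical Kimura-2-parameter distance:
    d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    with P the transition and Q the transversion proportion.
    """
    ts, tv = site_proportions(seq_a, seq_b)
    a = 1.0 - 2.0 * ts - tv
    b = 1.0 - 2.0 * tv
    if a <= 0.0 or b <= 0.0:
        return math.inf  # saturated: the estimate is undefined
    return -0.5 * math.log(a * math.sqrt(b))
```

A matrix of such pairwise distances is the usual input to a distance-based tree-building method. Both functions return infinity when the observed divergence is too high for the logarithm to be defined, which is one common convention among several.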
Taxonomy
Topics: Genomics and Phylogenetic Studies · Fractal and DNA sequence analysis · RNA and protein synthesis mechanisms
