# Identifiability of phylogenetic parameters from k-mer data under the coalescent

**Authors:** Chris Durden, Seth Sullivant

arXiv: 1705.06993 · 2017-05-22

## TL;DR

This paper proves that phylogenetic tree parameters (topology and branch lengths) are identifiable from k-mer frequency data under the multispecies coalescent model with Jukes-Cantor mutations, giving alignment-free species tree reconstruction a theoretical basis even when gene trees disagree with the species tree.

## Contribution

It derives model-based moment formulas that involve the divergence times and the coalescent parameters, and proves identifiability of the tree topology and branch lengths from k-mer data under the multispecies coalescent with Jukes-Cantor mutations.

## Key findings

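- Tree and branch length parameters are identifiable under the Jukes-Cantor model of sequence mutations
- Model-based moment formulas derived that involve the divergence times and the coalescent parameters
- Theoretical foundation provided for alignment-free phylogenetics that uses genes as natural sequence blocks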

## Abstract

Distances between sequences based on their $k$-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of $k$-mer methods depends on 1) model-based corrections to distances based on $k$-mers and 2) breaking long sequences into blocks to obtain repeated trials from the sequence-generating process. Good performance of such methods is based on having many high-quality blocks with many homologous sites, which can be problematic to guarantee a priori. Nature provides natural blocks of sequences into homologous regions---namely, the genes. However, directly using past work in this setting is problematic because of possible discordance between different gene trees and the underlying species tree. Using the multispecies coalescent model as a basis, we derive model-based moment formulas that involve the divergence times and the coalescent parameters. From this setting, we prove identifiability results for the tree and branch length parameters under the Jukes-Cantor model of sequence mutations.
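
To make the raw ingredients of the abstract concrete, here is a minimal Python sketch of a $k$-mer count vector, the squared Euclidean distance between two such vectors, and an average of that distance over paired blocks such as genes. The function names (`kmer_counts`, `squared_euclidean`, `mean_block_distance`) are hypothetical and chosen for illustration only. The sketch computes an uncorrected distance; it does not implement the model-based corrections or moment formulas derived in the paper.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count every length-k word over `alphabet`, in lexicographic order.

    Windows containing characters outside the alphabet (e.g. "N") are
    silently ignored, which is an assumption of this sketch.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(word)] for word in product(alphabet, repeat=k)]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length count vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_block_distance(blocks_a, blocks_b, k):
    """Average the uncorrected squared k-mer distance over paired blocks.

    blocks_a[i] and blocks_b[i] are assumed to be homologous blocks
    (for example, the same gene in two species).
    """
    dists = [
        squared_euclidean(kmer_counts(a, k), kmer_counts(b, k))
        for a, b in zip(blocks_a, blocks_b)
    ]
    return sum(dists) / len(dists)
```

Averaging over blocks mirrors the "repeated trials" idea in the abstract: each gene contributes one observation of the distance, and the sample mean estimates an expectation that a model-based formula could then relate to divergence times.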

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.06993/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1705.06993/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1705.06993/full.md

---
Source: https://tomesphere.com/paper/1705.06993