A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing
Rachel S. Schwartz, Kelly Harkins, Anne C. Stone, and Reed A. Cartwright

TL;DR
This paper introduces SISRS, a new software tool that rapidly extracts phylogenetically informative data directly from next-generation sequencing reads without needing a reference genome or assembly, enabling efficient phylogenetic analysis.
Contribution
SISRS is a novel method that bypasses traditional genome assembly and annotation steps, providing a fast way to identify homologous loci for phylogenetics from raw sequencing data.
Findings
Successfully identified variable phylogenetic sites in simulated data
Produced accurate phylogenies for ape genomes
Reconstructed consistent mammalian phylogenies from multiple datasets
Abstract
We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole-genome assembly, genome-genome alignment, and annotation. In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered phylogenies from multiple datasets that were consistent with previous conflicting estimates of the relationships among mammals. SISRS is open source and freely available at https://github.com/rachelss/SISRS.
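To make "variable sites with phylogenetic signal" concrete, here is a minimal Python sketch that scans a small set of already-aligned sequences and reports which columns are variable and which are parsimony-informative (at least two bases, each seen in at least two taxa). This is not SISRS's implementation and does not reproduce its pipeline; the function name `classify_sites`, the toy sequences, and the choice to ignore any character other than A, C, G, or T are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: only unambiguous bases count; N, gaps, etc. are ignored.
VALID_BASES = set("ACGT")


def classify_sites(alignment):
    """Return (variable, informative) lists of 0-based column indices.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    must have the same length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return [], []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    variable, informative = [], []
    for col in range(length):
        # Count each unambiguous base observed in this column.
        counts = Counter(
            s[col].upper() for s in seqs if s[col].upper() in VALID_BASES
        )
        if len(counts) >= 2:
            variable.append(col)
            # Parsimony-informative: two or more bases each occur in
            # two or more taxa.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative.append(col)
    return variable, informative


# Toy data, invented for this example (not real ape sequences).
example = {
    "human": "ACGTAC",
    "chimp": "ACGTAT",
    "gorilla": "ACTTAT",
    "orangutan": "ACTTAN",
}
variable, informative = classify_sites(example)
print("variable:", variable)
print("informative:", informative)
```

In this toy alignment, column 2 splits the taxa two against two and so is informative, while column 5 is variable only because a single taxon differs.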
