SplitStrains, a tool to identify and separate mixed Mycobacterium tuberculosis infections from WGS data
Einar Gabbassov, Miguel Moreno-Molina, Iñaki Comas, Maxwell Libbrecht, Leonid Chindelevitch

TL;DR
SplitStrains is a new statistical tool that accurately detects and separates mixed M. tuberculosis infections from whole-genome sequencing data, improving strain identification and proportion estimation.
Contribution
It introduces a novel, statistically grounded method that outperforms existing approaches in identifying and quantifying mixed bacterial strains from WGS data.
Findings
Superior performance in strain proportion estimation on simulated data
Effective identification of underlying strains in real M. tuberculosis samples
Enhances analysis capabilities for bacterial mixed infections
Abstract
The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting mixed infections from WGS (whole-genome sequencing) data, and especially for determining the proportions and identities of the underlying strains, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical model, SplitStrains not only outperforms other existing methods in proportion estimation on both simulated and real M. tuberculosis data, but also successfully determines the identities of the underlying strains. We conclude that SplitStrains is a powerful addition to the existing toolkit of analytical methods for data coming from…
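To illustrate what "proportion estimation" means for a two-strain mixture, here is a minimal sketch under stated assumptions. It is not SplitStrains' own model. The function name `estimate_minor_proportion`, the fixed standard deviation `sigma`, and the equal component weights are hypothetical choices made for this example. The sketch assumes that at each site where the two strains differ, the fraction of reads carrying the alternate allele clusters around either π (the minor-strain proportion) or 1 − π. It fits a symmetric two-component Gaussian mixture with means π and 1 − π by expectation-maximization.

```python
import math
from statistics import mean


def estimate_minor_proportion(alt_freqs, n_iter=50, sigma=0.05):
    """Hypothetical helper: estimate the minor-strain proportion pi.

    Assumes each value in alt_freqs is the fraction of reads carrying the
    alternate allele at a site where the two strains differ, so values
    cluster around pi (alt allele in the minor strain) or 1 - pi (alt
    allele in the major strain). Fits a symmetric two-component Gaussian
    mixture with means pi and 1 - pi, equal weights and a fixed, assumed
    standard deviation sigma, by expectation-maximization.
    """
    # Initialise pi by folding every frequency onto [0, 0.5]
    pi = mean(min(f, 1.0 - f) for f in alt_freqs)
    for _ in range(n_iter):
        total = 0.0
        for f in alt_freqs:
            # E-step: log-odds that site f belongs to the component at pi
            # rather than 1 - pi; simplifies to (2pi - 1)(2f - 1) / (2 sigma^2)
            log_odds = (2 * pi - 1) * (2 * f - 1) / (2 * sigma ** 2)
            log_odds = max(-500.0, min(500.0, log_odds))  # avoid overflow in exp
            r = 1.0 / (1.0 + math.exp(-log_odds))
            # M-step contribution: a site in the 1 - pi component is
            # evidence that pi is near 1 - f
            total += r * f + (1 - r) * (1 - f)
        pi = total / len(alt_freqs)
    # Report the smaller of the two symmetric solutions as the minor strain
    return min(pi, 1.0 - pi)


if __name__ == "__main__":
    # Toy frequencies, invented for illustration only
    print(round(estimate_minor_proportion([0.2, 0.8, 0.22, 0.18, 0.79]), 3))  # → 0.202
```

Fixing `sigma` and the mixture weights keeps the M-step to a single closed-form update for π. A fuller model would also estimate the variance, account for sequencing depth and error, and test whether a one-strain explanation fits the data equally well.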
