TL;DR
This paper introduces a theory of how the extended Burrows-Wheeler Transform (eBWT) of a read collection clusters the copies of each nucleotide sequenced from a genome, and uses it for alignment-free, reference-free SNP detection that, in preliminary results, needs less coverage than existing tools while improving precision and sensitivity.
Contribution
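The paper develops a theory that predicts how many copies of each nucleotide are expected inside each eBWT cluster, gives an LCP-array-based procedure for locating these clusters, and presents an SNP discovery tool that finds SNPs with a simple scan of the eBWT and LCP arrays.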
Findings
SNPs are effectively clustered in the eBWT of read collections.
The proposed method requires less coverage than existing tools.
Preliminary results show improved precision and sensitivity.
Abstract
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in the eBWT of the read collection, and we develop a tool finding SNPs with a simple scan of the eBWT and LCP arrays. Preliminary results show that our method requires much less coverage than…
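The clustering idea in the abstract can be illustrated with a toy sketch. This is not the authors' tool: it builds the sorted suffix list, the eBWT characters, and the LCP values naively (quadratic time, one shared `$` terminator treated as never matching), and the names `sorted_suffixes`, `lcp`, `snp_candidate_clusters`, and the thresholds `k` and `min_count` are hypothetical choices made for illustration. A cluster here is a maximal run of adjacent suffixes whose LCP is at least `k`; a run is reported when at least two different nucleotides each precede the shared context `min_count` or more times.

```python
from collections import Counter


def sorted_suffixes(reads):
    """Return (suffix, read_index, preceding_char) for every suffix, sorted."""
    entries = []
    for idx, read in enumerate(reads):
        text = read + "$"  # "$" sorts before A, C, G, T in ASCII
        for i in range(len(text)):
            prev = text[i - 1] if i > 0 else "$"
            entries.append((text[i:], idx, prev))
    # Ties between identical suffixes are broken by read index.
    entries.sort(key=lambda e: (e[0], e[1]))
    return entries


def lcp(a, b):
    """Longest common prefix length; terminators never match each other."""
    n = 0
    for x, y in zip(a, b):
        if x != y or x == "$":
            break
        n += 1
    return n


def snp_candidate_clusters(reads, k, min_count=2):
    """Scan eBWT and LCP values once and report clusters with mixed symbols."""
    entries = sorted_suffixes(reads)
    suffixes = [e[0] for e in entries]
    ebwt = [e[2] for e in entries]  # character preceding each sorted suffix
    lcps = [0] + [lcp(suffixes[i - 1], suffixes[i])
                  for i in range(1, len(suffixes))]

    clusters = []
    start = 0
    for i in range(1, len(suffixes) + 1):
        # A cluster ends where the LCP with the previous suffix drops below k.
        if i == len(suffixes) or lcps[i] < k:
            if i - start >= 2:
                counts = Counter(c for c in ebwt[start:i] if c != "$")
                frequent = {c: n for c, n in counts.items() if n >= min_count}
                if len(frequent) >= 2:
                    clusters.append((suffixes[start][:k], frequent))
            start = i
    return clusters


# Two copies each of two reads that differ at a single position (T vs C).
reads = ["GATTACAGG", "GATTACAGG", "GATCACAGG", "GATCACAGG"]
print(snp_candidate_clusters(reads, k=4))
# prints [('ACAG', {'T': 2, 'C': 2})]
```

In this toy input the only cluster with two well-supported preceding nucleotides is the one sharing the right context `ACAG`, which is exactly where the two read variants diverge. A practical implementation would replace the naive sorting with a dedicated eBWT and LCP construction and would choose its thresholds from the coverage and error rate.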
