Sequence assembly from corrupted shotgun reads
Shirshendu Ganguly, Elchanan Mossel, Miklos Z. Racz

TL;DR
This paper investigates how robust sequence assembly is to high error rates. It shows that approximate reconstruction is feasible when reads are sufficiently long and numerous, whatever the type of error, as long as the magnitude of the errors in each read is bounded.
Contribution
The paper shows that approximate sequence reconstruction is possible from corrupted reads whose errors are bounded in magnitude, without assuming any specific error profile.
Findings
Approximate reconstruction is achievable when reads are long enough and coverage is sufficient.
The proposed algorithm outputs a sequence whose edit distance from the original is proportional to the bound on read errors.
Robustness holds across error types, with no assumption on the error profile beyond the bound.
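To make the notion of bounded errors measured in edit distance concrete, here is a minimal Python sketch. It is not the paper's assembly algorithm. The function names `edit_distance` and `corrupt_read`, the `max_edits` parameter, and the use of uniformly random edits are all assumptions made for illustration; the setting summarized above allows arbitrary errors, of which random edits are only one instance.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


def corrupt_read(read: str, max_edits: int, rng: random.Random,
                 alphabet: str = "ACGT") -> str:
    """Hypothetical corruption model (an assumption, not from the paper):
    apply at most max_edits random single-character edits, so the result
    is within edit distance max_edits of the true read."""
    chars = list(read)
    for _ in range(rng.randint(0, max_edits)):
        op = rng.choice(("sub", "ins", "del"))
        if op == "ins":
            chars.insert(rng.randint(0, len(chars)), rng.choice(alphabet))
        elif chars:  # substitution and deletion need a non-empty read
            pos = rng.randrange(len(chars))
            if op == "sub":
                chars[pos] = rng.choice(alphabet)
            else:
                del chars[pos]
    return "".join(chars)
```

For example, calling `corrupt_read(read, 4, random.Random(0))` on a 40-character read returns a string whose `edit_distance` to `read` is at most 4, since each applied edit changes the distance by at most one.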
Abstract
The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to reconstruct the original sequence. There are many different technologies that generate the reads: widely-used second-generation methods create short reads with low error rates, while emerging third-generation methods create long reads with high error rates. Both error rates and error profiles differ among methods, so reconstruction algorithms are often tailored to specific shotgun sequencing technologies. As these methods change over time, a fundamental question is whether there exist reconstruction algorithms which are robust, i.e., which perform well under a wide range of error distributions. Here we study this question of sequence assembly from corrupted…
