Partial DNA Assembly: A Rate-Distortion Perspective
Ilan Shomorony, Govinda M. Kamath, Fei Xia, Thomas A. Courtade, David N. Tse

TL;DR
This paper introduces a rate-distortion framework for partial DNA assembly, defines a distortion measure for assembly graphs as the logarithm of the number of Eulerian cycles in the graph, and presents an assembly algorithm together with an analysis on real genome data.
Contribution
It proposes a novel distortion function for assembly graphs and develops an algorithm for partial DNA assembly under this framework.
Findings
The distortion measure is the logarithm of the number of Eulerian cycles in the assembly graph; each such cycle is a candidate assembly consistent with the observed reads.
The algorithm effectively constructs assembly graphs from real genome data.
The approach provides a new perspective on handling ambiguous DNA assemblies.
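To make the distortion measure concrete, here is a minimal sketch of how the number of Eulerian cycles in a small directed multigraph could be counted and turned into a log-distortion value. It uses the BEST theorem (number of spanning arborescences times the product of (out-degree - 1)! over all nodes), which is a standard counting technique swapped in for illustration and not necessarily how the paper computes or bounds the quantity. The function names `count_eulerian_cycles` and `log_distortion`, the edge-list input format, and the choice of base-2 logarithm are assumptions. Parallel edges are treated as distinguishable and cycles are counted up to rotation, which may differ from the paper's exact definition.

```python
from fractions import Fraction
from math import factorial, log2


def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def count_eulerian_cycles(edges):
    """Count Eulerian cycles of a directed multigraph with the BEST theorem.

    `edges` is a list of (tail, head) pairs (hypothetical input format).
    Returns 0 if the graph is not Eulerian.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    out_deg = [0] * n
    in_deg = [0] * n
    laplacian = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        out_deg[i] += 1
        in_deg[j] += 1
        laplacian[i][i] += 1
        laplacian[i][j] -= 1
    if in_deg != out_deg:
        return 0  # some node is unbalanced, so no Eulerian cycle exists
    # Matrix-tree theorem: delete the row and column of one node.
    minor = [row[1:] for row in laplacian[1:]]
    arborescences = int(determinant(minor))  # 0 if the graph is disconnected
    count = arborescences
    for d in out_deg:
        count *= factorial(d - 1)
    return count


def log_distortion(edges):
    """Base-2 logarithm of the number of Eulerian cycles (assumed base)."""
    count = count_eulerian_cycles(edges)
    if count == 0:
        raise ValueError("graph has no Eulerian cycle")
    return log2(count)


# Three loops through a shared node: the order of the loops is ambiguous.
petals = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("a", "d"), ("d", "a")]
print(count_eulerian_cycles(petals), log_distortion(petals))
```

In this toy graph the three loops through node `a` can be traversed in two cyclic orders, so the log-distortion is one bit; a graph with a single Eulerian cycle has log-distortion zero, matching the intuition of an unambiguous assembly.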
Abstract
Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is very often the case that the read data is not sufficiently rich to permit unambiguous reconstruction of the original sequence. While a natural generalization of the perfect assembly formulation to these cases would be to consider a rate-distortion framework, partial assemblies are usually represented in terms of an assembly graph, making the definition of a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which corresponds to a candidate assembly that could have generated the observed reads. We…
