Coding Over Coupon Collector Channels for Combinatorial Motif-Based DNA Storage
Roman Sokolovskii, Parv Agarwal, Luis Alberto Croquevielle, Zijian Zhou, Thomas Heinis

TL;DR
This paper introduces new channel models and coding schemes for DNA storage using motif combinations, analyzing their limits and proposing methods to optimize capacity and decoding complexity.
Contribution
It presents two novel channel models for motif-based DNA storage and a coding scheme that approaches theoretical limits, improving upon prior methods.
Findings
Proposed channel models with and without interference.
Coding scheme approaches channel capacity limits.
Mitigation strategy for exponential decoding complexity.
Abstract
Encoding information in combinations of pre-synthesised deoxyribonucleic acid (DNA) strands (referred to as motifs) is an interesting approach to DNA storage that could potentially circumvent the prohibitive costs of nucleotide-by-nucleotide DNA synthesis. Based on our analysis of an empirical data set from HelixWorks, we propose two channel models for this setup (with and without interference) and analyse their fundamental limits. We propose a coding scheme that approaches those limits by leveraging all information available at the output of the channel, in contrast to earlier schemes developed for a similar setup by Preuss et al. We highlight an important connection between channel capacity curves and the fundamental trade-off between synthesis (writing) and sequencing (reading), and offer a way to mitigate an exponential growth in decoding complexity with the size of the motif…
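To give a feel for the write/read trade-off the abstract mentions, here is a minimal sketch of the textbook coupon collector calculation. It is not the paper's channel model: it assumes each read reveals one of the k motifs in a combination uniformly at random, with no interference, and the function names are hypothetical.

```python
from math import comb


def prob_all_seen(k: int, reads: int) -> float:
    """Probability that all k motifs appear at least once in `reads` draws.

    Assumes each draw picks one of the k motifs uniformly at random
    (an illustrative assumption, not the paper's fitted model).
    Uses inclusion-exclusion over the set of motifs that are never drawn.
    """
    return sum(
        (-1) ** j * comb(k, j) * (1 - j / k) ** reads
        for j in range(k + 1)
    )


def expected_reads_to_see_all(k: int) -> float:
    """Classic coupon collector mean: k times the k-th harmonic number."""
    return k * sum(1 / i for i in range(1, k + 1))


if __name__ == "__main__":
    k = 4  # hypothetical number of motifs per combination
    for reads in (4, 8, 16):
        print(reads, round(prob_all_seen(k, reads), 4))
    print("expected reads:", round(expected_reads_to_see_all(k), 4))
```

Under these assumptions, larger combinations carry more information per written position but need more reads before every motif is likely to have been observed, which is the tension between synthesis and sequencing cost.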
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Advanced biosensing and bioanalysis techniques
