Concatenated Code Design for Constrained DNA Data Storage with Asymmetric Errors
Yixin Wang, Li Deng, Md. Noor-A-Rahim, Erry Gunawan, Yong L. Guan, Zhi P. Shi, Chueh L. Poh

TL;DR
This paper introduces a hybrid coding scheme for DNA data storage that combines constrained codes and LDPC codes to effectively handle homopolymer limits and asymmetric errors, improving error correction and storage efficiency.
Contribution
It presents a novel concatenated coding architecture tailored for DNA storage, addressing homopolymer constraints and asymmetric errors simultaneously, with detailed decoding and capacity analysis.
Findings
Achieves a high coding potential of approximately 1.98 bits per nucleotide.
Demonstrates good bit-error rate performance through simulations and analysis.
Analyzes two sequencing techniques, Nanopore and Illumina, with their channel models.
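To put the 1.98 bits per nucleotide figure in context, the following minimal sketch computes the noiseless capacity of a quaternary sequence whose homopolymer runs are capped at a given length. The run limit of 3 used in the example is an assumption (this summary does not state the limit the paper uses), and the function name `homopolymer_capacity` is hypothetical. The capacity is log2 of the largest root of x^m = (q-1)(x^(m-1) + ... + 1), found here by bisection.

```python
import math

def homopolymer_capacity(max_run, q=4):
    """Capacity (bits per symbol) of q-ary sequences with runs of at most max_run.

    Illustrative helper, not taken from the paper. The growth rate is the
    largest root of f(x) = x^m - (q-1) * (x^(m-1) + ... + x + 1), which lies
    in the interval [q-1, q].
    """
    def f(x):
        return x ** max_run - (q - 1) * sum(x ** i for i in range(max_run))

    lo, hi = float(q - 1), float(q)  # f(lo) <= 0 and f(hi) = 1 > 0
    for _ in range(100):             # bisection, far more steps than needed
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return math.log2(hi)

# Assumed run limit of 3 nucleotides; the unconstrained maximum is 2 bits/nt.
print(round(homopolymer_capacity(3), 4))
```

Under this assumed limit the constraint alone costs under 0.02 bits per nucleotide relative to the unconstrained 2 bits, which is consistent with the reported figure.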
Abstract
DNA data storage has recently attracted much attention due to its durability and extremely high information density (bits per gram). In this work, we propose a hybrid coding strategy comprising generalized constrained codes to handle the homopolymer (run-length) limit and a protograph-based low-density parity-check (LDPC) code to correct asymmetric nucleotide-level (i.e., A/T/C/G) substitution errors that may occur during DNA sequencing. Two sequencing techniques, namely the Nanopore sequencer and the Illumina sequencer, are analyzed along with their equivalent channel models and capacities. A coding architecture is proposed to potentially eliminate the catastrophic errors caused by error propagation in constrained decoding while enabling high coding potential. We also show the log-likelihood ratio (LLR) calculation method for the belief propagation decoding with…
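As a rough illustration of how bit-level LLRs can be derived from an asymmetric nucleotide substitution channel, the sketch below marginalizes over the nucleotides consistent with each bit value. This is a generic bit-metric computation under a uniform prior, not necessarily the method in the paper. The 2-bit labelling `BITS`, the transition probabilities `P`, and the function name `bit_llrs` are all assumptions made for the example.

```python
import math

# Hypothetical 2-bit labelling of nucleotides (an assumption, not from the paper).
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

# Hypothetical asymmetric substitution probabilities P[sent][read].
# Each row sums to 1; the values are illustrative, not measured.
P = {
    "A": {"A": 0.95, "C": 0.01, "G": 0.03, "T": 0.01},
    "C": {"A": 0.01, "C": 0.94, "G": 0.01, "T": 0.04},
    "G": {"A": 0.04, "C": 0.01, "G": 0.94, "T": 0.01},
    "T": {"A": 0.01, "C": 0.03, "G": 0.01, "T": 0.95},
}

def bit_llrs(read):
    """Return the LLRs ln(P(bit=0 | read) / P(bit=1 | read)) for both bits.

    Assumes sent nucleotides are equally likely, so the posterior ratio
    equals the ratio of summed likelihoods.
    """
    llrs = []
    for k in range(2):
        p0 = sum(P[x][read] for x in BITS if BITS[x][k] == 0)
        p1 = sum(P[x][read] for x in BITS if BITS[x][k] == 1)
        llrs.append(math.log(p0 / p1))
    return tuple(llrs)

# LLRs that a belief-propagation decoder could take as channel input.
for nucleotide in "ACGT":
    print(nucleotide, bit_llrs(nucleotide))
```

Because the assumed matrix is asymmetric, the two bits of one read generally receive LLRs of different magnitude, which is the information a symmetric channel model would discard.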
Taxonomy
Topics: DNA and Biological Computing · Error Correcting Code Techniques · Algorithms and Data Compression
