Iterative DNA Coding Scheme With GC Balance and Run-Length Constraints Using a Greedy Algorithm
Seong-Joon Park, Yongwoo Lee, and Jong-Seon No

TL;DR
This paper introduces an iterative greedy algorithm for DNA data encoding that satisfies both GC-balance and run-length constraints, improving error robustness and information density in DNA storage.
Contribution
It presents a novel iterative encoding method with a new mapping technique that reduces bit errors and enhances data density for DNA storage applications.
Findings
Achieves 1.8523 bits/nt information density at m=3, α=0.05.
Reduces average bit error by 20.5% compared to random mapping.
Demonstrates robustness to error propagation in DNA data encoding.
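To put the reported density in context, the short calculation below compares it with the unconstrained ceiling of log2(4) = 2 bits/nt for a four-letter alphabet. Only the 1.8523 figure comes from the paper; the variable names are illustrative.

```python
import math

# Upper bound with four nucleotides and no constraints: log2(4) = 2 bits/nt.
max_density = math.log2(4)

# Density reported by the paper for m=3, alpha=0.05.
reported_density = 1.8523

# Share of the unconstrained ceiling that is retained, and the share
# spent on satisfying the constraints (plus any scheme overhead).
fraction_of_max = reported_density / max_density
overhead = 1.0 - fraction_of_max

print(f"{fraction_of_max:.4f} of the ceiling, {overhead:.4f} overhead")
```

Under these numbers the scheme keeps roughly 92.6% of the 2 bits/nt ceiling.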
Abstract
In this paper, we propose a novel iterative encoding algorithm for DNA storage to satisfy both the GC balance and run-length constraints using a greedy algorithm. DNA strands with a run-length of more than three or a GC balance ratio far from 50% are known to be prone to errors. The proposed encoding algorithm stores data at high information density with high flexibility: a run-length of at most m and a GC balance within 0.5 ± α, for arbitrary m and α. More importantly, we propose a novel mapping method, built with a greedy algorithm, that reduces the average bit error compared to a randomly generated mapping. The proposed algorithm is implemented through iterative encoding, consisting of three main steps: randomization, M-ary mapping, and verification. The proposed algorithm has an information density of 1.8523 bits/nt in the case of m=3 and α=0.05. Also, the…
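The two constraints and the randomize, map, verify loop can be illustrated with a minimal sketch. It is not the paper's algorithm: the fixed 2-bit mapping (00→A, 01→C, 10→G, 11→T) stands in for the paper's greedy M-ary mapping, the SHA-256-derived mask is an assumed randomizer, and all function names are hypothetical. Only the constraint definitions (run-length at most m, GC ratio within 0.5 ± α) follow the abstract.

```python
import hashlib
from itertools import groupby

BASES = "ACGT"  # ASSUMPTION: naive 2-bit mapping, not the paper's M-ary mapping


def max_run_length(strand: str) -> int:
    """Length of the longest run of identical nucleotides."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def gc_ratio(strand: str) -> float:
    """Fraction of nucleotides that are G or C."""
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, m: int = 3, alpha: float = 0.05) -> bool:
    """Verification step: run-length <= m and GC ratio within 0.5 +/- alpha."""
    return max_run_length(strand) <= m and abs(gc_ratio(strand) - 0.5) <= alpha


def _mask(seed: int, n: int) -> bytes:
    """ASSUMPTION: pseudo-random mask from SHA-256 in counter mode."""
    out, counter = b"", 0
    while len(out) < n:
        block = seed.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:n]


def bytes_to_strand(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per nucleotide."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))


def strand_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_strand."""
    vals = [BASES.index(base) for base in strand]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )


def encode(data: bytes, m: int = 3, alpha: float = 0.05, max_tries: int = 4096):
    """Randomize with successive seeds until the mapped strand passes verification."""
    for seed in range(max_tries):
        masked = bytes(a ^ b for a, b in zip(data, _mask(seed, len(data))))
        strand = bytes_to_strand(masked)
        if satisfies_constraints(strand, m, alpha):
            return seed, strand
    raise ValueError("no seed satisfied the constraints")


def decode(seed: int, strand: str) -> bytes:
    """Undo the mapping, then remove the mask for the given seed."""
    masked = strand_to_bytes(strand)
    return bytes(a ^ b for a, b in zip(masked, _mask(seed, len(masked))))


seed, strand = encode(b"DNA storage test")
print(seed, strand, decode(seed, strand))
```

In this sketch the decoder needs the seed, so a real scheme would have to store it alongside the payload; how the paper handles that overhead is not stated in the abstract.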
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Algorithms and Data Compression
