On Codes for the Noisy Substring Channel
Yonatan Yehezkeally, Nikita Polyanskii

TL;DR
This paper studies codes for the noisy substring channel, motivated by DNA storage. It analyzes error correction under substitution and deletion noise, shows that the added redundancy due to noise is sublinear under certain conditions, and presents efficient encoders.
Contribution
It introduces a noisy channel model for substring sampling, extends the concept of repeat-free strings, and develops efficient encoders applicable to DNA storage and secondary-structure avoidance.
Findings
Redundancy due to noise is sublinear under certain conditions.
The asymptotic rate cost is negligible when the fraction of errors is small or the substrings are long.
Efficient encoders are proposed for error correction and structure avoidance.
Abstract
We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Due to existing DNA sequencing techniques and applications in DNA-based storage systems, interest in this channel has renewed in recent years. In contrast to existing literature, we consider a noisy channel model where information is subject to noise before its substrings are sampled, motivated by in-vivo storage. We study two separate noise models, substitutions or deletions. In both cases, we examine families of codes which may be utilized for error-correction and present combinatorial bounds on their sizes. Through a generalization of the concept of repeat-free strings, we show that the added required redundancy due to this imperfect observation assumption is sublinear, either when the fraction of errors in the observed substring length…
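To make the channel model in the abstract concrete, here is a minimal Python sketch, not taken from the paper. It assumes that all substrings share a fixed length `ell` and that noise consists of substitutions applied before sampling. The function names (`substring_multiset`, `is_repeat_free`, `substitute`) and the example strings are hypothetical, chosen only for illustration.

```python
from collections import Counter


def substring_multiset(x: str, ell: int) -> Counter:
    """Return the multiset of all length-ell substrings of x."""
    return Counter(x[i:i + ell] for i in range(len(x) - ell + 1))


def is_repeat_free(x: str, ell: int) -> bool:
    """True if every length-ell substring of x occurs exactly once."""
    return all(count == 1 for count in substring_multiset(x, ell).values())


def substitute(x: str, changes: dict) -> str:
    """Hypothetical noise helper: replace the symbol at each given index."""
    chars = list(x)
    for index, symbol in changes.items():
        chars[index] = symbol
    return "".join(chars)


# Noiseless observation: the receiver sees only the multiset of substrings.
x = "ACGTTGCA"
clean_view = substring_multiset(x, 3)

# Noisy observation: substitutions hit the string before it is sampled.
noisy = substitute(x, {2: "T"})
noisy_view = substring_multiset(noisy, 3)

# Substrings present in the clean view but missing from the noisy one.
lost = clean_view - noisy_view
```

In this toy example a single substitution changes every length-3 substring that overlaps the affected position, which gives some intuition for why noise applied before sampling needs separate treatment from the noiseless setting.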
