On Codes for the Noisy Substring Channel

Yonatan Yehezkeally; Nikita Polyanskii

arXiv:2102.01412 · cs.IT · March 27, 2024

TL;DR

This paper studies coding for the noisy substring channel, motivated by DNA storage. It analyzes error correction under substitution and deletion noise models, shows that the redundancy added by noise is sublinear under certain conditions, and presents efficient encoders.

Contribution

It introduces a noisy channel model for substring sampling, extends the concept of repeat-free strings, and develops efficient encoders applicable to DNA storage and secondary-structure avoidance.

Findings

01. Redundancy due to noise is sublinear under certain conditions.

02. Asymptotic rate cost is negligible for small errors or long substrings.

03. Efficient encoders are proposed for error correction and structure avoidance.

Abstract

We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Due to existing DNA sequencing techniques and applications in DNA-based storage systems, interest in this channel has renewed in recent years. In contrast to existing literature, we consider a noisy channel model where information is subject to noise before its substrings are sampled, motivated by in-vivo storage. We study two separate noise models, substitutions or deletions. In both cases, we examine families of codes which may be utilized for error-correction and present combinatorial bounds on their sizes. Through a generalization of the concept of repeat-free strings, we show that the added required redundancy due to this imperfect observation assumption is sublinear, either when the fraction of errors in the observed substring length…
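To make the channel described in the abstract concrete, the following is a minimal Python sketch, not the authors' construction. It assumes the usual reading of "repeat-free" (every substring of a given length occurs at most once) and a toy deterministic substitution model over the alphabet ACGT. The function names `substring_multiset`, `is_repeat_free`, and `substitute` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def substring_multiset(s: str, length: int) -> Counter:
    """Return the multiset of all contiguous substrings of `s` of the given length."""
    return Counter(s[i:i + length] for i in range(len(s) - length + 1))


def is_repeat_free(s: str, length: int) -> bool:
    """Return True if no substring of the given length occurs more than once in `s`.

    Assumption: this is the standard notion of a repeat-free string; the paper
    works with a generalization of it that is not reproduced here.
    """
    return all(count == 1 for count in substring_multiset(s, length).values())


def substitute(s: str, positions, alphabet: str = "ACGT") -> str:
    """Apply a toy substitution error at each listed position.

    Each affected symbol is replaced by the next symbol of `alphabet` (cyclically).
    This is an illustrative noise model only, not the one analyzed in the paper.
    """
    chars = list(s)
    for p in positions:
        chars[p] = alphabet[(alphabet.index(chars[p]) + 1) % len(alphabet)]
    return "".join(chars)


# Noise is applied to the stored string first; substrings are sampled afterwards.
stored = "ACGTAC"
noisy = substitute(stored, [2])
observed = substring_multiset(noisy, 3)
```

In this sketch the receiver sees only `observed`, the substring multiset of the already corrupted string, which is the order of operations the abstract describes (noise before sampling).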
