Recovering a Message from an Incomplete Set of Noisy Fragments
Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

TL;DR
This paper analyzes the capacity of a novel torn-paper channel model, characterizing how well information can be recovered from incomplete, shuffled, and noisy message fragments, with applications in molecular data storage and forensics.
Contribution
It provides a closed-form expression for the channel capacity under arbitrary fragment length distributions and deletion probabilities, and derives upper and lower capacity bounds for the case where the fragments are also noisy.
Findings
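Capacity is given by a closed-form expression of the form F - A, where F is the coverage fraction (the fraction of the input codeword covered by output fragments) and A is an alignment cost incurred because the output fragments are unordered.
Upper and lower capacity bounds are derived for fragments corrupted by binary symmetric noise, and these bounds match under certain conditions.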
The model applies to molecular storage and forensic data reconstruction.
Abstract
We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them out of order, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Precisely, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, where the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds…
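To make the channel in the abstract concrete, here is a minimal simulation sketch. It is not the authors' code. It assumes geometric fragment lengths and independent per-fragment deletion, which is only one of the arbitrary distributions the paper covers, and the names `torn_paper_channel`, `coverage_fraction`, `mean_len`, and `deletion_prob` are hypothetical. The sketch only measures the empirical coverage fraction F; it does not compute the alignment cost A or the capacity.

```python
import random


def torn_paper_channel(codeword, mean_len, deletion_prob, seed=None):
    """Tear `codeword` into fragments, delete some, and shuffle the rest.

    Assumptions (not from the paper): fragment lengths are geometric with
    mean `mean_len` (>= 1), and each fragment is deleted independently
    with probability `deletion_prob`.
    """
    rng = random.Random(seed)
    n = len(codeword)
    fragments = []
    start = 0
    while start < n:
        # Sample a geometric length: keep extending with prob 1 - 1/mean_len.
        length = 1
        while rng.random() > 1.0 / mean_len:
            length += 1
        end = min(start + length, n)  # the last fragment is truncated at n
        fragments.append(codeword[start:end])
        start = end
    # Keep each fragment independently with probability 1 - deletion_prob.
    survivors = [f for f in fragments if rng.random() >= deletion_prob]
    # The receiver sees the surviving fragments in arbitrary order.
    rng.shuffle(survivors)
    return survivors


def coverage_fraction(fragments, n):
    """Fraction of the n input symbols covered by the output fragments."""
    return sum(len(f) for f in fragments) / n


if __name__ == "__main__":
    bits = "".join(random.Random(0).choice("01") for _ in range(1000))
    received = torn_paper_channel(bits, mean_len=20, deletion_prob=0.3, seed=1)
    print(len(received), "fragments, coverage:",
          coverage_fraction(received, len(bits)))
```

Under these assumptions the empirical coverage should concentrate near `1 - deletion_prob` for long codewords, since each symbol survives exactly when its fragment does.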
Taxonomy
Topics: Algorithms and Data Compression
