On k-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction
Kuan Cheng, Elena Grigorescu, Xin Li, Madhu Sudan, Minshen Zhu

TL;DR
This paper investigates the limitations and performance of k-mer-based and maximum likelihood algorithms for trace reconstruction, establishing optimality bounds and analyzing their sample complexity in recovering original binary strings from traces.
Contribution
The paper proves that the known trace complexity of k-mer-based algorithms is optimal, and shows that the maximum likelihood estimator achieves near-optimal trace complexity for trace reconstruction.
Findings
k-mer-based algorithms need $\exp(\Omega(n^{1/5}))$ traces, a lower bound that matches the upper bound of Chase (STOC 2021) up to logarithmic factors in the exponent
MLE achieves nearly optimal trace complexity, within a factor of n
Analysis techniques used are essentially tight, indicating need for new methods
Abstract
The goal of the trace reconstruction problem is to recover a string $x \in \{0,1\}^n$ given many independent traces of $x$, where a trace is a subsequence obtained from deleting bits of $x$ independently with some given probability $p \in [0,1)$. A recent result of Chase (STOC 2021) shows how $x$ can be determined (in exponential time) from $\exp(\widetilde{O}(n^{1/5}))$ traces. This is the state-of-the-art result on the sample complexity of trace reconstruction. In this paper we consider two kinds of algorithms for the trace reconstruction problem. Our first, and technically more involved, result shows that any $k$-mer-based algorithm for trace reconstruction must use $\exp(\Omega(n^{1/5}))$ traces, under the assumption that the estimator requires $\mathrm{poly}(2^k, 1/\varepsilon)$ traces, thus establishing the optimality of this number of traces. The analysis of this result also shows…
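To make the setup in the abstract concrete, here is a minimal sketch of the deletion channel that produces traces, together with a simple k-mer count. The function names `sample_trace` and `kmer_counts` are hypothetical and are not taken from the paper. The abstract does not define "k-mer", so reading it as a contiguous length-k substring is an assumption made for illustration only.

```python
import random
from collections import Counter


def sample_trace(x, p, rng):
    """Return one trace of the bit string x.

    Each bit is deleted independently with probability p and the
    surviving bits keep their original order.
    """
    return "".join(bit for bit in x if rng.random() >= p)


def kmer_counts(x, k):
    """Count occurrences of every contiguous length-k substring of x.

    Assumption: "k-mer" means a contiguous substring of length k.
    """
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


# Illustrative usage with a fixed seed so runs are repeatable.
rng = random.Random(0)
x = "1011001110"
traces = [sample_trace(x, 0.3, rng) for _ in range(5)]
```

Each element of `traces` is a subsequence of `x`. A reconstruction algorithm sees only such traces, and the trace complexity discussed above is the number of them it needs to recover `x`.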
Taxonomy
Topics: Algorithms and Data Compression · Molecular Biology Techniques and Applications · Machine Learning and Algorithms
