On k-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction

Kuan Cheng; Elena Grigorescu; Xin Li; Madhu Sudan; Minshen Zhu

arXiv:2308.14993 · cs.IT · January 30, 2024

TL;DR

This paper investigates the limitations and performance of k-mer-based and maximum likelihood algorithms for trace reconstruction, establishing optimality bounds and analyzing their sample complexity in recovering original binary strings from traces.

Contribution

It proves the optimality of trace complexity for k-mer-based algorithms and analyzes the near-optimal performance of the maximum likelihood estimator in trace reconstruction.

Findings

01

k-mer-based algorithms require $\exp(\Omega(n^{1/5}))$ traces, matching the $\exp(O(n^{1/5}))$ upper bound of Chase (STOC 2021)

02

MLE achieves nearly optimal trace complexity, within a factor of n

03

Analysis techniques used are essentially tight, indicating need for new methods

Abstract

The goal of the trace reconstruction problem is to recover a string $x \in \{0, 1\}^{n}$ given many independent traces of $x$, where a trace is a subsequence obtained from deleting bits of $x$ independently with some given probability $p \in [0, 1)$. A recent result of Chase (STOC 2021) shows how $x$ can be determined (in exponential time) from $\exp(O(n^{1/5}))$ traces. This is the state-of-the-art result on the sample complexity of trace reconstruction. In this paper we consider two kinds of algorithms for the trace reconstruction problem. Our first, and technically more involved, result shows that any $k$-mer-based algorithm for trace reconstruction must use $\exp(\Omega(n^{1/5}))$ traces, under the assumption that the estimator requires $\mathrm{poly}(2^{k}, 1/\varepsilon)$ traces, thus establishing the optimality of this number of traces. The analysis of this result also shows…
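The definition of a trace in the abstract can be illustrated with a minimal Python sketch of the deletion channel. The names `sample_trace` and `is_subsequence`, the example string, and the value $p = 0.3$ are assumptions made for illustration only; they do not come from the paper, and the sketch only generates traces, it does not attempt reconstruction.

```python
import random


def sample_trace(x: str, p: float, rng: random.Random) -> str:
    """Return one trace of x: each bit is deleted independently with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("deletion probability p must lie in [0, 1)")
    # rng.random() is uniform on [0, 1), so a bit survives with probability 1 - p.
    return "".join(bit for bit in x if rng.random() >= p)


def is_subsequence(t: str, x: str) -> bool:
    """Check that t can be obtained from x by deleting characters."""
    remaining = iter(x)
    # 'ch in remaining' consumes the iterator up to the first match,
    # so the characters of t must appear in x in the same order.
    return all(ch in remaining for ch in t)


# Hypothetical example input and parameters (not taken from the paper).
rng = random.Random(0)
x = "1011001110"
traces = [sample_trace(x, 0.3, rng) for _ in range(5)]
```

Every output of `sample_trace` is a subsequence of `x`, which `is_subsequence` can confirm. A reconstruction algorithm sees only such traces and must recover `x` from them.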

Taxonomy

Topics: Algorithms and Data Compression · Molecular Biology Techniques and Applications · Machine Learning and Algorithms