Efficient Computation of Sequence Mappability
Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, and Juliusz Straszyński

TL;DR
This paper introduces efficient algorithms for the $(k,m)$-mappability problem, enabling faster computation of sequence similarity tables with controlled mismatches, and explores theoretical limits of the problem's complexity.
Contribution
It presents novel algorithms for the general $(k,m)$-mappability problem, including an $\tilde{O}(n)$-space, near-linear-time solution for small $k$, and establishes a conditional lower bound under the Strong Exponential Time Hypothesis (SETH).
Findings
Algorithms for $(k,m)$-mappability with $\tilde{O}(n)$ space and near-linear time for small $k$.
Extensions to compute all $(k,m)$-mappability tables efficiently.
Hardness results showing that no strongly subquadratic algorithm exists for certain parameters unless the Strong Exponential Time Hypothesis (SETH) fails.
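As an illustration of what computing all tables at once can look like, here is a minimal sketch of a simple quadratic-time approach for a fixed $m$ and every $k$ from $0$ to $m$. It is not taken from the paper and is not claimed to be the authors' method: it slides a length-$m$ window along each diagonal (fixed offset between the two start positions) and updates the mismatch count in constant time per step. The function name `all_k_mappability` and the 0-indexed layout `tables[k][i]` are assumptions made for this example.

```python
def all_k_mappability(t, m):
    """Return tables where tables[k][i] is the number of positions j != i
    whose length-m substring is within Hamming distance k of t[i:i+m].

    Illustrative sketch only: O(n^2) time and O(n * m) space, 0-indexed.
    """
    n = len(t)
    starts = n - m + 1  # number of length-m substrings
    if starts <= 0:
        return []

    # exact[i][h] = number of j != i at Hamming distance exactly h from i
    exact = [[0] * (m + 1) for _ in range(starts)]

    for d in range(1, starts):  # d is the offset j - i along one diagonal
        # Hamming distance between the windows starting at 0 and at d
        dist = sum(t[p] != t[p + d] for p in range(m))
        i = 0
        while True:
            exact[i][dist] += 1
            exact[i + d][dist] += 1
            if i + d + 1 >= starts:
                break
            # Slide both windows one step: drop the leftmost character pair,
            # take in the new rightmost pair.
            dist -= t[i] != t[i + d]
            dist += t[i + m] != t[i + m + d]
            i += 1

    # Prefix sums over the distance turn "exactly h" into "at most k".
    tables = [[0] * starts for _ in range(m + 1)]
    for i in range(starts):
        running = 0
        for k in range(m + 1):
            running += exact[i][k]
            tables[k][i] = running
    return tables
```

For example, `all_k_mappability("abab", 2)` compares the three substrings `ab`, `ba`, `ab`: the two copies of `ab` match each other exactly, while `ba` differs from both in two positions.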
Abstract
In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \neq i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=O(1)$, works in $O(n)$ space and, with high probability, in $O(n \cdot \min\{m^k, \log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et…
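To make the definition concrete, the following is a minimal brute-force sketch that computes the table directly by comparing every pair of length-$m$ substrings. It is only a reference implementation of the problem statement, running in roughly $O(n^2 m)$ time, and has nothing to do with the errata-tree algorithm the abstract describes. The function names `hamming_at_most` and `naive_mappability` are hypothetical, and positions are 0-indexed here.

```python
def hamming_at_most(t, i, j, m, k):
    """Return True if t[i:i+m] and t[j:j+m] differ in at most k positions."""
    mismatches = 0
    for a, b in zip(t[i:i + m], t[j:j + m]):
        if a != b:
            mismatches += 1
            if mismatches > k:  # stop early once the budget is exceeded
                return False
    return True


def naive_mappability(t, k, m):
    """Brute-force (k, m)-mappability table (illustrative, 0-indexed).

    table[i] counts the positions j != i whose length-m substring has at
    most k mismatches with the length-m substring starting at i.
    """
    starts = len(t) - m + 1  # number of length-m substrings
    if starts <= 0:
        return []
    table = [0] * starts
    for i in range(starts):
        for j in range(i + 1, starts):
            # The relation is symmetric, so each unordered pair is checked once.
            if hamming_at_most(t, i, j, m, k):
                table[i] += 1
                table[j] += 1
    return table
```

A brute-force table like this is mainly useful as a correctness oracle when testing a faster implementation on short inputs.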
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Data Management and Algorithms
