Efficient Computation of Sequence Mappability

Panagiotis Charalampopoulos; Costas S. Iliopoulos; Tomasz Kociumaka; Solon P. Pissis; Jakub Radoszewski; and Juliusz Straszyński

arXiv:1807.11702 · cs.DS · June 18, 2021

TL;DR

This paper introduces efficient algorithms for the $(k,m)$-mappability problem, which asks, for every length-$m$ substring of a sequence, how many other length-$m$ substrings differ from it in at most $k$ positions, and explores theoretical limits of the problem's complexity.

Contribution

It presents novel algorithms for the general $(k,m)$-mappability problem, including an $\tilde{O}(n)$-space, near-linear-time solution for small $k$, and establishes complexity bounds under ETH.
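For orientation, the smallest case $k = 0$ needs no errata trees: the $i$th entry is simply the number of other occurrences of the length-$m$ substring starting at $i$. The Python sketch below is our own illustration, not the paper's method; the function name `mappability_exact` and the 0-based indexing over the $n - m + 1$ starting positions are assumptions. It hashes substring copies with a `Counter`, so it runs in expected $O(nm)$ time rather than the linear time a suffix-array approach would give.

```python
from collections import Counter


def mappability_exact(T: str, m: int) -> list[int]:
    """(0, m)-mappability sketch (hypothetical helper): entry i counts the
    j != i whose length-m substring equals T[i:i+m]. Positions are 0-based."""
    n = len(T)
    if m <= 0 or m > n:
        return []
    # One copy of each length-m window; hashing a copy costs O(m).
    windows = [T[i:i + m] for i in range(n - m + 1)]
    freq = Counter(windows)
    # Subtract 1 so the occurrence at i itself is not counted (j != i).
    return [freq[w] - 1 for w in windows]


print(mappability_exact("abaab", 2))  # windows ab, ba, aa, ab -> [1, 0, 0, 1]
```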

Findings

01

Algorithms for $(k,m)$-mappability with $\tilde{O}(n)$ space and near-linear time for small $k$.

02

Extensions that compute all $(k,m)$-mappability tables in $O(n^2)$ time, either for a fixed $m$ and all $k$ or for a fixed $k$ and all $m$.

03

Hardness results showing that, for $k, m = \Theta(\log n)$, the problem admits no strongly subquadratic algorithm under ETH.
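For a fixed $m$, every table for $k = 0, \dots, m$ can be obtained in $O(n^2)$ time with a simple diagonal sweep. The sketch below is our own illustration under stated assumptions (hypothetical name `all_k_mappability`, 0-based positions), not necessarily the authors' algorithm: for each shift $d = j - i$ it slides a length-$m$ window over the aligned mismatch indicators, so each pair's Hamming distance costs $O(1)$ amortized, and a per-position histogram of exact distances is turned into "at most $k$" counts by prefix sums.

```python
def all_k_mappability(T: str, m: int) -> list[list[int]]:
    """Hypothetical sketch: tables[k][i] = number of j != i such that the
    length-m substrings at i and j have at most k mismatches, k = 0..m."""
    n = len(T)
    N = n - m + 1  # number of length-m substrings
    if m <= 0 or N <= 0:
        return []
    # exact[i][h] = number of j != i at Hamming distance exactly h from i.
    exact = [[0] * (m + 1) for _ in range(N)]
    for d in range(1, N):
        # mis[p] = 1 iff T[p] and T[p + d] differ (diagonal with shift d).
        mis = [1 if T[p] != T[p + d] else 0 for p in range(n - d)]
        dist = sum(mis[:m])  # distance between substrings at 0 and d
        for i in range(N - d):
            if i > 0:
                # Slide the window: add position i+m-1, drop position i-1.
                dist += mis[i + m - 1] - mis[i - 1]
            exact[i][dist] += 1
            exact[i + d][dist] += 1  # the relation is symmetric
    # Prefix sums over h turn "exactly h" into "at most k".
    tables = [[0] * N for _ in range(m + 1)]
    for i in range(N):
        running = 0
        for k in range(m + 1):
            running += exact[i][k]
            tables[k][i] = running
    return tables


tables = all_k_mappability("abaab", 2)
print(tables[0], tables[1], tables[2])  # [1, 0, 0, 1] [2, 1, 3, 2] [3, 3, 3, 3]
```

The output uses $O(nm)$ memory for the $m + 1$ tables; the sweep itself touches each pair of starting positions once.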

Abstract

In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \neq i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k = 1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k = O(1)$, works in $O(n)$ space and, with high probability, in $O(n \cdot \min\{m^k, \log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et…
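To make the definition concrete, here is a direct transcription into Python. It is a brute-force reference, not the paper's algorithm: it compares all pairs among the $n - m + 1$ starting positions (0-based here) and stops comparing a pair as soon as more than $k$ mismatches are seen, for $O(n^2 m)$ worst-case time. The names `mappability_naive` and `within_k_mismatches` are hypothetical.

```python
def within_k_mismatches(T: str, i: int, j: int, m: int, k: int) -> bool:
    """True iff T[i:i+m] and T[j:j+m] differ in at most k positions."""
    mismatches = 0
    for p in range(m):
        if T[i + p] != T[j + p]:
            mismatches += 1
            if mismatches > k:  # early exit once the budget is exceeded
                return False
    return True


def mappability_naive(T: str, k: int, m: int) -> list[int]:
    """Brute-force (k, m)-mappability table over 0-based start positions."""
    n = len(T)
    N = n - m + 1  # number of length-m substrings
    if m <= 0 or N <= 0:
        return []
    table = [0] * N
    for i in range(N):
        for j in range(i + 1, N):
            # Each unordered pair is tested once and credited to both ends.
            if within_k_mismatches(T, i, j, m, k):
                table[i] += 1
                table[j] += 1
    return table


print(mappability_naive("abaab", 1, 2))  # substrings ab, ba, aa, ab -> [2, 1, 3, 2]
```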

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Data Management and Algorithms