R-enum: Enumeration of Characteristic Substrings in BWT-runs Bounded Space

Takaaki Nishimoto; Yasuo Tabei

arXiv:2004.01493 · cs.DS · March 3, 2021 · 1 citation

TL;DR

This paper introduces r-enum, a space-efficient algorithm for enumerating characteristic substrings of a string using the run-length encoded Burrows-Wheeler transform (RLBWT), designed for highly repetitive strings and large datasets.
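As a rough illustration of what the RLBWT is (not of r-enum itself), the Python sketch below builds the BWT naively by sorting every rotation of the text with an appended sentinel, then collapses the result into runs; the number of runs is the quantity $r$. The function names `bwt` and `run_length_encode`, and the choice of `$` as a sentinel that is absent from the text and sorts before all of its characters, are assumptions made for this example. Sorting rotations costs roughly $O(n^2 \log n)$ time, so this only suits tiny inputs; practical RLBWT construction works very differently.

```python
from typing import List, Tuple


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort every rotation of text + sentinel
    and read off the last column. Assumes (for this sketch) that the sentinel
    does not occur in text and sorts before every character of text."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Collapse each maximal run of equal characters into a (char, length) pair."""
    runs: List[Tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs


if __name__ == "__main__":
    b = bwt("banana")
    print(b)                     # annb$aa
    print(run_length_encode(b))  # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
    rep = bwt("ab" * 8)
    print(rep, len(run_length_encode(rep)))  # bbbbbbbb$aaaaaaaa 3
```

The repetitive input `"ab" * 8` has 17 characters including the sentinel but only 3 BWT runs, which shows why bounds stated in terms of $r$ rather than $n$ pay off on highly repetitive strings.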

Contribution

The paper presents the first RLBWT-based enumeration algorithm for characteristic substrings, achieving improved space efficiency for highly repetitive strings.

Findings

1. Runs in $O(n \log \log (n/r))$ time, where $n$ is the length of the input string
2. Uses $O(r \log n)$ bits of working space, where $r$ is the number of runs in the RLBWT
3. More space-efficient than previous methods on benchmark datasets

Abstract

Enumerating characteristic substrings (e.g., maximal repeats, minimal unique substrings, and minimal absent words) in a given string has been an important research topic because there is a wide variety of applications in various areas such as string processing and computational biology. Although several enumeration algorithms for characteristic substrings have been proposed, they are not space-efficient in that their space usage is proportional to the length of the input string. Recently, the run-length encoded Burrows-Wheeler transform (RLBWT) has attracted increased attention in string processing, and various algorithms for the RLBWT have been developed. Developing enumeration algorithms for characteristic substrings with the RLBWT, however, remains a challenge. In this paper, we present r-enum (RLBWT-based enumeration), the first enumeration algorithm for characteristic substrings…
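The three kinds of characteristic substrings named in the abstract can be pinned down with a brute-force Python sketch. This is a definitional reference only, not r-enum: it materializes every substring and therefore uses far more than linear time and space, which is exactly the cost the paper avoids. The conventions are assumptions for illustration and may differ in detail from the paper's: occurrences may overlap; a maximal repeat occurs at least twice and loses an occurrence under every one-character left or right extension; a minimal unique substring occurs once while all its proper substrings occur at least twice; and minimal absent words are taken over the alphabet of characters that appear in the text, so no length-1 absent words arise. All function names are hypothetical.

```python
from collections import Counter
from typing import Set


def substring_counts(text: str) -> Counter:
    """Count occurrences (overlaps allowed) of every non-empty substring."""
    counts: Counter = Counter()
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            counts[text[i:j]] += 1
    return counts


def maximal_repeats(text: str) -> Set[str]:
    """Substrings occurring >= 2 times that cannot be extended by one character
    on either side without losing an occurrence."""
    occ = substring_counts(text)
    sigma = set(text)
    result = set()
    for w, c in occ.items():
        if c < 2:
            continue
        # Counter returns 0 for unseen keys without inserting them.
        left_maximal = all(occ[a + w] < c for a in sigma)
        right_maximal = all(occ[w + a] < c for a in sigma)
        if left_maximal and right_maximal:
            result.add(w)
    return result


def minimal_unique_substrings(text: str) -> Set[str]:
    """Substrings occurring exactly once whose proper substrings all repeat.
    Checking the two longest proper substrings w[1:] and w[:-1] suffices."""
    occ = substring_counts(text)

    def repeats(u: str) -> bool:
        # The empty string is treated as occurring everywhere.
        return u == "" or occ[u] >= 2

    return {w for w, c in occ.items() if c == 1 and repeats(w[1:]) and repeats(w[:-1])}


def minimal_absent_words(text: str) -> Set[str]:
    """Absent words a + u + b (a, b single characters) whose longest proper
    substrings a + u and u + b both occur in text."""
    occ = substring_counts(text)
    sigma = sorted(set(text))
    result = set()
    for u in [""] + list(occ):
        for a in sigma:
            if occ[a + u] == 0:
                continue
            for b in sigma:
                if occ[u + b] > 0 and occ[a + u + b] == 0:
                    result.add(a + u + b)
    return result


if __name__ == "__main__":
    print(sorted(maximal_repeats("banana")))            # ['a', 'ana']
    print(sorted(minimal_unique_substrings("banana")))  # ['b', 'nan']
    print(sorted(minimal_absent_words("banana")))
    # ['aa', 'ab', 'bb', 'bn', 'nanan', 'nb', 'nn']
```

Checking only the two longest proper substrings is enough in each case because every shorter proper substring is contained in one of them and therefore occurs at least as often.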

Code & Models

Repositories

TNishimoto/renum

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms