R-enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies
Kotaro Kimura, Tomohiro I

TL;DR
This paper improves the efficiency of enumerating characteristic substrings in compressed strings, extends the method to compute context-sensitive repeats and net frequencies, and introduces new bounds on minimal unique substrings.
Contribution
It enhances the r-enum algorithm for faster enumeration, extends it to new types of repeats, and introduces a novel bound on the number of minimal unique substrings.
Findings
Improved the output-independent term of r-enum's running time to O(n)
Extended r-enum to compute near-supermaximal and supermaximal repeats
Established a new upper bound of 2r on the number of minimal unique substrings
Abstract
Nishimoto and Tabei [CPM, 2021] proposed r-enum, an algorithm to enumerate various characteristic substrings, including maximal repeats, in a string T of length n in O(r) words of compressed working space, where r is the number of runs in the Burrows-Wheeler transform (BWT) of T. Given the run-length encoded BWT (RLBWT) of T, r-enum runs in O(n log log_w (n/r)) time in addition to the time linear in the number of output strings, where w = Θ(log n) is the word size. In this paper, we first improve the O(n log log_w (n/r)) term to O(n). We next extend r-enum to compute other context-sensitive repeats such as near-supermaximal repeats (NSMRs) and supermaximal repeats, as well as the context diversity for every maximal repeat in the same complexities. Furthermore, we study net occurrences: An occurrence of a repeat is called a net occurrence if it is…
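As a rough illustration of the parameter that the abstract calls the number of runs in the BWT, the following minimal sketch builds the BWT of a short string by sorting its rotations and then run-length encodes it. This is not the paper's algorithm: r-enum takes the RLBWT as input and works in compressed space, whereas this toy construction is quadratic. The function names `bwt` and `run_length_encode` and the `$` sentinel are assumptions made for the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT via sorted rotations (illustration only, not efficient).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text` (true for "$" against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal blocks of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    transformed = bwt("banana")
    rlbwt = run_length_encode(transformed)
    print(transformed)   # annb$aa
    print(rlbwt)         # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
    print(len(rlbwt))    # 5 runs for a string of length 7 (with sentinel)
```

For highly repetitive inputs the number of runs is typically much smaller than the string length, which is why a working space measured in runs is described as compressed.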
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Advanced Database Systems and Queries
