Computing q-gram Frequencies on Collage Systems
Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

TL;DR
This paper introduces an efficient algorithm for computing all q-gram frequencies in strings represented by collage systems, a general framework for compressed text, with improved time complexity.
Contribution
It presents the first algorithm with O((q+h log n)n) time complexity for computing all q-gram frequencies on strings compressed as collage systems, where n and h are the size and height of the collage system.
Findings
The algorithm computes the frequencies of all q-grams that occur in a string given as a collage system.
Time complexity depends on q and on the size n and height h of the collage system.
Provides a method for analyzing q-gram statistics of compressed texts.
Abstract
Collage systems are a general framework for representing outputs of various text compression algorithms. We consider the all q-gram frequency problem on a compressed string represented as a collage system, and present an O((q+h log n)n)-time O(qn)-space algorithm for calculating the frequencies for all q-grams that occur in the string. Here, n and h are respectively the size and height of the collage system.
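To make the problem statement concrete, the following is a minimal sketch of a brute-force baseline, not the paper's algorithm: it fully decompresses a toy collage system and counts q-grams on the resulting string. The rule encoding (`char`, `concat`, `repeat`, `drop_prefix`, `drop_suffix`) and the function names `expand` and `qgram_frequencies` are assumptions made for illustration. Because the decompressed string can be exponentially longer than the collage system, this approach only serves to define the expected output.

```python
from collections import Counter

def expand(rules, name, memo=None):
    """Decompress variable `name` of a toy collage system into a plain string.

    `rules` maps a variable name to a tuple (hypothetical encoding):
      ("char", c)            a single character
      ("concat", Y, Z)       concatenation of Y and Z
      ("repeat", Y, k)       Y repeated k times
      ("drop_prefix", Y, j)  Y with its first j characters removed
      ("drop_suffix", Y, j)  Y with its last j characters removed
    """
    if memo is None:
        memo = {}
    if name in memo:
        return memo[name]
    op = rules[name]
    kind = op[0]
    if kind == "char":
        s = op[1]
    elif kind == "concat":
        s = expand(rules, op[1], memo) + expand(rules, op[2], memo)
    elif kind == "repeat":
        s = expand(rules, op[1], memo) * op[2]
    elif kind == "drop_prefix":
        s = expand(rules, op[1], memo)[op[2]:]
    elif kind == "drop_suffix":
        base = expand(rules, op[1], memo)
        s = base[:len(base) - op[2]]
    else:
        raise ValueError(f"unknown rule kind: {kind}")
    memo[name] = s
    return s

def qgram_frequencies(text, q):
    """Count every length-q substring of `text` (naive, on uncompressed text)."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))

# Toy collage system: R = (ab)^3 = "ababab", T = R without its first character.
rules = {
    "A": ("char", "a"),
    "B": ("char", "b"),
    "AB": ("concat", "A", "B"),
    "R": ("repeat", "AB", 3),
    "T": ("drop_prefix", "R", 1),
}

text = expand(rules, "T")
freqs = qgram_frequencies(text, 2)
```

Here five rules describe the string `babab`, and the 2-gram counts are taken over that decompressed string. The algorithm summarized above targets the same output while working on the collage system itself.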
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
