Faster and Simpler Online Computation of String Net Frequency
Shunsuke Inenaga

TL;DR
This paper introduces an improved online algorithm for computing string net frequencies that lowers query and reporting times, making the computation more practical for languages with large alphabets, such as Chinese.
Contribution
It presents a faster, output-optimal online algorithm for string net frequency computation using Weiner's suffix tree, improving over previous methods based on Ukkonen's suffix tree.
Findings
Answers Single-NF queries in O(m log σ) time
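Reports all answers to the All-NF problem in output-optimal time, bounded by O(n)
Removes the extra alphabet-size-dependent term from the previous method's running time, which matters for large-alphabet languages such as Chinese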
Abstract
An occurrence of a repeated substring u in a string S is called a net occurrence if extending the occurrence to the left or to the right decreases the number of occurrences to 1. The net frequency (NF) of a repeated substring u in a string S is the number of net occurrences of u in S. Very recently, Guo et al. [SPIRE 2024] proposed an online O(n log σ)-time algorithm that maintains a data structure of O(n) space which answers Single-NF queries in O(m log σ + σ²) time and reports all answers of the All-NF problem in O(nσ²) time. Here, n is the length of the input string S, m is the query pattern length, and σ is the alphabet size. The σ² term is a major drawback of their method since computing string net frequencies is originally motivated for Chinese language processing where σ can be in the thousands. This paper…
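To make the definition of net frequency concrete, the following is a minimal brute-force sketch in Python. It is not the paper's suffix-tree algorithm and has none of its time bounds. It only counts net occurrences directly from the definition, reading it as: the substring is repeated, and both its left extension and its right extension occur exactly once. The function names `count_occurrences` and `net_frequency`, the parameter names `s` and `u`, and the use of two sentinel characters to handle occurrences at the string boundaries are assumptions made for illustration.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def net_frequency(s: str, u: str) -> int:
    """Brute-force net frequency of u in s (illustrative only)."""
    if not u:
        return 0
    # Assumption: pad with two distinct sentinels that do not occur in s or u,
    # so that an extension past either end of s is always unique.
    padded = "\x00" + s + "\x01"
    # Only repeated substrings can have net occurrences.
    if count_occurrences(padded, u) < 2:
        return 0
    nf = 0
    start = padded.find(u)
    while start != -1:
        end = start + len(u)
        left_ext = padded[start - 1:end]    # one extra character on the left
        right_ext = padded[start:end + 1]   # one extra character on the right
        # Net occurrence: both one-character extensions occur exactly once.
        if (count_occurrences(padded, left_ext) == 1
                and count_occurrences(padded, right_ext) == 1):
            nf += 1
        start = padded.find(u, start + 1)
    return nf


if __name__ == "__main__":
    print(net_frequency("abcabd", "ab"))  # both occurrences of "ab" are net
    print(net_frequency("abcabd", "a"))   # "ab" is repeated, so no net occurrence
```

In "abcabd", the substring "ab" occurs twice, and each of "abc", "cab", and "abd" occurs once, so both occurrences count. The substring "a" is repeated too, but its right extension "ab" is also repeated, so it has no net occurrence.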
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Advanced Data Storage Technologies
