Faster and Simpler Online Computation of String Net Frequency

Shunsuke Inenaga

arXiv:2410.06837 · cs.DS · January 1, 2026

TL;DR

This paper introduces an improved online algorithm for computing string net frequencies. It removes the σ² term from the query times of the previous method, which makes it more practical for languages with large alphabets, such as Chinese.

Contribution

The paper presents a faster, output-optimal online algorithm for computing string net frequencies. It is built on Weiner's suffix tree construction, whereas the previous method is based on Ukkonen's suffix tree construction.

Findings

01

Answers Single-NF queries in O(m log σ) time

02

Reports all NF results in O(n) time, matching output size

03

Avoids the σ² factor of the previous method, which is costly for large-alphabet languages

Abstract

An occurrence of a repeated substring $u$ in a string $S$ is called a net occurrence if extending the occurrence to the left or to the right decreases the number of occurrences to 1. The net frequency (NF) of a repeated substring $u$ in a string $S$ is the number of net occurrences of $u$ in $S$. Very recently, Guo et al. [SPIRE 2024] proposed an online $O(n \log σ)$-time algorithm that maintains a data structure of $O(n)$ space which answers Single-NF queries in $O(m \log σ + σ^{2})$ time and reports all answers of the All-NF problem in $O(n σ^{2})$ time. Here, $n$ is the length of the input string $S$, $m$ is the query pattern length, and $σ$ is the alphabet size. The $σ^{2}$ term is a major drawback of their method since computing string net frequencies is originally motivated for Chinese language processing where $σ$ can be thousands large. This paper…
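
To make the definition of net frequency concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-tree algorithm of the paper or of Guo et al., and it runs in polynomial rather than near-linear time. The function names `count_occurrences` and `net_frequency` are hypothetical. The sketch reads the definition as requiring both the left extension and the right extension to be unique, and it assumes that an extension running past either end of the string counts as unique, modeled here with two sentinel characters that do not appear in the input.

```python
def count_occurrences(s: str, u: str) -> int:
    """Count occurrences of u in s, including overlapping ones."""
    count = 0
    start = s.find(u)
    while start != -1:
        count += 1
        start = s.find(u, start + 1)
    return count


def net_frequency(s: str, u: str) -> int:
    """Brute-force net frequency of u in s (illustrative only)."""
    # Only repeated substrings (at least two occurrences) can have net occurrences.
    if not u or count_occurrences(s, u) < 2:
        return 0
    # Assumption: pad with sentinels absent from s so that extensions
    # across the string boundaries are always unique.
    t = "\x00" + s + "\x01"
    nf = 0
    start = t.find(u)
    while start != -1:
        end = start + len(u)
        left_ext = t[start - 1:end]    # occurrence extended one symbol to the left
        right_ext = t[start:end + 1]   # occurrence extended one symbol to the right
        if count_occurrences(t, left_ext) == 1 and count_occurrences(t, right_ext) == 1:
            nf += 1
        start = t.find(u, start + 1)
    return nf
```

For example, in `"abcabc"` both occurrences of `"abc"` are net occurrences, so `net_frequency("abcabc", "abc")` is 2, while `"ab"` has net frequency 0 because its right extension `"abc"` still occurs twice.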

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Advanced Data Storage Technologies