Maximal number of subword occurrences in a word
Wenjie Fang

TL;DR
This paper investigates the maximal number of subword occurrences in words, introduces the concept of subword entropy, and establishes bounds and limit values for minimal subword entropy across fixed-length words and alphabets.
Contribution
It defines subword entropy, derives bounds for minimal subword entropy, and explores its asymptotic behavior, including improved bounds for binary alphabets and conjectures based on experiments.
Findings
Established bounds for minimal subword entropy
Proved the existence of a limit value for subword entropy per letter
Provided improved bounds for binary alphabets using periodic words
Abstract
We consider the number of occurrences of subwords (non-consecutive sub-sequences) in a given word. We first define the notion of subword entropy of a given word, which measures the maximal number of occurrences among all possible subwords. We then give upper and lower bounds on the minimal subword entropy for words of fixed length in a fixed alphabet, and also show that the minimal subword entropy per letter has a limit value. A better upper bound on the minimal subword entropy for a binary alphabet is then given by looking at certain families of periodic words. We also give some conjectures based on experimental observations.
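To make the quantities in the abstract concrete, here is a minimal brute-force sketch in Python. It counts occurrences of a subword, finds the most frequent subword of a word, and minimizes that maximum over all words of a given length. The function names are hypothetical and not taken from the paper. Taking the entropy to be the base-2 logarithm of the maximal occurrence count is an assumption made here for illustration, since the abstract only says that the entropy "measures" that maximum. The enumeration is exponential in the word length, so it is only suitable for very short words.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def occurrences(w: str, u: str) -> int:
    """Number of ways u appears in w as a (not necessarily consecutive) subword."""
    # dp[j] = number of embeddings of u[:j] into the part of w read so far
    dp = [1] + [0] * len(u)
    for c in w:
        # go right to left so each letter of w is used at most once per embedding
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def max_occurrences(w: str) -> tuple[str, int]:
    """Return (a most frequent subword of w, its occurrence count), by brute force."""
    counts = Counter()
    for k in range(len(w) + 1):
        for positions in combinations(range(len(w)), k):
            counts["".join(w[i] for i in positions)] += 1
    return max(counts.items(), key=lambda kv: kv[1])


def subword_entropy(w: str) -> float:
    """ASSUMPTION: entropy = log2 of the maximal occurrence count."""
    return log2(max_occurrences(w)[1])


def min_max_occurrences(n: int, alphabet: str = "ab") -> int:
    """Smallest maximal occurrence count over all words of length n (brute force)."""
    return min(
        max_occurrences("".join(letters))[1]
        for letters in product(alphabet, repeat=n)
    )
```

For example, in `aabb` the subword `ab` occurs 4 times (either `a` followed by either `b`), and no other subword occurs more often, so `max_occurrences("aabb")` returns `("ab", 4)`.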
