Sets Represented as the Length-n Factors of a Word
Shuo Tan, Jeffrey Shallit

TL;DR
This paper studies which subsets of Sigma^n arise as the set of all length-n factors of a finite word. It gives bounds on how many such subsets exist, a bound and experimental data on the word length needed to represent one, and a formula counting the subsets represented by words of a given length.
Contribution
It offers bounds, a closed-form formula, and experimental data on which subsets of Sigma^n can be represented as the length-n factors of a finite word.
Findings
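Upper and lower bounds of the form alpha^(2^n) on the number of representable subsets in the binary case
A weak upper bound and experimental data for the length of a word needed to represent a given subset
A closed-form formula for the number of subsets represented by words of length t when n <= t < 2n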
Abstract
In this paper we consider the following problems: how many different subsets of Sigma^n can occur as the set of all length-n factors of a finite word? If a subset is representable, how long a word do we need to represent it? How many such subsets are represented by words of length t? For the first problem, we give upper and lower bounds of the form alpha^(2^n) in the binary case. For the second problem, we give a weak upper bound and some experimental data. For the third problem, we give a closed-form formula in the case where n <= t < 2n. Algorithmic variants of these problems have previously been studied under the name "shortest common superstring".
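The three questions in the abstract can be made concrete with a small brute-force sketch over the binary alphabet. This is only an illustration for tiny n and t, not the authors' method: the function names (`factors`, `subsets_represented_by_length`, `shortest_representation`) and the `max_len` search cutoff are assumptions introduced here, and the exhaustive enumeration grows exponentially in t.

```python
from itertools import product


def factors(word: str, n: int) -> frozenset:
    """Return the set of all length-n factors (contiguous substrings) of word."""
    return frozenset(word[i:i + n] for i in range(len(word) - n + 1))


def subsets_represented_by_length(n: int, t: int, alphabet: str = "01") -> set:
    """Collect every subset of Sigma^n that is the factor set of some word of length t."""
    return {factors("".join(w), n) for w in product(alphabet, repeat=t)}


def shortest_representation(target, n: int, alphabet: str = "01", max_len: int = 12):
    """Search words in order of increasing length for one whose length-n factor
    set equals target. Returns None if nothing is found up to max_len
    (max_len is a hypothetical cutoff for this sketch, not a bound from the paper)."""
    target = frozenset(target)
    for t in range(n, max_len + 1):
        for w in product(alphabet, repeat=t):
            word = "".join(w)
            if factors(word, n) == target:
                return word
    return None


if __name__ == "__main__":
    # Third problem: subsets of {0,1}^2 represented by words of length 3.
    print(len(subsets_represented_by_length(2, 3)))
    # Second problem: a shortest word whose length-2 factors are exactly {01, 10}.
    print(shortest_representation({"01", "10"}, 2))
    # {00, 11} is not representable: any word containing both must also contain 01 or 10.
    print(shortest_representation({"00", "11"}, 2, max_len=6))
```

For n = 2 and t = 3, the eight binary words yield seven distinct factor sets, because 010 and 101 both give {01, 10}.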
Taxonomy
Topics: Algorithms and Data Compression · semigroups and automata theory · DNA and Biological Computing
