Sets Represented as the Length-n Factors of a Word

Shuo Tan; Jeffrey Shallit

arXiv:1304.3666 · cs.FL · April 15, 2013 · 1 citation

TL;DR

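This paper studies which subsets of Sigma^n can occur as the set of all length-n factors of a finite word. It gives bounds on how many such subsets exist, a weak upper bound and experimental data on the word length needed to represent them, and a closed-form count of the subsets represented by words of length t when n <= t < 2n.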

Contribution

It offers new bounds, a closed-form formula, and experimental data on which subsets of Sigma^n are representable as the length-n factor set of a finite word.

Findings

01

Upper and lower bounds of the form alpha^(2^n), in the binary case, on the number of subsets of Sigma^n that occur as the length-n factor set of a finite word

02

A weak upper bound, with experimental data, on the word length needed to represent a representable subset

03

A closed-form formula for the number of subsets represented by words of length t when n <= t < 2n

Abstract

In this paper we consider the following problems: how many different subsets of Sigma^n can occur as the set of all length-n factors of a finite word? If a subset is representable, how long a word do we need to represent it? How many such subsets are represented by words of length t? For the first problem, we give upper and lower bounds of the form alpha^(2^n) in the binary case. For the second problem, we give a weak upper bound and some experimental data. For the third problem, we give a closed-form formula in the case where n <= t < 2n. Algorithmic variants of these problems have previously been studied under the name "shortest common superstring".
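To make the objects in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the authors' method: the function names `factors` and `sets_for_length` are hypothetical, and exhaustive enumeration is only feasible for very small n and t. It computes the length-n factor set of a word, then collects the distinct factor sets produced by all words of a given length t over a binary alphabet.

```python
from itertools import product


def factors(word, n):
    """Return the set of all length-n factors (contiguous substrings) of word."""
    return frozenset(word[i:i + n] for i in range(len(word) - n + 1))


def sets_for_length(t, n, alphabet="01"):
    """Return the distinct length-n factor sets over all words of length t.

    Brute force: enumerates all len(alphabet)**t words, so keep t small.
    """
    return {factors("".join(w), n) for w in product(alphabet, repeat=t)}


if __name__ == "__main__":
    # The word 0110 represents the subset {01, 11, 10} of Sigma^2.
    print(sorted(factors("0110", 2)))      # ['01', '10', '11']
    # Number of subsets of Sigma^2 represented by binary words of length 3.
    print(len(sets_for_length(3, 2)))      # 7
```

With n = 2 and t = 3, the eight binary words yield only seven distinct subsets, because 010 and 101 both represent {01, 10}. This is an instance of the range n <= t < 2n for which the paper gives a closed-form formula.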

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing