Factorization in Formal Languages
Paul Bell, Daniel Reidenbach, Jeffrey Shallit

TL;DR
This paper studies unique and semi-unique factorization in formal languages. It reproves that uf(L) is regular when L is regular, derives quadratic bounds on the length of the shortest word not in uf(L), shows that uf(L) need not be context-free when L is context-free, and considers several variations on unique factorization.
Contribution
It introduces semi-unique factorization, together with unique factorization "up to permutation" and "up to subset", and gives bounds and counterexamples for regular and context-free languages.
Findings
uf(L) is regular if L is regular
Quadratic upper and lower bounds on the length of the shortest word not in uf(L)
uf(L) need not be context-free for context-free L
Abstract
We consider several novel aspects of unique factorization in formal languages. We reprove the familiar fact that the set uf(L) of words having unique factorization into elements of L is regular if L is regular, and from this deduce a quadratic upper and lower bound on the length of the shortest word not in uf(L). We observe that uf(L) need not be context-free if L is context-free. Next, we consider variations on unique factorization. We define a notion of "semi-unique" factorization, where every factorization has the same number of terms, and show that, if L is regular or even finite, the set of words having such a factorization need not be context-free. Finally, we consider additional variations, such as unique factorization "up to permutation" and "up to subset".
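To make the two central notions concrete, here is a minimal Python sketch for a finite language L. It is an illustration under stated assumptions, not the paper's construction: the function names (`factorization_counts`, `in_uf`, `in_semi_uf`) are hypothetical, L is assumed to be a finite set of nonempty words (the empty word is discarded so that the number of factorizations stays finite), and "semi-unique" is read as "at least one factorization exists and all factorizations have the same number of terms".

```python
from typing import Dict, Iterable, List


def factorization_counts(word: str, language: Iterable[str]) -> Dict[int, int]:
    """Return a map k -> number of factorizations of `word` into exactly k
    elements of `language`, computed by dynamic programming over prefixes."""
    # Assumption: the empty word is dropped, otherwise counts would be infinite.
    pieces = {x for x in language if x}
    n = len(word)
    # table[i] maps a term count k to the number of ways to factor word[:i]
    # into exactly k pieces.
    table: List[Dict[int, int]] = [dict() for _ in range(n + 1)]
    table[0] = {0: 1}  # the empty prefix has one factorization with 0 terms
    for i in range(1, n + 1):
        for x in pieces:
            j = i - len(x)
            if j >= 0 and word[j:i] == x:
                # Extend every factorization of word[:j] by the piece x.
                for k, ways in table[j].items():
                    table[i][k + 1] = table[i].get(k + 1, 0) + ways
    return table[n]


def in_uf(word: str, language: Iterable[str]) -> bool:
    """True if `word` has exactly one factorization into elements of `language`."""
    return sum(factorization_counts(word, language).values()) == 1


def in_semi_uf(word: str, language: Iterable[str]) -> bool:
    """True if `word` has a factorization and all of its factorizations
    use the same number of terms (the reading assumed above)."""
    return len(factorization_counts(word, language)) == 1
```

For example, with L = {a, ab, ba} the word aba factors as a·ba and as ab·a. It therefore fails unique factorization, but both factorizations have two terms, so it is semi-uniquely factorizable in the sense assumed here. With L = {a, b, ab}, the word ab factors as a·b (two terms) and ab (one term), so it satisfies neither property.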
Taxonomy
Topics: semigroups and automata theory · Algorithms and Data Compression · Natural Language Processing Techniques
