
TL;DR
This paper introduces prefix normal words, a new class of binary words in which no factor contains more occurrences of a designated letter than the prefix of the same length, with applications to indexing for jumbled pattern matching (Parikh vector matching).
Contribution
Using prefix normal words, it gives the first non-trivial characterization of binary words that have the same set of Parikh vectors of factors.
Findings
The language of prefix normal words is not context-free.
This language is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words.
Abstract
We present a new class of binary words: the prefix normal words. They are defined by the property that for any given length $k$, no factor of length $k$ has more $a$'s than the prefix of the same length. These words arise in the context of indexing for jumbled pattern matching (a.k.a. permutation matching or Parikh vector matching), where the aim is to decide whether a string has a factor with a given multiplicity of characters, i.e., with a given Parikh vector. Using prefix normal words, we give the first non-trivial characterization of binary words having the same set of Parikh vectors of factors. We prove that the language of prefix normal words is not context-free and is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words. We discuss further properties and state open problems.
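To make the defining property concrete, here is a minimal Python sketch that tests whether a word is prefix normal by brute force. It is not taken from the paper: the function names `prefix_counts` and `is_prefix_normal`, the `letter` parameter, and the choice of the alphabet {a, b} written as Python strings are all assumptions made for illustration, and the quadratic loop is only the most direct reading of the definition.

```python
def prefix_counts(w: str, letter: str = "a") -> list[int]:
    """Return counts where counts[i] is the number of `letter` in w[:i]."""
    counts = [0]
    for c in w:
        counts.append(counts[-1] + (c == letter))
    return counts


def is_prefix_normal(w: str, letter: str = "a") -> bool:
    """Check the definition directly: for every length k, no factor of
    length k may contain more copies of `letter` than the prefix of length k.

    Hypothetical helper for illustration; runs in O(n^2) time.
    """
    pre = prefix_counts(w, letter)
    n = len(w)
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            # pre[i + k] - pre[i] counts `letter` in the factor w[i:i + k];
            # pre[k] counts `letter` in the prefix w[:k].
            if pre[i + k] - pre[i] > pre[k]:
                return False
    return True


print(is_prefix_normal("aabab"))  # True
print(is_prefix_normal("abaa"))   # False: factor "aa" beats prefix "ab"
```

In the second call the word fails at length 2, where the factor "aa" contains two a's but the prefix "ab" contains only one.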
