Density dichotomy in random words

Joshua Cooper; Danny Rorabaugh

arXiv:1504.04424 · math.CO · October 18, 2016

TL;DR

This paper investigates the density of homomorphic images of words within random words, establishing a dichotomy based on whether the word is doubled, and explores convergence and concentration properties.

Contribution

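It establishes a density dichotomy for words in random sequences: a word is doubled exactly when the expected density of its homomorphic images in a uniformly random word, over a finite alphabet with at least two letters, tends to zero as the random word's length grows.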

Findings

01

Doubled words have expected density tending to zero in uniformly random words as their length tends to infinity.

02

Non-doubled words have expected density that does not tend to zero; the paper further explores how their density converges.

03

Concentration results describe the distribution of densities for doubled words.
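The findings turn on whether a word is doubled, meaning every letter occurring in it occurs at least twice (the definition given in the abstract). A minimal Python sketch of that test follows; the name `is_doubled` is hypothetical and not taken from the paper.

```python
from collections import Counter


def is_doubled(word: str) -> bool:
    """True if every letter that occurs in word occurs at least twice."""
    return all(count >= 2 for count in Counter(word).values())


print(is_doubled("abab"))     # True: a and b each occur twice
print(is_doubled("huh"))      # False: u occurs only once
print(is_doubled("science"))  # False: s, i and n occur only once
```

Note that the empty word counts as doubled under this check, since it has no letters that could violate the condition.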

Abstract

Word $W$ is said to encounter word $V$ provided there is a homomorphism $\phi$ mapping letters to nonempty words so that $\phi(V)$ is a substring of $W$. For example, taking $\phi$ such that $\phi(h) = c$ and $\phi(u) = ien$, we see that "science" encounters "huh" since $cienc = \phi(huh)$. The density of $V$ in $W$, $\delta(V, W)$, is the proportion of substrings of $W$ that are homomorphic images of $V$. So the density of "huh" in "science" is $2/\binom{8}{2}$. A word is doubled if every letter that appears in the word appears at least twice. The dichotomy: Let $V$ be a word over any alphabet, $\Sigma$ a finite alphabet with at least 2 letters, and $W_n \in \Sigma^n$ chosen uniformly at random. Word $V$ is doubled if and only if $\mathbb{E}(\delta(V, W_n)) \to 0$ as $n \to \infty$. We further explore convergence for nondoubled words and concentration of the limit…
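The definitions of encountering and density can be checked by brute force on small cases. The following is a minimal Python sketch, not the authors' code; `is_image`, `density` and `expected_density` are hypothetical helper names. It assumes substrings are counted by position, so a word of length $n$ has $\binom{n+1}{2}$ substrings, which matches the denominator $\binom{8}{2} = 28$ for the 7-letter word "science" (the two images of "huh" there are "cienc" and "ence"). `is_image` backtracks over the length of each letter's image, and `expected_density` enumerates all of $\Sigma^n$, so it is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import product


def is_image(pattern: str, text: str) -> bool:
    """True if text == phi(pattern) for some phi sending each letter of
    pattern to a nonempty word (repeated letters get the same image)."""

    def match(i: int, j: int, phi: dict) -> bool:
        if i == len(pattern):
            return j == len(text)  # pattern used up: text must be too
        letter = pattern[i]
        if letter in phi:  # image already fixed: it must reappear here
            image = phi[letter]
            return text.startswith(image, j) and match(i + 1, j + len(image), phi)
        # Try every image length, leaving >= 1 character per remaining letter.
        max_len = len(text) - j - (len(pattern) - i - 1)
        for length in range(1, max_len + 1):
            if match(i + 1, j + length, {**phi, letter: text[j:j + length]}):
                return True
        return False

    return match(0, 0, {})


def density(pattern: str, word: str) -> Fraction:
    """Fraction of the n(n+1)/2 positional substrings of word that are images of pattern."""
    n = len(word)
    hits = sum(is_image(pattern, word[i:j])
               for i in range(n) for j in range(i + 1, n + 1))
    return Fraction(hits, n * (n + 1) // 2)


def expected_density(pattern: str, alphabet: str, n: int) -> Fraction:
    """Exact E(delta(pattern, W_n)) for W_n uniform on alphabet^n (small n only)."""
    total = sum((density(pattern, "".join(w)) for w in product(alphabet, repeat=n)),
                Fraction(0))
    return total / len(alphabet) ** n


print(density("huh", "science"))  # 1/14, i.e. 2/28
for n in range(2, 9):
    print(n, float(expected_density("aa", "ab", n)), float(expected_density("aba", "ab", n)))
```

Here "aa" is doubled and "aba" is not (its letter b occurs once), so by the dichotomy the first column of expectations must tend to 0 as $n \to \infty$ while the second must not. Exhaustive enumeration only shows values for small $n$; it illustrates the statement rather than proving it.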

Taxonomy

Topics: Semigroups and Automata Theory · DNA and Biological Computing · Authorship Attribution and Profiling