Finite automata, probabilistic method, and occurrence enumeration of a pattern in words and permutations

Toufik Mansour; Reza Rastegar; Alexander Roitershtein

arXiv:1905.05646·math.CO·May 15, 2019·SIAM J. Discret. Math.


TL;DR

This paper investigates the asymptotic enumeration and probabilistic distribution of pattern occurrences in words and permutations, establishing limit theorems and introducing weak avoidance concepts linked to non-product measures.

Contribution

It provides new asymptotic results for pattern occurrence counts, extends limit theorems to permutations, and introduces a novel weak avoidance framework with perturbation analysis.

Findings

01. Stanley-Wilf sequence converges to a limit independent of occurrence count

02. Established CLT and large deviation principles for pattern occurrences

03. Extended results from words to permutations

Abstract

The main theme of this paper is the enumeration of the occurrence of a pattern in words and permutations. We mainly focus on asymptotic properties of the sequence $f_{r}^{v} (k, n),$ the number of $n$ -array $k$ -ary words that contain a given pattern $v$ exactly $r$ times. In addition, we study the asymptotic behavior of the random variable $X_{n},$ the number of pattern occurrences in a random $n$ -array word. The two topics are closely related through the identity $P (X_{n} = r) =$ $\frac{1}{k ^{n}} f_{r}^{v} (k, n) .$ In particular, we show that for any $r \geq 0,$ the Stanley-Wilf sequence $(f_{r}^{v} (k, n))^{1/ n}$ converges to a limit independent of $r,$ and determine the value of the limit. We then obtain several limit theorems for the distribution of $X_{n},$ including a CLT, large deviation estimates, and the exact growth rate of the entropy of $X_{n} .$ Furthermore, we introduce a concept of weak…

Equations (292)

P(X_{n}=r)=\frac{1}{k^{n}}f_{r}^{v}(k,n)\quad\text{and}\quad P(X_{n}\leq r)=\frac{1}{k^{n}}g_{r}^{v}(k,n).

\lim_{n\to\infty}\bigl(f_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=\lim_{n\to\infty}\bigl(g_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=d-1,

\lim_{n\to\infty}\frac{g_{r}^{v}(k,n)}{n^{M_{r}}(d-1)^{n}}=C_{r}\quad\text{and}\quad\lim_{n\to\infty}\frac{f_{r}^{v}(k,n)}{n^{M_{r}}(d-1)^{n}}=K_{r}.

c^{v}_{k,n}(x)=\sum_{w\in[k]^{n}}\prod_{i=1}^{\binom{n}{\ell}}\bigl(1-xX_{n,i}(w)\bigr)=\sum_{w\in[k]^{n}}(1-x)^{occ_{v}(w)}=\sum_{r\geq 0}f_{r}^{v}(k,n)(1-x)^{r}.

{\mathbb{Q}}_{k,n}^{v,x}(A)=\frac{1}{c_{k,n}^{v}(x)}\sum_{w\in A}(1-x)^{occ_{v}(w)},\qquad A\subset[k]^{n}.

\lim_{n\to\infty}\Bigl|\frac{f_{r}^{v_{n}}(k_{n},n)}{k_{n}^{n}}-\frac{\mu_{n}^{r}e^{-\mu_{n}}}{r!}\Bigr|=0,

w(j_{p})<w(j_{q})\Longleftrightarrow v_{p}<v_{q}\qquad\forall\,1\leq p,q\leq\ell

w(j_{p})=w(j_{q})\Longleftrightarrow v_{p}=v_{q}\qquad\forall\,1\leq p,q\leq\ell.

f_{r}^{v}(k,n)=\#\{w\in[k]^{n}:occ_{v}(w)=r\}\quad\text{and}\quad g_{r}^{v}(k,n)=\sum_{j=0}^{r}f_{j}^{v}(k,n).

F_{k,n}^{v}(x)=\sum_{r\geq 0}f_{r}^{v}(k,n)x^{r}\quad\text{and}\quad G_{k,n}^{v}(x)=\sum_{r\geq 0}g_{r}^{v}(k,n)x^{r}.

\lim_{n\to\infty}\bigl(f_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=\lim_{n\to\infty}\bigl(g_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=d-1,

occ_{v}(w^{\prime}u)=m\quad\text{if and only if}\quad occ_{v}(wu)=m,\qquad\forall\,m\leq r.

\{w\in[k]^{*}:occ_{v}(w)>r\}\subset{\mathcal{R}}(v,r,k).

\langle w\rangle_{v;r}\subset\{w^{\prime}\in[k]^{*}:occ_{v}(w^{\prime})=occ_{v}(w)\}.

{\mathcal{R}}(v,r,k)=\{w\in[k]^{*}:occ_{v}(w)>r\}.

E(v,r,k)={\mathcal{E}}(v,r,k)\setminus\{{\mathcal{R}}(v,r,k)\}

E(v,r,k)=\bigcup_{\{w\in[k]^{*}:\,occ_{v}(w)\leq r\}}\langle w\rangle_{v;r}.

E(123,1,3)=\{\langle\epsilon\rangle,\langle 1\rangle,\langle 11\rangle,\langle 12\rangle,\langle 112\rangle,\langle 123\rangle\}.

[T(v,r,k)]_{ij}=\#\{a\in[k]:\delta(s_{i},a)=s_{j}\}.

g_{r}^{v}(k,n)

\begin{pmatrix}2&1&0&0&0&0\\ 0&1&1&1&0&0\\ 0&0&2&0&1&0\\ 0&0&0&1&1&1\\ 0&0&0&0&2&0\\ 0&0&0&0&0&2\end{pmatrix}.

G_{1,3}^{123}(x)

F_{1,3}^{123}(x)=\sum_{n\geq 0}f_{1}^{123}(3,n)x^{n}=\frac{x^{3}}{(1-2x)^{2}(1-x)^{2}}.

G_{1,4}^{123}(x)=\sum_{n\geq 0}g_{1}^{123}(4,n)x^{n}=\frac{1-7x+22x^{2}-32x^{3}+16x^{4}-2x^{5}}{(1-x)(1-2x)^{5}},

G_{1,5}^{123}(x)=\sum_{n\geq 0}g_{1}^{123}(5,n)x^{n}=\frac{1-10x+48x^{2}-124x^{3}+170x^{4}-103x^{5}-3x^{6}+23x^{7}}{(1-x)(1-2x)^{7}}.

\sum_{n\geq 0}f_{0}^{12\cdots k}(k,n)x^{n}=\sum_{j=0}^{k-1}\frac{x^{j}}{(1-(k-1)x)^{j+1}}.

{\mathfrak{z}}_{2k-2j}(x)=\sum_{i=1}^{j+1}\frac{x^{i-1}}{(1-(k-1)x)^{i}}

{\mathfrak{z}}_{2k-1-2j}(x)

\sum_{n\geq 0}g_{1}^{12\cdots k}(k,n)x^{n}=e_{1}^{t}C^{-1}(e_{1}+\cdots+e_{2k})={\mathfrak{z}}_{1}(x)

=\frac{1}{1-(k-1)x}+\frac{x}{(1-(k-1)x)(1-(k-2)x)}+\frac{x^{2}}{(1-(k-1)x)(1-(k-2)x)}({\mathfrak{z}}_{3}+{\mathfrak{z}}_{4})


Full text

Finite automata, probabilistic method, and occurrence enumeration of a pattern in words and permutations

Toufik Mansour, Department of Mathematics, University of Haifa, 199 Abba Khoushy Ave, 3498838 Haifa, Israel

Reza Rastegar, Occidental Petroleum Corporation, Houston, TX 77046 and Departments of Mathematics and Petroleum Engineering, University of Tulsa, OK 74104, USA (Adjunct Professor)

Alexander Roitershtein, Department of Statistics, Texas A&M University, College Station, TX 77843, USA

Abstract

The main theme of this paper is the enumeration of the occurrence of a pattern in words and permutations. We mainly focus on asymptotic properties of the sequence $f_{r}^{v}(k,n),$ the number of $n$ -array $k$ -ary words that contain a given pattern $v$ exactly $r$ times. In addition, we study the asymptotic behavior of the random variable $X_{n},$ the number of pattern occurrences in a random $n$ -array word. The two topics are closely related through the identity $P(X_{n}=r)=$ $\frac{1}{k^{n}}f_{r}^{v}(k,n).$ In particular, we show that for any $r\geq 0,$ the Stanley-Wilf sequence $\bigl{(}f_{r}^{v}(k,n)\bigr{)}^{1/n}$ converges to a limit independent of $r,$ and determine the value of the limit. We then obtain several limit theorems for the distribution of $X_{n},$ including a CLT, large deviation estimates, and the exact growth rate of the entropy of $X_{n}.$ Furthermore, we introduce a concept of weak avoidance and link it to a certain family of non-product measures on words that penalize pattern occurrences but do not forbid them entirely. We analyze this family of probability measures in a small parameter regime, where the distributions can be understood as a perturbation of a uniform measure. Finally, we extend some of our results for words, including the one regarding the equivalence of the limits of the Stanley-Wilf sequences, to pattern occurrences in permutations.

MSC2010: Primary 05A05, 05A15; Secondary 05A16, 68Q45, 60C05.

Keywords: pattern occurrences, weak avoidance, finite automata, random words, Stanley-Wilf type limits, limit theorems.

1 Introduction and main results

Pattern occurrence enumeration is a central topic in modern combinatorics; see, for instance, the monographs [8, 16, 20, 25]. In this paper, we are primarily concerned with the pattern occurrence problem for words; however, we also extend certain results to permutations. We define words as finite arrays of letters from an alphabet $[k]:=\{1,\ldots,k\},$ for some given $k\in{\mathbb{N}}.$ A pattern is any distinguished word, and an occurrence of a pattern $v$ in a word $w$ is a subsequence of letters in $w$ (not necessarily consecutive) that are in the same relative order as the letters in $v.$ For instance, the word $w=37451554$ has four occurrences of the pattern $v=1332,$ namely $3**5*5*4,$ $3**5**54,$ $3****554,$ and $****1554.$ See Subsection 2.1 for a more formal introduction of the concept. Occurrences of patterns in permutations are defined similarly; see the beginning of Section 3 for details.

Suppose that the alphabet $[k]$ and a pattern $v\in[k]^{\ell}$ are given, and that exactly $d\leq\ell$ distinct letters are used to form the pattern $v.$ For instance, if $k=7$ and $v=35731,$ then $\ell=5$ and $d=4.$ Our main object of interest is the frequency sequence $f_{r}^{v}(k,n),$ namely the number of words in $[k]^{n}$ that contain the pattern $v$ exactly $r$ times. We also study the asymptotic behavior of the partial sums $g_{r}^{v}(k,n)=\sum_{j\leq r}f_{j}^{v}(k,n)$ and of $X_{n},$ the number of occurrences of $v$ in a random word distributed uniformly over $[k]^{n}.$ Note that the distribution of the random variable $X_{n}$ is related to the sequences $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n)$ through the identities

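$$P(X_{n}=r)=\frac{1}{k^{n}}f_{r}^{v}(k,n)\quad\text{and}\quad P(X_{n}\leq r)=\frac{1}{k^{n}}g_{r}^{v}(k,n).\tag{1}$$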

The starting point of our study is the celebrated Stanley-Wilf conjecture, which states that the number of permutations of size $n$ avoiding a pattern grows exponentially. The conjecture was settled by Marcus and Tardos [27] in 2004; see [11, 17, 25, 34] for a review of the history and recent developments in the field. The analogue of this result for words is the convergence of the sequence $(f_{0}^{v}(k,n))^{1/n}.$ This was proved by Brändén and Mansour in [9] via a combinatorial analysis of certain finite automata that generate words avoiding a given pattern. In fact, it was shown in [9] that $\lim_{n\to\infty}(f_{0}^{v}(k,n))^{1/n}=d-1,$ where $d$ is the number of distinct letters in the pattern $v.$ In Section 2.2, we generalize this result to all $r\geq 0.$ Specifically, we show the following (as stated in Theorems 2.9 and 2.10):

Theorem A.

For any integer $r\geq 0,$

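$$\lim_{n\to\infty}\bigl(f_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=\lim_{n\to\infty}\bigl(g_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=d-1,$$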

where $d$ is the number of distinct letters in the pattern $v.$

Assume that $d>1.$ Then for any $r\geq 0,$ there exist a positive integer $M_{r}\in{\mathbb{N}}$ and real constants $C_{r}\in(0,\infty)$ and $K_{r}\geq 0$ such that

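$$\lim_{n\to\infty}\frac{g_{r}^{v}(k,n)}{n^{M_{r}}(d-1)^{n}}=C_{r}\quad\text{and}\quad\lim_{n\to\infty}\frac{f_{r}^{v}(k,n)}{n^{M_{r}}(d-1)^{n}}=K_{r}.$$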

We remark that in various examples with $d>1,$ we are able to verify $K_{r}>0.$ Nevertheless, we believe that it may be zero in some cases, see the discussion in Section 2.3.
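As a rough numerical illustration of Theorem A, the sketch below counts the words over $[3]$ that avoid $v=123$ (so $r=0$ and $d-1=2$) with a small dynamic program and prints the $n$-th roots of the counts. The three-state recursion and the function names are assumptions made for this illustration only; they are not the authors' construction for general patterns.

```python
import math

def avoiders_of_123(n):
    """Number of words in {1,2,3}^n with no occurrence of the pattern 123 (r = 0)."""
    # counts[j] = number of words whose longest embedded prefix of 123 has length j
    counts = [1, 0, 0]
    for _ in range(n):
        counts = [
            2 * counts[0],              # letters 2, 3 keep state 0
            counts[0] + 2 * counts[1],  # letter 1 enters state 1; letters 1, 3 keep it
            counts[1] + 2 * counts[2],  # letter 2 enters state 2; letters 1, 2 keep it
        ]
    return sum(counts)

def nth_root(n):
    """(f_0^{123}(3, n))^(1/n), computed through logarithms to avoid float overflow."""
    return math.exp(math.log(avoiders_of_123(n)) / n)

for n in (10, 100, 1000):
    print(n, round(nth_root(n), 4))   # the values decrease slowly toward d - 1 = 2
```

The slow convergence is consistent with the polynomial factor $n^{M_{r}}$ in the second part of the theorem.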

We also give the following extension of this result for permutations. Let $\xi$ be a given permutation pattern of size $k$ and $f_{r}^{\xi}(n)$ denote the number of permutations of size $n$ that contain $\xi$ exactly $r$ times, $r\geq 0.$ We have (Theorem 3.1 below):

Theorem B.

For any $r\in{\mathbb{N}},$ $\lim_{n\to\infty}(f_{r}^{\xi}(n))^{\frac{1}{n}}$ exists and is equal to $\lim_{n\to\infty}(f_{0}^{\xi}(n))^{\frac{1}{n}}.$

In contrast to the results obtained for words, we cannot describe the exact structure of the Stanley-Wilf type limits as a function of the parameters $(k,\xi)$ in a general form.

The next result turns out to be a direct implication of Theorem A. It is stated below as Theorem 2.13.

Theorem C.

If $d>1,$ then, $\lim_{n\to\infty}\frac{H_{k,v}(n)}{n}=\log\frac{k}{d-1},$ where $H_{k,v}(n)$ is the entropy of $X_{n}.$

Loosely speaking, for a given $n,$ the entropy $H_{k,v}(n)$ measures the amount of uncertainty in the value of the random variable $X_{n}.$ The entropy sequence $H_{k,v}(\cdot)$ is subadditive, namely $H_{k,v}(n+m)\leq H_{k,v}(n)+H_{k,v}(m),$ because pattern occurrences depend on one another. The convergence of $\frac{H_{k,v}(n)}{n}$ is thus ensured by Fekete’s subadditivity lemma. Theorem 2.13 then gives the precise value of this limit for an arbitrary pattern $v.$

In Sections 2.4 and 2.5 we study the asymptotic behavior of the sequence $(X_{n})_{n\in{\mathbb{N}}}.$ In Section 2.5 we obtain a central limit theorem and several related asymptotic results for the distribution of $X_{n}.$ The following result is an analogue of the CLT for permutations obtained by Bóna in [8]. The bulk of the proof is an estimation of the variance of $X_{n}$ referred to as $\text{VAR}(X_{n}).$ The latter, together with general theorems of [29] and [23], yields also a Berry-Esseen type bound for the rate of convergence and large deviation estimates stated, respectively, in Corollaries 2.16 and 2.17. The following is the content of Theorem 2.14.

Theorem D.

Let $\mu_{n}=E(X_{n})$ and $\sigma_{n}=\sqrt{\text{VAR}(X_{n})}.$ Then $\mu_{n}=\binom{n}{\ell}\binom{k}{d}\frac{1}{k^{\ell}},$ $\sigma_{n}\sim\bigl{(}\frac{\mu_{n}}{\sqrt{n}}\bigr{)},$ and $\frac{X_{n}-\mu_{n}}{\sigma_{n}}$ converges in distribution, as $n\to\infty,$ to a standard normal random variable.
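The formula for $\mu_{n}$ in Theorem D can be checked by exhaustive enumeration for small parameters. The following is a minimal sketch under that assumption; the helper names `occ`, `exact_mean`, and `formula_mean` are hypothetical, and words and patterns are represented as tuples of integers.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def occ(v, w):
    """Number of occurrences of the pattern v in the word w (tuples of integers)."""
    l = len(v)
    pairs = [(p, q) for p in range(l) for q in range(p + 1, l)]
    def matches(sub):
        # same relative order as v, ties included
        return all((sub[p] < sub[q]) == (v[p] < v[q]) and
                   (sub[p] == sub[q]) == (v[p] == v[q]) for p, q in pairs)
    return sum(matches(sub) for sub in combinations(w, l))

def exact_mean(v, k, n):
    """E(X_n) under the uniform law on [k]^n, by listing all k^n words."""
    total = sum(occ(v, w) for w in product(range(1, k + 1), repeat=n))
    return Fraction(total, k ** n)

def formula_mean(v, k, n):
    """binom(n, l) * binom(k, d) / k^l, with l = |v| and d distinct letters in v."""
    l, d = len(v), len(set(v))
    return Fraction(comb(n, l) * comb(k, d), k ** l)

v, k, n = (1, 2, 2), 3, 5
print(exact_mean(v, k, n), formula_mean(v, k, n))
```

The agreement reflects linearity of expectation: each of the $\binom{n}{\ell}$ index sets carries an occurrence with probability $\binom{k}{d}/k^{\ell}.$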

For a pattern of length $\ell,$ there are $\binom{n}{\ell}$ places in a word $w\in[k]^{n}$ where the pattern might occur. Enumerate them in an arbitrary way, and let $X_{n,i}(w)$ be the indicator of the event that the pattern occurs at the $i$ -th place in $w.$ Choose a parameter $x\in[0,1]$ and consider the following partition function penalizing the occurrences of $v:$

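$$c^{v}_{k,n}(x)=\sum_{w\in[k]^{n}}\prod_{i=1}^{\binom{n}{\ell}}\bigl(1-xX_{n,i}(w)\bigr)=\sum_{w\in[k]^{n}}(1-x)^{occ_{v}(w)}=\sum_{r\geq 0}f_{r}^{v}(k,n)(1-x)^{r}.$$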

Using this partition function, one can construct a Boltzmann distribution on $[k]^{n}$ as follows:

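$${\mathbb{Q}}_{k,n}^{v,x}(A)=\frac{1}{c_{k,n}^{v}(x)}\sum_{w\in A}(1-x)^{occ_{v}(w)},\qquad A\subset[k]^{n}.$$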

The probability measure ${\mathbb{Q}}^{v,x}_{k,n}(\,\cdot\,)$ penalizes words $w$ with a non-zero $occ_{v}(w)$ with the factor $(1-x)^{occ_{v}(w)},$ but unless $x=1$ it does not forbid them completely. We refer to a random word $w$ distributed according to ${\mathbb{Q}}^{v,x}_{k,n}$ as weakly avoiding the pattern $v.$ The construction and the terminology are inspired by their analogue in the theory of self-avoiding walks, where a similar construction is used to penalize self-intersections of the path of a random walk and to introduce weakly self-avoiding walks [5]. A similar construction for permutations is outlined in Section 3.2. In the case of permutations and the inversion pattern $21,$ the above probability measure is a Mallows distribution. Mallows permutations have been studied by many authors; see, for instance, the recent work [12, 19, 30] and references therein.
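To make the weak-avoidance measure concrete, the sketch below evaluates the partition function and the law ${\mathbb{Q}}^{v,x}_{k,n}$ by brute force for $v=123,$ $k=3,$ $n=4.$ It is an illustration under the assumption that exhaustive enumeration is affordable; the function names are hypothetical and exact rational arithmetic is used so that the boundary cases $x=0$ and $x=1$ can be compared exactly.

```python
from fractions import Fraction
from itertools import combinations, product

def occ(v, w):
    """Number of occurrences of the pattern v in the word w (tuples of integers)."""
    l = len(v)
    pairs = [(p, q) for p in range(l) for q in range(p + 1, l)]
    def matches(sub):
        return all((sub[p] < sub[q]) == (v[p] < v[q]) and
                   (sub[p] == sub[q]) == (v[p] == v[q]) for p, q in pairs)
    return sum(matches(sub) for sub in combinations(w, l))

def weak_avoidance(v, k, n, x):
    """Return (c, Q): the partition function and the law Q as a dict word -> probability."""
    weights = {w: (1 - x) ** occ(v, w) for w in product(range(1, k + 1), repeat=n)}
    c = sum(weights.values())
    return c, {w: weight / c for w, weight in weights.items()}

v, k, n = (1, 2, 3), 3, 4
for x in (Fraction(0), Fraction(1, 2), Fraction(1)):
    c, Q = weak_avoidance(v, k, n, x)
    # the word 1233 contains 123 twice, so its weight is (1 - x)^2
    print(x, c, Q[(1, 2, 3, 3)])
```

At $x=0$ the partition function equals $k^{n}$ and the law is uniform; at $x=1$ it equals $f_{0}^{v}(k,n)$ and only pattern-avoiding words keep positive mass.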

We remark that when $x=0,$ the above results for $X_{n}$ hold under ${\mathbb{Q}}^{v,x}_{k,n},$ since in this case ${\mathbb{Q}}^{v,x}_{k,n}$ is the uniform distribution over $[k]^{n}.$ One would then expect that for a sequence $(x_{n})_{n\in{\mathbb{N}}}$ decaying to zero sufficiently fast, similar limit theorems hold for ${\mathbb{Q}}^{v,x_{n}}_{k,n}.$ Indeed, by using perturbation techniques we prove the following (see Theorem 2.19):

Theorem E.

The following holds for any $t\in{\mathbb{R}}$ and a sequence of positive reals $(\rho_{n})_{n\in{\mathbb{N}}}$ such that $\gamma:=\lim_{n\to\infty}\frac{n^{\ell}}{\rho_{n}}\in[0,+\infty):$

$\lim_{n\to\infty}{\mathbb{E}}^{v,\frac{1}{\rho_{n}}}_{k,n}(e^{\frac{tX_{n}}{n^{\ell}}})=\exp\Bigl[\frac{t}{k^{\ell}\ell!}\binom{k}{d}\Bigr].$

$\lim_{n\to\infty}\frac{1}{\sqrt{n}}\log{\mathbb{E}}^{v,\frac{1}{\rho_{n}}}_{k,n}(e^{\frac{tX_{n}\sqrt{n}}{n^{\ell}}})=J_{k,v}t,$ where $J_{k,v}$ are strictly positive constants.

Let ${\mathbb{Q}}_{n}(r)={\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}(X_{n}=r)$ and ${\mathbb{H}}_{n}=-\sum_{r\geq 0}{\mathbb{Q}}_{n}(r)\log{\mathbb{Q}}_{n}(r)$ be the entropy of $X_{n}$ under the law ${\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}.$ Then $\lim_{n\to\infty}\frac{{\mathbb{H}}_{n}}{n}=\log\frac{k}{d-1}+\frac{\gamma}{k^{\ell}\ell!}\binom{k}{d}.$

Note that in the context of permutations, somewhat similar perturbative regimes for Mallows permutations were recently studied in [6, 19, 33].

Another interesting result closely related to Theorem D (Theorem 2.14 below) is a limit theorem dealing with a Poisson approximation of $X_{n}$ in the case when $d=d_{n}$ is a rapidly increasing function of $n.$ The result is an analogue for random words of [12, Theorem 3.1] for random permutations; it is stated below as Theorem 2.22.

Theorem F.

Suppose that sequences of natural numbers $(k_{n})_{n\in{\mathbb{N}}},$ $(\ell_{n})_{n\in{\mathbb{N}}},$ and $(d_{n})_{n\in{\mathbb{N}}}$ satisfy the following condition:

There exist constants $A>0$ and $\beta>\frac{2}{2+\delta}$ such that $\min\{k_{n},\ell_{n}\}\geq d_{n}\geq An^{\beta}$ for all $n\in{\mathbb{N}},$ where $\delta=\liminf_{n\to\infty}\frac{d_{n}}{\ell_{n}}.$

Consider an arbitrary sequence of patterns $v_{n}\in[k_{n}]^{\ell_{n}},$ $n\in{\mathbb{N}},$ with $d_{n}$ distinct letters used to form $v_{n}.$ Let $X_{n}=occ_{v_{n}}(W_{n}),$ where $W_{n}$ is drawn at random from $[k_{n}]^{n}.$ Then

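$$\lim_{n\to\infty}\Bigl|\frac{f_{r}^{v_{n}}(k_{n},n)}{k_{n}^{n}}-\frac{\mu_{n}^{r}e^{-\mu_{n}}}{r!}\Bigr|=0,$$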

for any integer $r\geq 0.$

The paper is structured as follows. Section 2 is devoted to pattern occurrences in words. The framework is formally introduced in Section 2.1. In Section 2.2 we study the sequences $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n),$ $r\geq 0.$ The generating functions are explicitly computed for several examples using the automata approach and the transfer matrix method. The Stanley-Wilf limits of $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n)$ are studied in Section 2.3. Section 2.4 is devoted to the study of words weakly avoiding a pattern. Section 2.5 contains various limit theorems for the distribution of the random variable $X_{n}.$ Finally, within the framework of permutations the Stanley-Wilf type limits and words weakly avoiding a pattern are discussed in Section 3.

2 Pattern occurrences in words

In this section we focus on pattern occurrences in words and study the asymptotic behavior of $f_{r}^{v}(k,n)$ and $X_{n}.$ The section is divided into five subsections. We begin with notation. The organization of the section is discussed in more detail at the end of Section 2.1.

2.1 Notation and settings

Let ${\mathbb{N}}$ and ${\mathbb{N}}_{0}$ denote, respectively, the set of natural numbers and the set of non-negative integers, that is ${\mathbb{N}}_{0}={\mathbb{N}}\cup\{0\}.$ For a given set $A,$ $\#A$ is the cardinality of $A.$ For any given $k\in{\mathbb{N}},$ we denote the set $\{1,2,\ldots,k\}$ by $[k]$ and refer to it as an alphabet and to its elements as letters. A word of length $n$ is an element of $[k]^{n},$ $n\in{\mathbb{N}}.$ The language $[k]^{*}:=\cup_{n=0}^{\infty}[k]^{n}$ is the set of all words composed of letters in the alphabet $[k].$ We adopt the convention that $[k]^{0}=\{\epsilon\},$ where $\epsilon$ is the empty word. For any $A\subset{\mathbb{N}}_{0}$ we denote by $[k]^{A}$ the union $\cup_{j\in A}[k]^{j}.$ For instance, $[k]^{\geq n}=\cup_{j\geq n}[k]^{j}$ and $[k]^{\leq n}=\cup_{j\leq n}[k]^{j}.$ We write a word $w\in[k]^{n}$ in the form $w=w(1)\cdots w(n),$ where $w(i)$ is the $i$-th letter of $w.$ The concatenation of two words $w\in[k]^{n}$ and $v\in[k]^{m}$ is the word $wv:=w(1)\cdots w(n)v(1)\cdots v(m).$ For instance, the concatenation of $w=20$ and $v=19$ is $wv=2019.$ A pattern is any distinguished word in the underlying language $[k]^{*}.$

Let us now fix integers $k>0,$ $\ell\geq 2,$ and a pattern $v$ in $[k]^{\ell}.$ These parameters are considered to be given and fixed throughout the rest of Section 2. An important characteristic of the pattern turns out to be the number of distinct letters used to form it. We will denote this number by $d.$ For instance, if $v=33415,$ then $\ell=5$ and $d=4.$

For a word $w\in[k]^{n}$ with $n\geq\ell,$ an occurrence of the pattern $v$ in $w$ is a sequence of $\ell$ indices $1\leq j_{1}<j_{2}<\dots<j_{\ell}\leq n$ such that the subword $w(j_{1})\cdots w(j_{\ell})\in[k]^{\ell}$ is order-isomorphic to the word $v,$ that is

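$$w(j_{p})<w(j_{q})\Longleftrightarrow v_{p}<v_{q}\qquad\forall\,1\leq p,q\leq\ell$$

and

$$w(j_{p})=w(j_{q})\Longleftrightarrow v_{p}=v_{q}\qquad\forall\,1\leq p,q\leq\ell.$$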

For a word $w\in[k]^{*},$ we denote by $occ_{v}(w)$ the number of occurrences of $v$ in $w.$ For instance, if $v$ is the inversion $21$ and $w=35239,$ then $occ_{v}(w)=3$ (for the following three occurrences of pairs of letters which appear in the reverse order: $w(1)w(3)=32,$ $w(2)w(3)=52,$ and $w(2)w(4)=53$ ). We say that a word $w\in[k]^{*}$ contains the pattern $v$ exactly $r$ times, $r\in{\mathbb{N}}_{0},$ if $occ_{v}(w)=r.$ For $r\in{\mathbb{N}}_{0},$ we denote by $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n),$ the number of words in $[k]^{n}$ that contain $v,$ respectively, exactly $r$ times and at most $r$ times. That is,

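$$f_{r}^{v}(k,n)=\#\{w\in[k]^{n}:occ_{v}(w)=r\}\quad\text{and}\quad g_{r}^{v}(k,n)=\sum_{j=0}^{r}f_{j}^{v}(k,n).$$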
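The definition of $occ_{v}(w)$ above translates directly into a brute-force counter over all $\ell$-element index sets. The sketch below is one such transcription, assuming single-digit letters written as strings; the function names are hypothetical. It reproduces the two worked examples of the text.

```python
from itertools import combinations

def order_isomorphic(a, b):
    """True if a and b have equal length and the same relative order, ties included."""
    if len(a) != len(b):
        return False
    idx = range(len(a))
    return all((a[p] < a[q]) == (b[p] < b[q]) and (a[p] == a[q]) == (b[p] == b[q])
               for p in idx for q in idx)

def occ(v, w):
    """Number of index tuples j_1 < ... < j_l whose letters in w are order-isomorphic to v."""
    # combinations() keeps the letters in their original left-to-right order
    return sum(order_isomorphic(v, sub) for sub in combinations(w, len(v)))

print(occ("1332", "37451554"))  # 4, the example from the introduction
print(occ("21", "35239"))       # 3, the inversions of 35239
```

The running time grows like $\binom{n}{\ell},$ so this is only meant for checking small cases by hand.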

We define their corresponding generating functions as

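$$F_{k,n}^{v}(x)=\sum_{r\geq 0}f_{r}^{v}(k,n)x^{r}\quad\text{and}\quad G_{k,n}^{v}(x)=\sum_{r\geq 0}g_{r}^{v}(k,n)x^{r}.\tag{3}$$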

We remark that since $f_{r}^{v}(k,n)=0$ for $r>\binom{n}{\ell},$ $F_{k,n}^{v}(x)$ is a polynomial in $x.$ Throughout this paper, $a_{n}\sim b_{n},$ $a_{n}=O(b_{n}),$ and $a_{n}=o(b_{n})$ for sequences $a_{n}$ and $b_{n}$ with elements that might depend on $k,r,\ell,d,v$ and other parameters, mean that, respectively, $\lim_{n\to\infty}\frac{a_{n}}{b_{n}}=1,$ $\limsup_{n\to\infty}\bigl{|}\frac{a_{n}}{b_{n}}\bigr{|}<\infty,$ and $\lim_{n\to\infty}\frac{a_{n}}{b_{n}}=0$ for all feasible values of the parameters when the latter are fixed. As usual, $a_{n}=\Theta(b_{n})$ indicates that both $a_{n}=O(b_{n})$ and $b_{n}=O(a_{n})$ hold true.
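For small $k$ and $n$ the sequences $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n),$ and hence the law of $X_{n},$ can be tabulated by listing all $k^{n}$ words. The following sketch does this for $v=123,$ $k=3,$ $n=4$; the function names are hypothetical and the approach is only feasible for small parameters.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def occ(v, w):
    """Number of occurrences of the pattern v in the word w (tuples of integers)."""
    l = len(v)
    pairs = [(p, q) for p in range(l) for q in range(p + 1, l)]
    def matches(sub):
        return all((sub[p] < sub[q]) == (v[p] < v[q]) and
                   (sub[p] == sub[q]) == (v[p] == v[q]) for p, q in pairs)
    return sum(matches(sub) for sub in combinations(w, l))

def frequencies(v, k, n):
    """Counter r -> f_r^v(k, n), obtained by listing all k^n words."""
    return Counter(occ(v, w) for w in product(range(1, k + 1), repeat=n))

f = frequencies((1, 2, 3), 3, 4)
g1 = f[0] + f[1]                       # g_1 = f_0 + f_1
print(sorted(f.items()), g1)
print(Fraction(f[1], 3 ** 4))          # P(X_4 = 1) = f_1 / k^n
```

The values of `f` are the coefficients of the polynomial $F_{k,n}^{v}(x)$ for these parameters.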

The remainder of this section is divided into four subsections. In Section 2.2 we study a finite state automaton that generates words $w\in[k]^{n}$ with a given value of $occ_{v}(w).$ The words are then counted through an application of the transfer-matrix method, allowing us to evaluate $g_{r}^{v}(k,n)$ and subsequently $f_{r}^{v}(k,n)$ in several interesting cases. The results of Section 2.2 are then used in Section 2.3 to show that (see Theorem 2.9) for any $r\geq 0,$

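$$\lim_{n\to\infty}\bigl(f_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=\lim_{n\to\infty}\bigl(g_{r}^{v}(k,n)\bigr)^{\frac{1}{n}}=d-1,$$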

where $d$ is the number of distinct letters in the pattern $v.$ Theorem 2.9 is the main result of this paper. Note that a similar result for permutations is given by Theorem 3.1 in Section 3.1. We refer to $\lim_{n\to\infty}\bigl{(}f_{r}^{v}(k,n)\bigr{)}^{\frac{1}{n}}$ and their counterparts for permutations in Theorem 3.1 as Stanley-Wilf type limits.

Finally, Sections 2.4 and 2.5 deal with random words. Let $W_{n}$ be a word chosen uniformly at random from $[k]^{n}$ and $X_{n}=occ_{v}(W_{n}).$ In Section 2.5 we obtain a central limit theorem and several related asymptotic results for the distribution of $X_{n}.$ The study of $X_{n}$ is, in principle, equivalent to the study of the sequences $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n)$ in view of the identities (1). In Section 2.4 we introduce a notion of weak avoidance for an arbitrary word pattern. In Theorem 2.19 we obtain limit theorems for random words avoiding a pattern weakly. The distribution of $W_{n}$ is not uniform in this case, and we use the CLT for the uniform case and perturbation techniques to derive the results.

2.2 Finite automata and pattern occurrences

Given an integer $r\geq 0$ , we define an equivalence relation $\sim_{v;r}$ on $[k]^{*}$ as follows. We say that two words $w^{\prime}$ and $w$ in $[k]^{*}$ are equivalent and write $w^{\prime}\sim_{v;r}w$ if the following condition holds for all $u\in[k]^{*}:$

$$occ_{v}(w^{\prime}u)=m\quad\text{if and only if}\quad occ_{v}(wu)=m,\qquad\forall\,m\leq r.\tag{4}$$

For instance, if $k=2,$ $r=1$ and $v=12,$ then $1\not\sim_{v;r}11$ because $occ_{12}(12)=1$ and $occ_{12}(112)=2.$ On the other hand, $11\sim_{v;r}111$ because, for $m=0,1$ and any $u\in[2]^{*},$ $occ_{12}(11u)=m$ if and only if $occ_{12}(111u)=m.$ We denote the equivalence class of a word $w$ by $\langle w\rangle_{v;r}.$ To simplify notation, we drop the indices when the context is clear. We remark that:

- $w$ and $w^{\prime}$ do not need to have the same length in order to be equivalent;

- if $occ_{v}(w)>r$ and $occ_{v}(w^{\prime})>r,$ then $w\sim_{v;r}w^{\prime}.$

The latter observation implies that there is a unique equivalence class ${\mathcal{R}}(v,r,k)$ such that

$$\{w\in[k]^{*}:occ_{v}(w)>r\}\subset{\mathcal{R}}(v,r,k).$$

Since the empty word $\epsilon$ is an element of the language $[k]^{*},$ it follows from (4) that if $occ_{v}(w)\leq r$ then

$$\langle w\rangle_{v;r}\subset\{w^{\prime}\in[k]^{*}:occ_{v}(w^{\prime})=occ_{v}(w)\}.$$

In particular,

$${\mathcal{R}}(v,r,k)=\{w\in[k]^{*}:occ_{v}(w)>r\}.$$

The following lemma shows that the equivalence of any two words can be checked with a finite number of steps.

Lemma 2.1.

Let $w^{\prime}$ and $w$ be two words in $[k]^{*}.$ Then $w^{\prime}\sim_{v;r}w$ if and only if (4) holds for all $u\in[k]^{\leq r\ell}.$

Proof.

Let $\sim_{v;r}^{\prime}$ be an equivalence relation on $[k]^{*}$ such that $w^{\prime}\sim_{v;r}^{\prime}w$ if and only if (4) holds for all $u\in[k]^{\leq r\ell}.$ Clearly, $w^{\prime}\sim_{v;r}w$ implies $w^{\prime}\sim_{v;r}^{\prime}w$ . On the other hand, if $w^{\prime}\nsim_{v;r}w$ then there exists $u\in[k]^{*}$ such that $occ_{v}(w^{\prime}u)=m_{1}$ and $occ_{v}(wu)=m_{2}$ with $m_{1}\neq m_{2}$ and $m_{1},m_{2}\leq r.$ Without loss of generality we may assume that $m_{1}<m_{2}\leq r$ . The occurrences of $v$ in $wu$ can use at most $m_{2}\ell$ letters of $u.$ Thus there is a subsequence $u^{\prime}$ of $u$ of length at most $m_{2}\ell$ such that $occ_{v}(w^{\prime}u^{\prime})\leq m_{1}$ and $occ_{v}(wu^{\prime})=m_{2}$ , and hence $w^{\prime}\nsim_{v;r}^{\prime}w$ . ∎
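The finite check behind Lemma 2.1 can be sketched as follows: two words are compared on every suffix $u$ up to a fixed length, with occurrence counts above $r$ identified. The default cutoff $(r+1)\ell$ used here is an assumption of this sketch, chosen conservatively so that it contains the range $[k]^{\leq r\ell}$ of the lemma; the function names are hypothetical.

```python
from itertools import combinations, product

def occ(v, w):
    """Number of occurrences of the pattern v in the word w (tuples of integers)."""
    l = len(v)
    pairs = [(p, q) for p in range(l) for q in range(p + 1, l)]
    def matches(sub):
        return all((sub[p] < sub[q]) == (v[p] < v[q]) and
                   (sub[p] == sub[q]) == (v[p] == v[q]) for p, q in pairs)
    return sum(matches(sub) for sub in combinations(w, l))

def equivalent(w1, w2, v, k, r, max_len=None):
    """Test condition (4) for all suffixes u over [k] of length at most max_len."""
    if max_len is None:
        max_len = (r + 1) * len(v)     # conservative cutoff, an assumption of this sketch
    for n in range(max_len + 1):
        for u in product(range(1, k + 1), repeat=n):
            a, b = occ(v, w1 + u), occ(v, w2 + u)
            # counts above r are indistinguishable for the relation
            if min(a, r + 1) != min(b, r + 1):
                return False
    return True

v, k, r = (1, 2), 2, 1
print(equivalent((1,), (1, 1), v, k, r))       # False: the suffix 2 separates them
print(equivalent((1, 1), (1, 1, 1), v, k, r))  # True
```

Both outputs agree with the example given after the definition of $\sim_{v;r}.$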

Let ${\mathcal{E}}(v,r,k)$ be the set of all equivalence classes of $\sim_{v;r}.$ Note that by Lemma 2.1 the number of equivalence classes is finite. Recall ${\mathcal{R}}(v,r,k)$ from (5), and let

$$E(v,r,k)={\mathcal{E}}(v,r,k)\setminus\{{\mathcal{R}}(v,r,k)\}$$

denote the set of equivalence classes excluding ${\mathcal{R}}(v,r,k).$ By the definition,

$$E(v,r,k)=\bigcup_{\{w\in[k]^{*}:\,occ_{v}(w)\leq r\}}\langle w\rangle_{v;r}.$$

We next introduce the key tool in our proofs in this section.

Definition 2.2.

Given an integer $r\geq 0,$ we denote by $Au(v,r,k)$ a finite automaton [21] such that

• The set of states of the automaton is $E(v,r,k);$

• The input alphabet is $[k];$

• The transition function $\delta:E(v,r,k)\times[k]\rightarrow E(v,r,k)$ is given by the rule $\delta(\langle w\rangle,a)=\langle wa\rangle;$

• The initial state is $\langle\epsilon\rangle,$ where $\epsilon$ denotes the empty word;

• All states are final states.

We identify the automaton $Au(v,r,k)$ with a (labeled) directed graph with vertices in $E(v,r,k)$ such that there is a labeled edge $\stackrel{{\scriptstyle a}}{{\longrightarrow}}$ from $\langle w\rangle$ to $\langle w^{\prime}\rangle$ if and only if $wa\sim_{v;r}w^{\prime}.$

Example 2.3.

Consider the case $v=123$ , $k=3,$ and $r=1.$ The set of equivalence classes $E(123,1,3)$ is given by

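$$E(123,1,3)=\{\langle\epsilon\rangle,\langle 1\rangle,\langle 11\rangle,\langle 12\rangle,\langle 112\rangle,\langle 123\rangle\}.$$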

The labeled graph associated with the automaton $Au(123,1,3)$ is

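[Figure: transition graph of $Au(123,1,3)$ on the states $\langle\epsilon\rangle,$ $\langle 1\rangle,$ $\langle 11\rangle,$ $\langle 12\rangle,$ $\langle 112\rangle,$ $\langle 123\rangle.$ Edges between distinct states: $\langle\epsilon\rangle\stackrel{1}{\longrightarrow}\langle 1\rangle,$ $\langle 1\rangle\stackrel{1}{\longrightarrow}\langle 11\rangle,$ $\langle 1\rangle\stackrel{2}{\longrightarrow}\langle 12\rangle,$ $\langle 11\rangle\stackrel{2}{\longrightarrow}\langle 112\rangle,$ $\langle 12\rangle\stackrel{2}{\longrightarrow}\langle 112\rangle,$ $\langle 12\rangle\stackrel{3}{\longrightarrow}\langle 123\rangle.$ Loops: $\langle\epsilon\rangle$ on $2,3$; $\langle 1\rangle$ on $3$; $\langle 11\rangle$ on $1,3$; $\langle 12\rangle$ on $1$; $\langle 112\rangle$ on $1,2$; $\langle 123\rangle$ on $1,2.$]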
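The states and transitions of Example 2.3 can be recovered mechanically. The sketch below explores words breadth-first from $\epsilon$ and identifies two words when their occurrence counts, capped at $r+1,$ agree on every suffix of length at most $(r+1)\ell.$ That cutoff, the use of one representative word per class, and all function names are assumptions of this illustration; the class ${\mathcal{R}}(v,r,k)$ shows up as one extra state.

```python
from itertools import combinations, product

def occ(v, w, k):
    """Occurrences of v in w over {1..k}: sum of subsequence counts of each
    order-isomorphic copy of v on a d-element subset of the alphabet."""
    ranks = sorted(set(v))
    total = 0
    for letters in combinations(range(1, k + 1), len(ranks)):
        target = [letters[ranks.index(a)] for a in v]
        dp = [1] + [0] * len(target)          # dp[i] = embeddings of target[:i] so far
        for c in w:
            for i in range(len(target) - 1, -1, -1):
                if target[i] == c:
                    dp[i + 1] += dp[i]
        total += dp[-1]
    return total

def build_automaton(v, k, r):
    """Return (representatives, delta), with delta[(w, a)] the representative of <wa>."""
    tests = [u for n in range((r + 1) * len(v) + 1)
             for u in product(range(1, k + 1), repeat=n)]
    def signature(w):
        return tuple(min(occ(v, w + u, k), r + 1) for u in tests)
    start = ()
    states = {signature(start): start}        # signature -> representative word
    delta, queue = {}, [start]
    while queue:
        w = queue.pop(0)
        for a in range(1, k + 1):
            sig = signature(w + (a,))
            if sig not in states:             # a new equivalence class was found
                states[sig] = w + (a,)
                queue.append(w + (a,))
            delta[(w, a)] = states[sig]
    return list(states.values()), delta

states, delta = build_automaton((1, 2, 3), 3, 1)
print(states)                                  # six classes of E(123,1,3) plus one for R
print(delta[((1, 2), 2)], delta[((1, 2, 3), 3)])
```

With these parameters the search ends with the representatives $\epsilon,$ $1,$ $11,$ $12,$ $112,$ $123,$ together with $1123$ standing for the class of words containing the pattern more than once.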

The automaton serves as a bridge between formal language theory and the theory of computing on one side and the asymptotic theory of algebraic functions on the other. See, for instance, [4, 16] and references therein for background.

We exploit the link between asymptotic properties of rational functions and the structure of associated regular languages to study the generating functions $F_{r,k}^{v}(x)$ and $G_{r,k}^{v}(x)$ of the sequences $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n)$ defined in (3), and subsequently the asymptotic behavior of these sequences, as $n$ tends to infinity. The class of automata $Au(v,0,k)$ has been introduced in [9]. Our results in this subsection (Lemmas 2.7 and 2.8 below) are extensions of the corresponding results in Section 2 of [9].

It is straightforward to verify (cf. [20, p. 256]) that one can order the states of the automaton $Au(v,r,k)$ as $s_{1},s_{2},\ldots,s_{p},$ $p=\#E(v,r,k),$ so that if $i<j$ then there is no path from the state $s_{j}$ to the state $s_{i}.$ The transition matrix $T(v,r,k)$ of $Au(v,r,k)$ is the $p\times p$ matrix with non-negative integer entries defined by

$$[T(v,r,k)]_{ij}=\#\{a\in[k]:\delta(s_{i},a)=s_{j}\}.$$

Thus $[T(v,r,k)]_{ij}$ counts the number of edges from $s_{i}$ to $s_{j},$ and $T(v,r,k)$ is upper triangular. The following observation reduces the study of the sequence $g_{r}^{v}(k,n),$ $n\in{\mathbb{N}},$ to the analysis of the matrix $T(v,r,k):$

[TABLE]

where $T=T(v,r,k)$ and $p=\#E(v,r,k).$

Example 2.4.

Consider again the setup of Example 2.3, namely $v=123$ , $k=3,$ and $r=1.$ The transition matrix $T(123,1,3)$ is given by

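$$\begin{pmatrix}2&1&0&0&0&0\\ 0&1&1&1&0&0\\ 0&0&2&0&1&0\\ 0&0&0&1&1&1\\ 0&0&0&0&2&0\\ 0&0&0&0&0&2\end{pmatrix}.$$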

Thus the generating function for the number of $3$-ary words of length $n$ that contain $123$ at most once is given by

[TABLE]

where $e_{i}$ is the $i$-th standard unit vector (all coordinates are zero, except that the $i$-th coordinate is one). Note that the generating function for the number of $3$-ary words of length $n$ that avoid $123$ is given by $F_{0,3}^{123}(x)=\sum_{n\geq 0}f_{0}^{123}(3,n)x^{n}=\frac{3x^{2}-3x+1}{(1-2x)^{3}}$ (see [10]). Therefore, by virtue of (8),

$$F_{1,3}^{123}(x)=\sum_{n\geq 0}f_{1}^{123}(3,n)x^{n}=\frac{x^{3}}{(1-2x)^{2}(1-x)^{2}}.$$

Applying arguments similar to the one we used in order to get (8), we find that

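$$G_{1,4}^{123}(x)=\sum_{n\geq 0}g_{1}^{123}(4,n)x^{n}=\frac{1-7x+22x^{2}-32x^{3}+16x^{4}-2x^{5}}{(1-x)(1-2x)^{5}},$$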

and

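$$G_{1,5}^{123}(x)=\sum_{n\geq 0}g_{1}^{123}(5,n)x^{n}=\frac{1-10x+48x^{2}-124x^{3}+170x^{4}-103x^{5}-3x^{6}+23x^{7}}{(1-x)(1-2x)^{7}}.$$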
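The transfer-matrix count for $k=3$ in Example 2.4 can be cross-checked numerically. The sketch below takes the matrix $T(123,1,3)$ with states ordered $\langle\epsilon\rangle,\langle 1\rangle,\langle 11\rangle,\langle 12\rangle,\langle 112\rangle,\langle 123\rangle,$ and assumes the usual transfer-matrix reading that $g_{1}^{123}(3,n)$ is the sum of the first row of $T^{n}$ (the number of length-$n$ paths from the initial state). It compares the result with exhaustive enumeration and with the series coefficients of $x^{3}/((1-2x)^{2}(1-x)^{2}).$ Function names are hypothetical.

```python
from itertools import combinations, product

# T(123, 1, 3), states ordered <eps>, <1>, <11>, <12>, <112>, <123>
T = [
    [2, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 2, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 2],
]

def g_transfer(n):
    """Sum of the first row of T^n (assumed reading of the transfer-matrix identity)."""
    row = [1, 0, 0, 0, 0, 0]                   # the unit vector e_1
    for _ in range(n):
        row = [sum(row[i] * T[i][j] for i in range(6)) for j in range(6)]
    return sum(row)

def occ_123(w):
    """Occurrences of 123 in w: strictly increasing triples of positions and letters."""
    return sum(a < b < c for a, b, c in combinations(w, 3))

def g_brute(n):
    """Words in {1,2,3}^n containing 123 at most once, by enumeration."""
    return sum(occ_123(w) <= 1 for w in product((1, 2, 3), repeat=n))

def f1_series(n):
    """Coefficient of x^n in x^3 / ((1-2x)^2 (1-x)^2)."""
    m = n - 3
    return sum((i + 1) * 2 ** i * (m - i + 1) for i in range(m + 1)) if m >= 0 else 0

for n in range(7):
    exactly_once = sum(occ_123(w) == 1 for w in product((1, 2, 3), repeat=n))
    print(n, g_transfer(n), g_brute(n), f1_series(n), exactly_once)
```

Each row of $T$ sums to $3$ except the last two, which sum to $2$: from $\langle 112\rangle$ and $\langle 123\rangle$ the letter $3$ leads to the excluded class ${\mathcal{R}}(123,1,3).$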

Example 2.5.

The equivalence classes of $Au(12\cdots k,0,k)$ are given by $\langle\epsilon\rangle$ and $\langle 12\cdots j\rangle,$ where $j=1,2,\ldots,k-1.$ The automaton $Au(12\cdots k,0,k)$ can be graphically represented as follows:

[Figure: transition graph of $Au(12\cdots k,0,k).$ For $j=1,\ldots,k-1,$ the letter $j$ leads from $\langle 12\cdots(j-1)\rangle$ to $\langle 12\cdots j\rangle,$ where the empty prefix stands for $\langle\epsilon\rangle.$ For $j=1,\ldots,k,$ the state $\langle 12\cdots(j-1)\rangle$ has a loop labeled by every letter of $[k]$ other than $j.$]

Therefore, $T(12\cdots k,0,k)$ is given by the matrix $(a_{ij})_{1\leq i,j\leq k}$ with $a_{ii}=k-1$ for all $i=1,2,\ldots,k,$ $a_{i(i+1)}=1$ for all $i=1,2,\ldots,k-1,$ and the remaining entries equal to zero. Consequently,

[TABLE]
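The matrix of Example 2.5 can be checked numerically on small cases. The following is a minimal sketch, assuming that the count of $k$ -ary words of length $n$ avoiding $12\cdots k$ equals $e_{1}^{t}T^{n}(e_{1}+\cdots+e_{k})$ (our reading of the transfer-matrix identity used in this section); the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations, product


def transfer_matrix(k):
    """Matrix of Example 2.5: k-1 loops at every state, one edge to the next state."""
    T = [[0] * k for _ in range(k)]
    for i in range(k):
        T[i][i] = k - 1
        if i + 1 < k:
            T[i][i + 1] = 1
    return T


def count_by_matrix(k, n):
    """e_1^t T^n (e_1 + ... + e_k): number of length-n paths starting at <epsilon>."""
    T = transfer_matrix(k)
    row = [1] + [0] * (k - 1)
    for _ in range(n):
        row = [sum(row[i] * T[i][j] for i in range(k)) for j in range(k)]
    return sum(row)


def count_by_brute_force(k, n):
    """Number of words in [k]^n with no subsequence equal to 1 2 ... k."""
    target = tuple(range(1, k + 1))
    return sum(
        1
        for w in product(range(1, k + 1), repeat=n)
        if all(sub != target for sub in combinations(w, k))
    )


if __name__ == "__main__":
    for n in range(7):
        print(n, count_by_matrix(3, n), count_by_brute_force(3, n))
```

The brute-force count does not use the automaton, so agreement between the two columns is an independent check of the matrix entries for small $k$ and $n.$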

Example 2.6.

It is not hard to see that the equivalence classes of $Au(12\cdots k,1,k)$ are given by $\langle\epsilon\rangle$ , $\langle 12\cdots j\rangle$ for $j=1,2,\ldots,k,$ and $\langle 112\cdots j\rangle$ for $j=1,2,\ldots,k-1$ . The automaton $Au(12\cdots k,1,k)$ can be graphically represented as follows:

[Figure: graph of $Au(12\cdots k,1,k)$ on the states $\langle\epsilon\rangle,$ $\langle 1\rangle,$ $\langle 12\rangle,$ $\langle 11\rangle,\ldots,$ $\langle 12\cdots(k-1)\rangle,$ $\langle 112\cdots(k-2)\rangle,$ $\langle 12\cdots k\rangle,$ $\langle 112\cdots(k-1)\rangle.$ Edges between distinct states are labeled by single letters; the loop labels shown are $2,3,\ldots,k;$ $3,4,\ldots,k;$ $1,3,4,\ldots,k;$ $1,4,5,\ldots,k;$ $\ldots;$ $1,2,\ldots,k-3,k;$ $1,2,\ldots,k-3,k-1,k;$ $1,2,\ldots,k-1;$ and $1,2,\ldots,k-1.$]

Hence $T(12\cdots k,1,k)$ is given by the matrix $(a_{ij})_{1\leq i,j\leq 2k}$ with $a_{11}=k-1$ , $a_{12}=1$ , $a_{22}=k-2$ , $a_{23}=a_{2,4}=1$ , $a_{2i,2i}=k-1,$ $a_{2i,2i+2}=1,$ $a_{2i-1,2i-1}=k-2,$ and $a_{2i-1,2i+1}=a_{2i-1,2i+2}=1$ for all $i=2,3,\ldots,k-1$ , $a_{2k,2k}=a_{2k-1,2k-1}=k-1$ , and the remaining entries equal to zero. Let $C=I-xT(12\cdots k,1,k)$ . In view of (6), we are interested in computing $e_{1}^{t}C^{-1}(e_{1}+\cdots+e_{2k})$ . First, we solve the system $C{\mathfrak{z}}=e_{1}+\cdots+e_{2k}$ , where ${\mathfrak{z}}={\mathfrak{z}}(x)$ is the vector ${\mathfrak{z}}=({\mathfrak{z}}_{1},\ldots,{\mathfrak{z}}_{2k})^{t}$ . By induction,

[TABLE]

and

[TABLE]

for $j=1,2,\ldots,k-1$ . Hence,

[TABLE]

Taking into account the result in Example 2.5, we conclude that the generating function for the number of $k$ -ary words of length $n$ that contain $12\cdots k$ exactly once is given by

[TABLE]

Note that $\lim_{x\rightarrow 1/k}F_{1,k}^{12\cdots k}(x)=\frac{k((k+5)2^{k}+4)}{2^{k+1}}$ is finite, so $x=1/k$ is not a pole. Hence the pole of $F_{1,k}^{12\cdots k}(x)$ with the smallest absolute value is $x=1/(k-1),$ and it is of order $k-2$ when $k\geq 6$ . Thus (see, for instance, [16] or [4]), as $n\to\infty,$

[TABLE]

For $k\leq 6$ we have:

[TABLE]

We refer to an edge of the associated graph starting and ending at the same state $\langle w\rangle$ as a loop at $\langle w\rangle.$ It is easy to see that the graph does not have any cycles, besides perhaps loops (cf. [20, p. 256]). Using similar arguments as in [9] (see Lemma 2.4 there), one can prove the following lemma.

Lemma 2.7.

Let $d$ be the number of distinct letters in $v.$ Then for any $\langle u\rangle\in E(v,r,k),$ the number of loops at $\langle u\rangle$ does not exceed $d-1$ . Moreover, there are exactly $d-1$ loops at $\langle\epsilon\rangle$ .

Recalling (3), the following lemma links the number of loops to the poles of the generating function $G_{r,k}^{v}(x),$ $x\in{\mathbb{C}},$ and hence to the asymptotic behavior of the sequence $g_{r}^{v}(k,n)$ as $n$ tends to infinity. The result follows directly from the identity in (6) and the transfer-matrix method [32, Theorem 4.7.2]. Given a matrix $A,$ denote by $A^{(i,j)}$ the matrix with row $i$ and column $j$ deleted. We have:

Lemma 2.8.

Let $p=\#E(v,r,k)$ be the number of states in $Au(v,r,k)$ . Then the generating function $G_{r,k}^{v}(x)$ is given by

[TABLE]

where $\lambda_{i}$ is the number of loops at state $s_{i}$ , $T=T(v,r,k),$ and $B(x)$ is the matrix obtained by replacing the first column in $I-xT$ with a column of all ones.

2.3 Stanley-Wilf type limits

Throughout this section we assume that the number of distinct letters in the pattern $v\in[k]^{\ell},$ namely $d,$ is greater than one. An interesting consequence of the results in Lemma 2.7 and Lemma 2.8 is the following theorem, which is the main result of this section.

Recall $f_{r}^{v}(k,n)$ and $g_{r}^{v}(k,n)$ from (2).

Theorem 2.9.

Assume that $d>1.$ Then for all $r\in{\mathbb{N}}_{0},$

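$$\lim_{n\to\infty}\bigl(f_{r}^{v}(k,n)\bigr)^{1/n}=\lim_{n\to\infty}\bigl(g_{r}^{v}(k,n)\bigr)^{1/n}=d-1.\tag{9}$$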

Proof.

By Lemma 2.8, the generating function $G_{r,k}^{v}(x)=\sum_{n\geq 0}g_{r}^{v}(k,n)x^{n}$ is a rational function in the complex plane ${\mathbb{C}}.$ By Lemma 2.7, the smallest pole of $G_{r,k}^{v}(x)$ is $\frac{1}{d-1}.$ Since the modulus of the smallest pole is the radius of convergence of the generating function, and the reciprocal of the radius of convergence is the exponential growth rate of its coefficients [16], we have

[TABLE]

Since $f_{r}^{v}(k,n)\leq g_{r}^{v}(k,n)$ , we conclude that

[TABLE]

On the other hand, if $v\in[k]^{\ell}$ and a word $w\in[k]^{*}$ contains $v$ exactly $r$ times, then the concatenation $wu$ contains $v$ exactly $r$ times for any word $u\in[k]^{*}$ such that each letter of $u$ belongs to the set

[TABLE]

where $v_{\ell}$ is the rightmost letter of $v$ . Therefore, there exists a constant $c_{r}>0$ such that for all $n\in{\mathbb{N}},$

[TABLE]

Hence,

[TABLE]

which completes the proof of the theorem. ∎
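The limit in Theorem 2.9 can be observed numerically for the pattern $v=12\cdots k$ with $r=0,$ where $d=k.$ The sketch below is only an illustration, under the assumption that $g_{0}^{v}(k,n)$ is the first-row sum of $T^{n}$ for the matrix $T$ of Example 2.5; the helper names are hypothetical.

```python
import math


def g0(k, n):
    """First-row sum of T^n, where T has k-1 on the diagonal and 1 just above it."""
    row = [1] + [0] * (k - 1)
    for _ in range(n):
        new = [(k - 1) * row[0]]
        for j in range(1, k):
            # Entry j receives a loop contribution and one edge from state j-1.
            new.append((k - 1) * row[j] + row[j - 1])
        row = new
    return sum(row)


def nth_root(k, n):
    """(g_0(k, n))^(1/n), computed through logarithms of exact big integers."""
    return math.exp(math.log(g0(k, n)) / n)


if __name__ == "__main__":
    for n in (50, 400, 4000):
        print(n, nth_root(3, n))
```

For $k=3$ the printed values decrease slowly toward $d-1=2;$ the slow rate reflects the polynomial factor in front of $(d-1)^{n}.$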

Note that the limit in (9) is independent of $r.$ It turns out that a similar result holds for the occurrence enumeration problem in permutations; see Theorem 3.1 below. We remark that in the case of permutations, the structure of the dependence of the limit on the underlying pattern is considerably more complex than in (9) and is not yet completely understood [11, 17, 18]. The theorem has an interesting implication for the asymptotic behavior of the entropy of the random variable $X_{n}=occ_{v}(W_{n})$ with a random $W_{n},$ see Theorem 2.13 below for details.

A simple path in the graph representation of $Au(v,r,k)$ is a finite sequence of states $s_{j_{0}},\ldots,s_{j_{q}}$ in $E(v,r,k)$ such that $s_{j_{0}}=\langle\epsilon\rangle$ and for all $i=1,\ldots,q,$ we have $j_{i-1}<j_{i}$ and $s_{j_{i-1}}$ is connected to $s_{j_{i}}$ by a direct edge. The proof of the following partial refinement of Theorem 2.9 follows that of Theorem 3.2 in [9] nearly verbatim, and therefore is omitted.

Theorem 2.10.

Assume that $d>1.$ Let $M_{r}$ be the maximal number of states with $d-1$ loops in a simple path in $Au(v,r,k).$ Then for any $r\geq 0,$ there exist constants $C_{r}\in(0,\infty)$ and $K_{r}\geq 0$ such that

[TABLE]

Note that $M_{r}\geq 1$ by Lemma 2.7. Through investigating various patterns with $d>1,$ we observed $K_{r}>0.$ Nevertheless, we believe that the following is true:

Conjecture.

There exist $k,r\in{\mathbb{N}}$ and a pattern $v\in[k]^{*}$ such that $d>1$ and $K_{r}$ in (10) is equal to zero. In that case, there exists $L_{r}\in{\mathbb{N}},$ $L_{r}<M_{r},$ and $\widetilde{K}_{r}\in(0,\infty)$ such that $\lim_{n\to\infty}\frac{f_{r}^{v}(k,n)}{n^{L_{r}}(d-1)^{n}}=\widetilde{K}_{r}.$

It follows from the first limit identity in (10) that $M_{r}$ is a non-decreasing function of $r.$ If the previous conjecture is true, then $M_{r}$ is not always strictly increasing. We believe that the following is true:

Conjecture.

For any $n,k\in{\mathbb{N}},$ and $v\in[k]^{*}$ with $d>1,$ $\lim_{r\to\infty}\frac{M_{r}}{r}$ exists and belongs to $(0,\infty).$

There exist $k\in{\mathbb{N}},$ a pattern $v\in[k]^{*}$ with $d>1,$ and an increasing sequence of integers $(r_{n})_{n\in{\mathbb{N}}},$ such that $M_{r_{n}}=M_{r_{n}-1}.$

We conclude this section with a remark that Theorems 2.9 and 2.10 can be interpreted as large deviation estimates for $occ_{v}(w)$ when $w\in[k]^{n}$ is chosen at random, see Section 2.5 below for details.

2.4 Weak pattern avoidance

In this section, we further investigate the asymptotic behavior of the sequence $(f_{r}^{v}(k,n))_{r\in{\mathbb{N}}_{0}}.$ It turns out that the generating function of this sequence, as defined by (3), can be linked to a natural concept of “weak avoidance” that may be of independent interest. The weak avoidance is defined in a fashion similar to the notion of the weakly self-avoiding random walks [5], namely by introducing a penalty for the non-avoidance rather than completely striking off the possibility of a pattern occurrence.

Formally speaking, for a pattern $v\in[k]^{*}$ , we associate a sequence of penalty functions $c^{v}_{k,n}:[0,1]\to[0,k^{n}],$ $n\in{\mathbb{N}},$ as follows:

[TABLE]

where

[TABLE]

It follows from (11) that

$$c^{v}_{k,n}(x)=\sum_{r\geq 0}f_{r}^{v}(k,n)(1-x)^{r}.\tag{13}$$

Thus $c^{v}_{k,n}(x)=F_{k,n}^{v}(1-x).$ According to the definition in (11), the function $c^{v}_{k,n}(x)$ can be considered as a partition function counting the words in $[k]^{n}$ with weights penalizing occurrences of the pattern $v.$ Note that $c^{v}_{k,n}(x)$ is a decreasing function of $x,$ $c_{k,n}^{v}(0)=k^{n}$ counts all words without discrimination, and on the opposite extreme $c_{k,n}^{v}(1)=f_{0}^{v}(k,n)$ counts only words avoiding the pattern entirely. The parameter $x\in[0,1]$ can be therefore interpreted as an intensity or strength of the pattern avoidance.
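The boundary values and the monotonicity of the penalty function can be checked by exhaustive enumeration for small $k$ and $n.$ The sketch below assumes that $c^{v}_{k,n}(x)$ is the sum of $(1-x)^{occ_{v}(w)}$ over $w\in[k]^{n},$ equivalently $\sum_{r}f_{r}^{v}(k,n)(1-x)^{r},$ and that an occurrence is a subsequence order-isomorphic to $v;$ the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def reduce_word(u):
    """Replace each letter of u by its rank among the distinct letters of u."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(v, w):
    """Number of subsequences of w that are order-isomorphic to v."""
    target = reduce_word(v)
    return sum(1 for sub in combinations(w, len(v)) if reduce_word(sub) == target)


def occurrence_counts(v, k, n):
    """Counter mapping r to f_r^v(k, n), by exhaustive enumeration of [k]^n."""
    return Counter(occ(v, w) for w in product(range(1, k + 1), repeat=n))


def penalty(v, k, n, x):
    """c^v_{k,n}(x), read as the sum over r of f_r^v(k, n) * (1 - x)^r."""
    return sum(c * (1 - x) ** r for r, c in occurrence_counts(v, k, n).items())


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        print(x, penalty((2, 1), 3, 4, x))
```

Exact rational arithmetic is used so that the endpoint identities $c(0)=k^{n}$ and $c(1)=f_{0}^{v}(k,n)$ hold without rounding.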

The subsequent Section 2.5 is devoted to the study of the asymptotic behavior of the sequence $X_{n}=occ_{v}(W_{n}),$ $n\in{\mathbb{N}},$ where $W_{n}=w_{1}\cdots w_{n}\in[k]^{n}$ and $w_{i}$ are i.i.d. random variables, each one distributed uniformly over $[k].$ The asymptotic behavior of random variables $X_{n}=occ_{v}(W_{n})$ in the case when the sequence $(w_{i})_{i\in{\mathbb{N}}}$ is drawn at random from non-product probability measures on $[k]^{\mathbb{N}}$ is beyond the scope of this paper and will be studied by the authors elsewhere. The only exception in this paper is Theorem 2.19 where, following a canonical construction in the theory of self-avoiding random walks [5], we study $X_{n}$ in the case when $W_{n}$ is chosen at random according to the probability law

$${\mathbb{Q}}^{v,x}_{k,n}(w)=\frac{(1-x)^{occ_{v}(w)}}{c^{v}_{k,n}(x)},\qquad w\in[k]^{n}.\tag{14}$$

Here $x$ is a parameter which ranges within the interval $[0,1].$ Clearly, ${\mathbb{Q}}^{v,x}_{k,n}(\,\cdot\,)$ is not uniform on $[k]^{n}$ when $x>0$: it penalizes words $w$ with a non-zero $occ_{v}(w)$ by the factor $(1-x)^{occ_{v}(w)},$ which depends on the parameter $x\in(0,1).$ This probability measure belongs to a general class of Boltzmann distributions intensively studied in statistical mechanics and combinatorics, cf. [14]. In Theorem 2.19 we study ${\mathbb{Q}}^{v,x}_{k,n}$ in a certain small parameter regime where $x=x_{n}=o(1)$ decays fast, and consequently, ${\mathbb{Q}}^{v,x_{n}}_{k,n}$ can be considered as a perturbation of the uniform probability measure over $[k]^{n}.$
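To make the penalized law concrete, the following sketch computes the exact distribution of the number of occurrences when each word $w\in[k]^{n}$ receives weight proportional to $(1-x)^{occ_{v}(w)}.$ This normalization is our reading of the description above, and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def reduce_word(u):
    """Replace each letter of u by its rank among the distinct letters of u."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(v, w):
    """Number of subsequences of w that are order-isomorphic to v."""
    target = reduce_word(v)
    return sum(1 for sub in combinations(w, len(v)) if reduce_word(sub) == target)


def tilted_law(v, k, n, x):
    """Law of occ_v(W) when W has probability proportional to (1 - x)^occ_v(W)."""
    f = Counter(occ(v, w) for w in product(range(1, k + 1), repeat=n))
    weights = {r: count * (1 - x) ** r for r, count in f.items()}
    total = sum(weights.values())  # plays the role of the normalizing constant
    return {r: wgt / total for r, wgt in weights.items()}


def mean(law):
    """Expected value of a finite law given as a dict value -> probability."""
    return sum(r * p for r, p in law.items())


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 10), Fraction(1, 2)):
        print(x, mean(tilted_law((2, 1), 3, 4, x)))
```

At $x=0$ the law reduces to the uniform case, and the mean number of occurrences decreases as the intensity $x$ grows, in line with the interpretation of $x$ as the strength of avoidance.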

We conclude this section with an analogue of Theorem 2.9 for $c_{k,n}^{v}(x).$ It follows from Theorem 2.9 that for all $x\in[0,1],$

[TABLE]

where $d$ is the number of distinct letters in the pattern $v.$ We have:

Proposition 2.11.

Given a pattern $v\in[k]^{\ell},$ $\lim_{n\to\infty}(c_{k,n}^{v}(x))^{\frac{1}{n}}$ exists and lies within the closed interval $[d-1,k]$ for all $x\in[0,1].$

Proof.

By the definition, for any $x\in[0,1],$ $w\in[k]^{\geq\ell},$ and an increasing sequence of indices $j_{i},$ $1\leq i\leq\ell,$ we have

[TABLE]

Therefore, for any $n,m\in{\mathbb{N}}$ and $x\in[0,1],$

[TABLE]

Hence $\log c_{k,n}^{v}(x),$ $n\in{\mathbb{N}},$ is a subadditive sequence, and the claim of the proposition follows from Fekete’s subadditive lemma and the estimates in (15). ∎
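The subadditivity used in the proof rests on the inequality $occ_{v}(wu)\geq occ_{v}(w)+occ_{v}(u)$ for a concatenation $wu,$ which gives $c^{v}_{k,n+m}(x)\leq c^{v}_{k,n}(x)\,c^{v}_{k,m}(x).$ The sketch below checks this submultiplicative form on small cases; it assumes that $c^{v}_{k,n}(x)$ is the sum of $(1-x)^{occ_{v}(w)}$ over $w\in[k]^{n},$ and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def reduce_word(u):
    """Replace each letter of u by its rank among the distinct letters of u."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(v, w):
    """Number of subsequences of w that are order-isomorphic to v."""
    target = reduce_word(v)
    return sum(1 for sub in combinations(w, len(v)) if reduce_word(sub) == target)


def penalty(v, k, n, x):
    """Sum of (1 - x)^occ_v(w) over all words w in [k]^n."""
    return sum((1 - x) ** occ(v, w) for w in product(range(1, k + 1), repeat=n))


def is_submultiplicative(v, k, x, max_len):
    """Check c(n + m) <= c(n) * c(m) for all 1 <= n, m <= max_len."""
    c = {n: penalty(v, k, n, x) for n in range(1, 2 * max_len + 1)}
    return all(
        c[n + m] <= c[n] * c[m]
        for n in range(1, max_len + 1)
        for m in range(1, max_len + 1)
    )


if __name__ == "__main__":
    print(is_submultiplicative((1, 2), 2, Fraction(1, 3), 3))
    print(is_submultiplicative((1, 2, 1), 3, Fraction(1, 2), 2))
```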

Example 2.12.

Let us consider $v=21.$ In order to avoid the pattern $v$ , the letters of a word $w\in[k]^{n}$ must be arranged in the non-decreasing order. Therefore, $f^{21}_{0}(k,n)=\binom{n+k-1}{k-1},$ the number of ways to write $n$ as a weak composition $n=a_{1}+\cdots+a_{k},$ where $a_{i}\geq 0$ represents the number of occurrences of the letter $i\in[k]$ in a $k$ -ary word of length $n.$ Furthermore, by Theorem 2.9, $\lim_{n\to\infty}\bigl{(}f^{21}_{r}(k,n)\bigr{)}^{1/n}=1$ for all integer $r\geq 0.$ Though a simple explicit expression for $f^{21}_{r}(k,n)$ is not known, a result on generating functions due to MacMahon (see, for instance, Theorem 3.6 in [1]) combined with (13) shows that for $x\in(0,1],$

[TABLE]

The first inequality in (16) follows readily from the fact that

[TABLE]

as long as $i\leq j+1.$ Combining (16) with the trivial inequality $c^{21}_{k,n}(x)>f^{21}_{0}(k,n),$ we obtain that $\binom{n+k-1}{k-1}<c^{21}_{k,n}(x)<\frac{1}{x^{k-1}}\binom{n+k-1}{k-1}$ for all $x\in(0,1).$ We remark that a straightforward improvement of the lower bound for $c^{21}_{k,n}(x)$ is

[TABLE]

where $\lfloor a\rfloor$ denotes the integer part of $a\in{\mathbb{R}}.$ Combining this lower bound with (16), we obtain that for all $x\in(0,1),$

[TABLE]

where $\varphi(x)$ is the Euler generating function $\prod_{j=1}^{\infty}\frac{1}{1-x^{j}}.$ Notice that the lower and upper bounds in (17) match asymptotically when $x\to 1.$
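The count $f^{21}_{0}(k,n)=\binom{n+k-1}{k-1}$ from the beginning of this example is easy to confirm by enumeration, since a word avoids $21$ exactly when it is non-decreasing. A minimal sketch, with a hypothetical function name:

```python
from itertools import product
from math import comb


def count_avoiding_21(k, n):
    """Number of words in [k]^n with no inversion, i.e. non-decreasing words."""
    return sum(
        1
        for w in product(range(1, k + 1), repeat=n)
        if all(w[i] <= w[i + 1] for i in range(n - 1))
    )


if __name__ == "__main__":
    for k in range(1, 5):
        for n in range(0, 6):
            print(k, n, count_avoiding_21(k, n), comb(n + k - 1, k - 1))
```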

2.5 Random words

Let $(w_{i})_{i\in{\mathbb{N}}}$ be a sequence of independent random variables, each distributed uniformly on $[k],$ and let $v\in[k]^{\ell}$ be a word pattern, $\ell\geq 2.$ Denote $W_{n}=w_{1}w_{2}\cdots w_{n}\in[k]^{n}$ for $n\in{\mathbb{N}},$ and let $W=w_{1}w_{2}\cdots$ be the infinite string formed from the successive letters in the sequence. In this section we study the asymptotic behavior of the random variable $X_{n}=occ_{v}(W_{n})$ . Note that for all $r\in{\mathbb{N}}_{0},$

$$P(X_{n}=r)=\frac{1}{k^{n}}f_{r}^{v}(k,n).$$

We start with a corollary to Theorem 2.9 that is concerned with the asymptotic behavior of the information entropy of $X_{n},$ when $n$ tends to infinity. Let

[TABLE]

be the entropy of the random variable $X_{n}.$ The following theorem shows that $H_{k,v}(n)$ grows linearly with $n$ and gives the exact rate of growth for an arbitrary pattern $v$ with $d>1.$

Theorem 2.13.

Assume that $d>1.$ Then,

[TABLE]

Proof.

We have

[TABLE]

Thus

[TABLE]

and the result follows from Theorem 2.9 and a discrete version of the bounded convergence theorem. ∎

Our next result is a central limit theorem for $X_{n}$ which asserts that, as $n$ tends to infinity, $X_{n}$ is highly concentrated at $E(X_{n})=\binom{n}{\ell}\binom{k}{d}\frac{1}{k^{\ell}}$ with standard deviation of order $\frac{1}{\sqrt{n}}E(X_{n}).$ The fact that, exactly as in the classical case of partial sums of i.i.d. variables, typical fluctuations of $X_{n}$ are of order $\frac{1}{\sqrt{n}}E(X_{n})$ will be often exploited in the rest of this section. The proof follows closely that of Theorem 2 in [8], a similar CLT for pattern occurrences in permutations. It is based on an application of a general CLT for dependent variables due to [22], and hence, it relies on an accurate estimation of $\mbox{VAR}(X_{n}).$ Given the variance estimate and a general result in [29], the CLT can be strengthened to a Berry-Esseen type result providing the classical $O(n^{-1/2})$ rate of convergence, see Corollary 2.16 below.

Theorem 2.14.

Let $\mu_{n}=E(X_{n})$ and $\sigma_{n}=\sqrt{\text{VAR}(X_{n})}.$ Then $\mu_{n}=\binom{n}{\ell}\binom{k}{d}\frac{1}{k^{\ell}},$ $\sigma_{n}=\Theta\bigl{(}\frac{\mu_{n}}{\sqrt{n}}\bigr{)},$ and $\frac{X_{n}-\mu_{n}}{\sigma_{n}}$ converges in distribution, as $n\to\infty,$ to a standard normal random variable.

Proof.

There are $\binom{n}{\ell}$ ways to choose $\ell$ indices $j_{1}<\cdots<j_{\ell}$ out of $n$ possibilities. We refer to these ordered $\ell$ -tuples as $\ell$ -subintervals of $[n].$ Enumerate these subintervals in an arbitrary manner, and let $I_{j},$ $j=1,\ldots,\binom{n}{\ell},$ denote the $j$ -th subinterval. Let $X_{n,j}$ be the indicator of the event that the pattern occurs at the $j$ -th subinterval.

First, we will compute $E(X_{n}).$ Given that

$$E(X_{n,j})=P(X_{n,j}=1)=\frac{1}{k^{\ell}}\binom{k}{d}$$

and $X_{n}=\sum_{j=1}^{\binom{n}{\ell}}X_{n,j},$ we have

$$E(X_{n})=\binom{n}{\ell}\binom{k}{d}\frac{1}{k^{\ell}}.\tag{18}$$
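The formula $E(X_{n})=\binom{n}{\ell}\binom{k}{d}\frac{1}{k^{\ell}}$ stated in Theorem 2.14 can be confirmed exactly for small parameters by averaging $occ_{v}(w)$ over all of $[k]^{n}.$ The sketch below assumes that an occurrence is a subsequence order-isomorphic to $v;$ the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def reduce_word(u):
    """Replace each letter of u by its rank among the distinct letters of u."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(v, w):
    """Number of subsequences of w that are order-isomorphic to v."""
    target = reduce_word(v)
    return sum(1 for sub in combinations(w, len(v)) if reduce_word(sub) == target)


def exact_mean(v, k, n):
    """Average of occ_v(w) over all k^n words, as an exact fraction."""
    total = sum(occ(v, w) for w in product(range(1, k + 1), repeat=n))
    return Fraction(total, k ** n)


def formula_mean(v, k, n):
    """binom(n, l) * binom(k, d) / k^l with l = |v| and d distinct letters in v."""
    ell, d = len(v), len(set(v))
    return Fraction(comb(n, ell) * comb(k, d), k ** ell)


if __name__ == "__main__":
    for v in ((1, 2, 1), (2, 1, 3), (1, 1)):
        print(v, exact_mean(v, 3, 5), formula_mean(v, 3, 5))
```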

Next, we will estimate $\text{VAR}(X_{n}).$ To that end, we rewrite $X^{2}_{n}$ as follows

[TABLE]

where

[TABLE]

In what follows, we will adopt the proof strategy of [8] and estimate $E(A_{s})$ separately for different values of the parameter $s.$ For $s=0$ the exact value is

[TABLE]

where we used the fact that for two intervals $I_{j}$ and $I_{m}$ with no overlap

[TABLE]

If $A_{0}$ were the only term contributing to the variance of $X_{n},$ its entire contribution combined with the term $-\bigl{[}E(X_{n})\bigr{]}^{2}$ would amount to (cf. formulas (9) and (10) in [8])

[TABLE]

To finish the estimate on the variance we need to provide estimates on $A_{s}$ when $s\neq 0.$ When $s=1$ we give an accurate estimate, and for $s\geq 2$ a crude estimate will suffice for our purpose. More specifically, we will show that $E(A_{s})=\Theta(n^{2\ell-s}),$ and while $E(A_{0})-[E(X_{n})]^{2}$ is negative, $E(A_{0})+E(A_{1})-[E(X_{n})]^{2}=\Theta(n^{2\ell-1}),$ which gives the necessary estimate for the variance.

Case I: $s=1$ . Consider the sum of the terms $E(X_{n,i}X_{n,j})$ over the pairs of intervals that overlap exactly at one place. The summation of these terms is

[TABLE]

where

[TABLE]

where the two words occupying the intervals $I_{j}$ and $I_{m}$ overlap at the $(i+1)$ -th letter of each, and $v_{i+1}$ is the $p$ -th highest letter (among the distinct possibilities $1,\ldots,d$ ) in the pattern $v.$ To obtain the lower bound for $D_{k,v}$ we will only consider the case when the common letter is the $(i+1)$ -th letter for some $i\in\{0,\ldots,\ell-1\}$ in both intervals. Once the joint location of $I_{j}$ and $I_{m}$ is chosen, we have in total $k^{2\ell-1}$ possibilities to choose the corresponding letters. We have to fill $2i$ locations before and $2\ell-2-2i$ locations after the common letter. The term $\binom{2i}{i}\binom{2\ell-2-2i}{\ell-1-i}$ is the number of possibilities to designate $\ell-1$ of the remaining $2\ell-2$ locations to be occupied by letters of the interval $I_{m}.$ Assuming that for given $p$ and $t$ the common letter for $I_{m}$ and $I_{j}$ is $t\in\{p,p+1,\ldots,k-(d-p)\},$ we observe that we have $\binom{t-1}{p-1}\binom{k-t}{d-p}$ possibilities to choose $d$ distinct letters from $[k].$

We remark that

[TABLE]

where the inequality is obtained by enumerating the terms with $i=0$ and $i=\ell-1$ only.

Furthermore,

[TABLE]

where we used the Cauchy-Schwarz inequality in the first inequality and a variation of the Chu-Vandermonde identity stated as

[TABLE]

This identity can be justified as follows: in order to choose $d$ distinct letters from $[k]$ we can first choose the $p$ -th largest element among those $d$ letters, call it $j,$ from the interval $[p,k-d+p],$ then $p-1$ letters from the interval $[1,j-1]$ and $d-p$ letters from the interval $[j+1,k].$ Collecting all the estimates together, we obtain that

[TABLE]
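The counting argument given in Case I for the variation of the Chu-Vandermonde identity (choose the $p$ -th element $j$ of a $d$ -subset of $[k],$ then $p-1$ letters below it and $d-p$ letters above it) corresponds to the identity $\sum_{j=p}^{k-d+p}\binom{j-1}{p-1}\binom{k-j}{d-p}=\binom{k}{d}.$ This explicit form is inferred from that argument and should be treated as an assumption; the sketch below checks it for small parameters.

```python
from math import comb


def split_count(k, d, p):
    """Sum over the position j of the p-th chosen letter, j = p, ..., k - d + p."""
    return sum(
        comb(j - 1, p - 1) * comb(k - j, d - p)
        for j in range(p, k - d + p + 1)
    )


def identity_holds(max_k):
    """Check split_count(k, d, p) == binom(k, d) for all 1 <= p <= d <= k <= max_k."""
    return all(
        split_count(k, d, p) == comb(k, d)
        for k in range(1, max_k + 1)
        for d in range(1, k + 1)
        for p in range(1, d + 1)
    )


if __name__ == "__main__":
    print(split_count(3, 2, 1), comb(3, 2))
    print(identity_holds(12))
```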

Case II: $s>1$ . Furthermore, extending (20) to

[TABLE]

where $D_{k,v}^{(i)}>0$ are strictly positive constants whose value depends on $k$ and $v$ only (but not on $n$ ).

Having in hand the above estimates for $E(A_{s})$ we can now evaluate the variance of $X_{n}.$ Taking into account (19), (21), and (22), we obtain that

[TABLE]

where

[TABLE]

Finally, by virtue of (18), the following limit exists and is strictly positive:

[TABLE]

and therefore, the remainder of the proof is a straightforward application of Theorem 2 in [22] to the random variables $X_{n,i},$ and can be carried out as in [8] verbatim. ∎

Remark 2.15.

A central limit theorem for multisets closely related to Theorem 2.14 can be found in [15], see also references therein for earlier versions. Let $a_{i,n}\geq 0$ represent the number of occurrences of the letter $i\in[k]$ in the random word $W_{n},$ and denote by $A_{n}$ the random vector $(a_{1,n},\ldots,a_{k,n}).$ The CLT for $W_{n}$ in [15] can be stated as a limit theorem for the random variable $\frac{X_{n}-\widetilde{\mu}_{n}}{\widetilde{\sigma}_{n}}$ under the conditional measure $P(\cdot\,|A_{n}).$ The main difference with Theorem 2.14 is that the scaling factors $\widetilde{\mu}_{n}=\widetilde{\mu}_{n}(A_{n})$ and $\widetilde{\sigma}_{n}=\widetilde{\sigma}_{n}(A_{n})$ are random in that they depend on the vector $A_{n}.$ The relation of Theorem 2.14 to the CLT in [15] thus resembles the one between the so-called annealed (average) and quenched limit theorems in the theory of random motion in random media, see, for instance, [35]. In particular, $\sigma_{n}^{2}=E\bigl{(}\widetilde{\sigma}_{n}^{2}\bigr{)}+\widehat{\sigma}_{n}^{2},$ where $\sigma_{n}^{2}$ is the “annealed” variance that appears in the statement of Theorem 2.14 whereas the term $\widehat{\sigma}_{n}^{2}$ describes fluctuations of the “random environment” $A_{n}.$

Our next result is a Berry-Esseen type bound for the convergence rate of the above CLT. The bound is a direct implication of Theorem 2.2 in [29], along with the estimates in (23), (24), and the following modification of (19):

[TABLE]

Here $\Delta_{n}$ is the number of random indicators $X_{n,i}$ that are independent of $X_{n,i^{*}},$ an indicator with a given index $1\leq i^{*}\leq\binom{n}{\ell}.$ Let $\Phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{y^{2}}{2}}\,dy,$ $x\in{\mathbb{R}},$ denote the distribution function of the standard normal variable. We have:

Corollary 2.16.

In the notation of Theorem 2.14,

[TABLE]

Remark that the classical Berry-Esseen bound for the rate of convergence of the CLT for partial sums of i. i. d. random variables is of order $n^{-1/2},$ thus the above bound is asymptotically optimal up to a constant.

Theorem 2.14 implies a weak law of large numbers for $X_{n}$ and asserts that a typical deviation of $X_{n}$ from $E(X_{n})$ is of order $\frac{1}{\sqrt{n}}E(X_{n}).$ The main purpose of the following Chernoff type bounds is to estimate the probability of large deviations, namely the ones of the order of magnitude $E(X_{n}).$ The result is merely an instance of Corollary 2.6 in [23] formulated using the notation of Theorem 2.14.

Corollary 2.17.

For any $t\geq 0,$

[TABLE]

and

[TABLE]

where $\Delta_{n}$ is introduced in (26) and $K_{n}=\binom{n}{\ell}.$

We will now state a direct consequence of Theorem 2.14 in terms of the weak avoidance penalty function $c^{v}_{k,n}(x).$ Our main motivation for including this result is the subsequent Theorem 2.19. Recall the notation of Theorem 2.14.

Lemma 2.18.

(a)

Let $(\theta_{n})_{n\in{\mathbb{N}}}$ be a sequence of positive reals such that $\lim_{n\to\infty}\theta_{n}=+\infty$ and $\lim_{n\to\infty}\frac{\mu_{n}}{\theta_{n}}=\gamma$ for some $\gamma\in[0,+\infty).$ Then, the following holds for any constant $t\in{\mathbb{R}}:$

[TABLE]

(b)

The following holds for any constant $t\in{\mathbb{R}}:$

[TABLE]

where $J_{k,v}$ are strictly positive constants introduced in (25).

Proof.

Observe that all the expectations in the statement of the lemma are well-defined for all $t\in{\mathbb{R}}$ because $0\leq X_{n}\leq\binom{n}{\ell}.$ Let $s=e^{t}.$ We will use the parameter $s$ so defined in both parts, (a) and (b), of the proof.

(a)

We will consider separately two cases, $\gamma=0$ and $\gamma\in(0,\infty).$

Case I: $\gamma=0.$ Using the second-order Taylor series with the remainder in the Lagrange form

[TABLE]

we obtain:

[TABLE]

for some random (because of the dependence on $X_{n}$ ) $t_{n}^{*}\in[0,|t|].$ Note that in view of (18) and the condition $\lim_{n\to\infty}\frac{\mu_{n}}{\theta_{n}}<\infty,$ with probability one,

[TABLE]

for some (deterministic) constant $M_{k,v}(t)>0$ which depends on the parameters $k,v$ and $t.$ Furthermore, by Theorem 2.14, $E(X_{n}^{2})=\sigma_{n}^{2}+\mu_{n}^{2}\sim\mu_{n}^{2}.$ Therefore,

[TABLE]

Recall the constant $M_{k,v}(t)$ in (30). For any $r\in{\mathbb{N}},$ we have

[TABLE]

where we used the mean-value theorem applied to the function $f(y)=y^{r}$ in the first step and (29) in the second one. Since

[TABLE]

we get (27) for $\gamma=0$ by utilizing (31).

Case II: $\gamma\in(0,\infty).$ In this case, (31) follows directly from the law of large numbers $X_{n}/\mu_{n}\Rightarrow 1$ in probability, as $n\to\infty,$ which is implied by Theorem 2.14. The rest of the proof of (27) is the same as in Case I.

(b)

By Theorem 2.14, for any $t\in{\mathbb{R}}$ we have:

[TABLE]

The convergence of the moment generating functions of $\frac{X_{n}-\mu_{n}}{\sigma_{n}}$ can be verified using, for instance, the general Theorem 3 in [26]; it is also transparent from the proofs in [22]. It follows that

[TABLE]

and hence

[TABLE]

The last formula is an analogue of (31) in part (a) and plays a similar role; the remainder of the argument is similar to its counterpart in (a). ∎

Recall ${\mathbb{Q}}^{v,x}_{k,n}$ from (14) and let ${\mathbb{E}}^{v,x}_{k,n}$ denote the expectation with respect to ${\mathbb{Q}}^{v,x}_{k,n}.$ Then for any $z>0$ and $x\in(0,1)$ we have

[TABLE]

Two interesting regimes in this model arise when it is assumed that $x=x_{n}$ depends on $n$ and either $x_{n}=o(1)$ or $1-x_{n}=o(1).$ Both regimes can be considered as a perturbation of a uniform distribution, over $[k]^{n}$ in the former case and over the pattern-avoiding set $\{w\in[k]^{n}:occ_{v}(w)=0\}$ in the latter. In the context of permutations, similar regimes for the particular case when the pattern is the inversion $21$ were recently studied in [6, 19, 33]. In view of (32), Lemma 2.18 implies the following:

Theorem 2.19.

(a)

Let $(\theta_{n})_{n\in{\mathbb{N}}}$ and $(\rho_{n})_{n\in{\mathbb{N}}}$ be two sequences of positive reals such that $\lim_{n\to\infty}\theta_{n}=+\infty,$ $\lim_{n\to\infty}\frac{\mu_{n}}{\theta_{n}}=\lambda$ for some $\lambda\in[0,+\infty),$ and $\lim_{n\to\infty}\frac{\theta_{n}}{\rho_{n}}=\alpha$ for some $\alpha\in[0,+\infty).$ Then the following holds for any $t\in{\mathbb{R}}:$

[TABLE]

In particular, by virtue of (18),

[TABLE]

if $\lim_{n\to\infty}\frac{n^{\ell}}{\rho_{n}}\in[0,+\infty).$

(b)

The following holds for any $t\in{\mathbb{R}}$ and a sequence of positive reals $(\rho_{n})_{n\in{\mathbb{N}}}$ such that $\lim_{n\to\infty}\frac{n^{\ell}}{\rho_{n}\sqrt{n}}=\beta$ for some $\beta\in[0,+\infty):$

[TABLE]

where $J_{k,v}$ are strictly positive constants introduced in (25).

(c)

The following holds for any $t\in{\mathbb{R}}$ and a sequence of positive reals $(\rho_{n})_{n\in{\mathbb{N}}}$ such that

[TABLE]

for some $\gamma\in[0,+\infty):$

(i)

We have:

[TABLE]

(ii)

Let ${\mathbb{Q}}_{n}(r)={\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}(X_{n}=r)$ and

[TABLE]

be the entropy of $X_{n}$ under the law ${\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}.$ Then

[TABLE]

Proof.

For part (a), plug $x=\frac{1}{\rho_{n}}$ and $z=e^{t/\theta_{n}}$ into (32) and use (27). For part (b), substitute $z=e^{\frac{t\sqrt{n}}{n^{\ell}}}$ and use (28). Part (i) in (c) then follows from the bounded convergence theorem and (33), which implies that the distribution of $\frac{X_{n}}{n^{\ell}}$ under the law ${\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}$ converges to the degenerate distribution at $\frac{1}{k^{\ell}\ell!}\binom{k}{d}.$ Finally,

[TABLE]

which implies the claim in (ii) of part (c). Indeed, $\frac{1}{n}\sum_{r\geq 0}{\mathbb{Q}}_{n}(r)\log f_{r}^{v}(k,n)$ converges to $\log(d-1)$ by Theorem 2.9 and a discrete version of the bounded convergence theorem, $\frac{1}{n}{\mathbb{E}}^{v,\frac{1}{\rho_{n}}}_{k,n}(X_{n})\log(1-\rho_{n}^{-1})\sim\frac{\mu_{n}}{\rho_{n}}$ by (33), and $\frac{1}{n}\log c_{k,n}^{v}(\rho_{n}^{-1})$ converges to $\log k$ by virtue of (27). The proof of the theorem is complete. ∎

The results in Theorem 2.19 shed some light on the asymptotic behavior of $X_{n}$ under ${\mathbb{Q}}^{v,x_{n}}_{k,n}$ for $x_{n}=o(1).$ More specifically, the corollary suggests that the intensity sequence $x_{n}=1/\rho_{n}$ with $\rho_{n}$ which is at least $\Theta(\mu_{n})$ yields a perturbative “light avoidance regime” in that the results in Lemma 2.18 and Theorem 2.19 formally correspond to their counterparts in the corollary with $\rho_{n}=+\infty.$ In particular, (33) shows that $\mu_{n}$ remains the proper scaling for $X_{n}$ for any $x_{n}$ in this regime, namely the distribution of $X_{n}/\mu_{n}$ under ${\mathbb{Q}}^{v,x_{n}}_{k,n}$ converges to that of the constant one as $n\to\infty.$ Furthermore, by the Gärtner-Ellis theorem [13], the result in (34) for moment generating functions implies Corollary 2.20 given below.

Corollary 2.20.

Let $\rho_{n}$ be as defined in the statement of part (b) of Theorem 2.19. Then the following holds for any Borel set $B\subset{\mathbb{R}}:$

[TABLE]

It is reasonable to expect that a large deviation principle for $X_{n}/n^{\ell}$ under ${\mathbb{Q}}^{v,\frac{1}{\rho_{n}}}_{k,n}$ holds with a finite rate function and with respect to the usual scaling sequence $n$ rather than $\sqrt{n}$ (in our context, cf. Corollary 2.17 where $\frac{\mu_{n}^{2}}{\Delta_{n}\mu_{n}}=\frac{\mu_{n}}{\Delta_{n}}=\Theta(n)$ ). However, proving such a result would be beyond the reach of methods we employed in this section.

We conclude the section with another corollary to Theorem 2.14, a limit theorem that concerns a Poisson approximation of $X_{n}$ in the case when $k=k_{n}$ is a sufficiently rapidly increasing function of $n.$ The result is an analogue for random words of [12, Theorem 3.1] for random permutations. The proof of the theorem relies on a Poisson approximation of the sum of random indicators $X_{n}=\sum_{i}X_{n,i}$ via a modification of the Chen-Stein method which is due to [3], and follows the bulk of the argument in [12]. Recall that the total variation distance $d_{TV}(X,Y)$ between two ${\mathbb{N}}_{0}$ -valued random variables $X$ and $Y$ is defined as

[TABLE]

The following summary of results in [3] suffices for our purpose (cf. Theorem 4.2 in [12]):

Theorem 2.21 ([3]).

Let $N\in{\mathbb{N}}$ and $(Y_{i})_{i\in[N]}$ be a collection of identically distributed (but possibly dependent) Bernoulli variables with $P(Y_{i}=1)=p\in(0,1)$ and $P(Y_{i}=0)=1-p.$ For $i,j\in[N]$ let $p_{i,j}=E(Y_{i}Y_{j}).$ Set $Y=\sum_{i=1}^{N}Y_{i}$ and $\lambda=Np.$ For any $i\in[N]$ let $D_{i}\subset[N]$ be a set of indices such that

[TABLE]

where $\sigma_{i}$ is the $\sigma$ -algebra generated by $\{Y_{j}:j\in D_{i}\},$ and define

[TABLE]

Let $W$ be a Poisson random variable with parameter $\lambda,$ that is $P(W=r)=\frac{\lambda^{r}e^{-\lambda}}{r!},$ $r\in{\mathbb{N}}_{0}.$ Then,

[TABLE]

We will apply Theorem 2.21 with $Y_{i}=X_{n,i},$ where $X_{n,i}$ are indicators introduced in the course of the proof of Theorem 2.14 assuming that $k=k_{n}$ and $\ell=\ell_{n}.$ Note that under the conditions we impose,

[TABLE]

goes to zero as $n$ tends to infinity. We have:

Theorem 2.22.

Suppose that three sequences of natural numbers $(k_{n})_{n\in{\mathbb{N}}},$ $(\ell_{n})_{n\in{\mathbb{N}}},$ and $(d_{n})_{n\in{\mathbb{N}}}$ satisfy the following conditions:

(i)

$d_{n}\leq\ell_{n}$ and $d_{n}\leq k_{n}$ for all $n\in{\mathbb{N}}.$

(ii)

$\delta:=\liminf_{n\to\infty}\frac{d_{n}}{\ell_{n}}>0.$

(iii)

There exist constants $A>0$ and $\beta>\frac{2}{2+\delta}$ such that $\ell_{n}\geq An^{\beta}$ for all $n\in{\mathbb{N}}.$

Consider an arbitrary sequence of patterns $v_{n}\in[k_{n}]^{\ell_{n}},$ $n\in{\mathbb{N}},$ with $d_{n}$ distinct letters used to form $v_{n}.$ Let $X_{n}=occ_{v_{n}}(W_{n}),$ where $W_{n}$ is drawn at random from $[k_{n}]^{n}.$ Then

[TABLE]

where $Q_{n}$ is a Poisson random variable with parameter $\mu_{n}.$ In particular,

[TABLE]

for any integer $r\geq 0.$
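The quantities compared in Theorem 2.22 can be computed exactly for very small parameters. The sketch below is purely illustrative: brute force cannot reach the regime of the theorem, the convention $d_{TV}(X,Y)=\frac{1}{2}\sum_{r}|P(X=r)-P(Y=r)|$ is an assumption about the normalization, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import comb, exp, factorial


def reduce_word(u):
    """Replace each letter of u by its rank among the distinct letters of u."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(v, w):
    """Number of subsequences of w that are order-isomorphic to v."""
    target = reduce_word(v)
    return sum(1 for sub in combinations(w, len(v)) if reduce_word(sub) == target)


def law_of_occurrences(v, k, n):
    """Exact law of occ_v(W_n) for W_n uniform on [k]^n, as a dict r -> probability."""
    counts = Counter(occ(v, w) for w in product(range(1, k + 1), repeat=n))
    return {r: c / k ** n for r, c in counts.items()}


def tv_to_poisson(law, lam):
    """Half the l1 distance between a finite law and the Poisson(lam) law."""
    top = max(law)
    pois = [exp(-lam) * lam ** r / factorial(r) for r in range(top + 1)]
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass beyond the support of the law
    body = sum(abs(law.get(r, 0.0) - pois[r]) for r in range(top + 1))
    return 0.5 * (body + tail)


if __name__ == "__main__":
    v, k, n = (1, 2, 3), 3, 4
    mu = comb(n, len(v)) * comb(k, len(set(v))) / k ** len(v)
    print(mu, tv_to_poisson(law_of_occurrences(v, k, n), mu))
```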

Remark 2.23.

We believe that the lower bound for $\beta$ in the statement of the theorem is an artifact of the proof and can be improved. In the most favorable case, $\delta=1,$ the conditions of the theorem require $\beta>\frac{2}{3}.$ This should be compared with the lower bound $\beta>\frac{1}{2}$ obtained in [12] for permutations.

Proof of Theorem 2.22.

Fix any $n\in{\mathbb{N}},$ and let $K_{n}=\binom{n}{\ell_{n}}$ and $p_{n}=E(X_{n,j})=\frac{1}{k_{n}^{\ell_{n}}}\binom{k_{n}}{d_{n}}$ for this particular value of $n.$ Note that $\mu_{n}=E(X_{n})=K_{n}p_{n}.$ Recall the intervals $I_{j}$ from the proof of Theorem 2.14. Assuming that $k=k_{n}$ and $\ell=\ell_{n},$ and taking $N=K_{n}$ in Theorem 2.21, define for $j\in[N],$

[TABLE]

Let $(\ell_{n}-i)\wedge d_{n}$ denote $\min\{\ell_{n}-i,d_{n}\}.$ Observe that if $\#(I_{j}\cap I_{m})=i,$ then

[TABLE]

Therefore, for $b_{1}$ and $b_{2}$ introduced in (36) we have:

[TABLE]

where $\Delta_{n}$ is defined in (26), and

[TABLE]

Therefore,

[TABLE]

Since

[TABLE]

we obtain that

[TABLE]

where we used Vandermonde’s identity for the second term and change of variables $m=\ell_{n}-i$ for the third one. Since

[TABLE]

we obtain that

[TABLE]

where $\Lambda_{n}$ is a random variable with hypergeometric distribution, $P(\Lambda_{n}=m)=\frac{\binom{n-\ell_{n}}{m}\binom{\ell_{n}}{\ell_{n}-m}}{\binom{n}{\ell_{n}}}$ for $m=0,\ldots,\ell_{n}.$ By Hoeffding’s inequality for partial sums of bounded random variables,

[TABLE]

for any $\varepsilon>0.$ Thus for any given $\varepsilon>0$ and $n$ large enough,

[TABLE]

Therefore, for an arbitrary $\varepsilon>0$ and all $n$ large enough,

[TABLE]

where $\Gamma(\,\cdot\,)$ is the gamma function. Finally, using Stirling’s formula we obtain that

[TABLE]

By the conditions of the theorem, $\delta=\liminf_{n\to\infty}\frac{d_{n}}{\ell_{n}}>0.$ Therefore, for any $\gamma\in(0,\delta)$ and $n$ large enough we have:

[TABLE]

The proof of the theorem is complete. ∎
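As a sanity check on the mean $\mu_{n}=K_{n}p_{n}=\binom{n}{\ell_{n}}\binom{k_{n}}{d_{n}}k_{n}^{-\ell_{n}}$ used in the proof above, the following Python sketch compares it with an exhaustive count over all words for small parameters. It assumes that an occurrence of $v$ in a word is a subsequence of $\ell$ letters order-isomorphic to $v$ with ties preserved, which is what the value of $p_{n}$ suggests. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def reduce_word(w):
    """Replace each letter by its rank among the distinct letters of w."""
    rank = {a: i for i, a in enumerate(sorted(set(w)))}
    return tuple(rank[a] for a in w)


def occ_word(v, w):
    """Count index sets i_1 < ... < i_l whose letters in w are
    order-isomorphic to v (equal letters must stay equal)."""
    target = reduce_word(v)
    return sum(
        1
        for idx in combinations(range(len(w)), len(v))
        if reduce_word([w[i] for i in idx]) == target
    )


def mean_occurrences_exact(v, k, n):
    """E(X_n) by exhaustive enumeration of all k**n words over {1, ..., k}."""
    total = sum(occ_word(v, w) for w in product(range(1, k + 1), repeat=n))
    return Fraction(total, k ** n)


def mean_occurrences_formula(v, k, n):
    """mu_n = binom(n, l) * binom(k, d) / k**l with l = |v| and d distinct letters."""
    l, d = len(v), len(set(v))
    return Fraction(comb(n, l) * comb(k, d), k ** l)
```

For instance, with $v=121,$ $k=3,$ and $n=4,$ both functions return `Fraction(4, 9)`. The enumeration is exponential in $n$ and is meant only for checking small cases.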

3 Permutation patterns

In this section, we discuss an extension of some of our results about counting occurrences of a pattern in words to permutations. The section is divided into two subsections. Subsection 3.1 is devoted to Stanley-Wilf type limits for permutations, and Subsection 3.2 adapts the concept of weak avoidance to permutations. The main results of this section are Theorem 3.1 and Proposition 3.2. The former is a modification of Theorem 2.9 for permutations, and the latter is a counterpart of Proposition 2.11. Extensions of the CLT-related results in Section 2.5 to random permutations are readily available due to the CLT for permutations proved by Bóna in [8]. This is briefly discussed in the concluding paragraph of Section 3.2; the details are left to the reader.

We begin with notation. Permutations are bijections from a set $[n]$ to itself. For $n\in{\mathbb{N}},$ let $S_{n}$ denote the symmetric group of degree $n,$ that is, the group of permutations of the integers in $[n].$ Occasionally, when confusion is not likely to occur, we will identify permutations in $S_{n}$ with the words representing the image of the permutation. For instance, for permutations $\pi=\pi(1)\cdots\pi(n)\in S_{n}$ and $\nu=\nu(1)\cdots\nu(m)\in S_{m}$ we refer to the permutation

[TABLE]

as the concatenation of the permutations $\pi$ and $\nu.$

Fix any $k\in{\mathbb{N}}$ and $\xi\in S_{k}.$ We refer to $\xi$ as a pattern; it remains fixed throughout the rest of the paper. For a permutation $\pi\in S_{n}$ with $n\geq k,$ an occurrence of the pattern $\xi$ in $\pi$ is a sequence of $k$ indices $1\leq i_{1}<i_{2}<\dots<i_{k}\leq n$ such that the word $\pi(i_{1})\cdots\pi(i_{k})\in[n]^{k}$ is order-isomorphic to the word $\xi,$ that is

$$\pi(i_{p})<\pi(i_{q})\quad\Longleftrightarrow\quad\xi(p)<\xi(q)\qquad\text{for all }p,q\in[k].$$

For a permutation $\pi\in S_{n}$ with $n\geq k$ we denote by $occ_{\xi}(\pi)$ the number of occurrences of the pattern $\xi$ in $\pi.$ For example, if $\xi=12$ and $\pi=51324,$ then $13,$ $12,$ $14,$ $34,$ and $24$ are order-isomorphic to $12,$ and $occ_{\xi}(\pi)=5.$ If $occ_{\xi}(\pi)=m,$ we say that $\pi$ contains $\xi$ (exactly) $m$ times. For a given $r\in{\mathbb{N}}_{0},$ let $f_{r}^{\xi}(n)$ denote the number of permutations in $S_{n}$ that contain $\xi$ exactly $r$ times. That is,

$$f_{r}^{\xi}(n)=\#\{\pi\in S_{n}:occ_{\xi}(\pi)=r\}.$$

For example, if $\xi=12$ then $f_{0}^{\xi}(3)=1$ (only $321$ counts), $f_{1}^{\xi}(3)=2$ ( $312$ and $231$ count), $f_{2}^{\xi}(3)=2$ ( $132$ and $213$ count), and $f_{3}^{\xi}(3)=1$ (only $123$ counts).
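The small cases above can be reproduced by brute force. The following Python sketch counts occurrences directly from the definition and tabulates $f_{r}^{\xi}(n)$ by listing all $n!$ permutations; the function names `occ` and `occurrence_counts` are illustrative, and the approach is only practical for small $n.$

```python
from collections import Counter
from itertools import combinations, permutations


def occ(xi, pi):
    """Number of index sets i_1 < ... < i_k such that pi(i_1)...pi(i_k)
    is order-isomorphic to the pattern xi."""
    k = len(xi)
    # Positions of xi listed by increasing value. Since all letters are
    # distinct, two words are order-isomorphic iff these lists coincide.
    shape = sorted(range(k), key=lambda p: xi[p])
    count = 0
    for idx in combinations(range(len(pi)), k):
        sub = [pi[i] for i in idx]
        if sorted(range(k), key=lambda p: sub[p]) == shape:
            count += 1
    return count


def occurrence_counts(xi, n):
    """Map r -> f_r^xi(n), obtained by listing all n! permutations."""
    return Counter(occ(xi, pi) for pi in permutations(range(1, n + 1)))
```

With these definitions, `occ((1, 2), (5, 1, 3, 2, 4))` evaluates to `5`, and `occurrence_counts((1, 2), 3)` assigns the counts 1, 2, 2, 1 to $r=0,1,2,3,$ in agreement with the examples in the text.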

As in Section 2, $a_{n}\sim b_{n},$ $a_{n}=O(b_{n})$ and $a_{n}=o(b_{n})$ for sequences $a_{n}$ and $b_{n}$ with elements that might depend on $k,r,\xi$ and other parameters, mean that, respectively, $\lim_{n\to\infty}\frac{a_{n}}{b_{n}}=1,$ $\limsup_{n\to\infty}\bigl{|}\frac{a_{n}}{b_{n}}\bigr{|}<\infty,$ and $\lim_{n\to\infty}\frac{a_{n}}{b_{n}}=0$ for all feasible values of the parameters when the latter are fixed. The notation $a_{n}=\Theta(b_{n})$ is used to indicate that both $a_{n}=O(b_{n})$ and $b_{n}=O(a_{n})$ hold true.

3.1 Stanley-Wilf type limits

The celebrated Stanley-Wilf conjecture proved in [27] states that $\lim_{n\to\infty}\frac{1}{n}\log f_{0}^{\xi}(n)$ exists and belongs to $(0,\infty).$ Let $Z_{n}=occ_{\xi}(\pi),$ where $\pi$ is a permutation chosen uniformly at random from $S_{n}.$ Notice that

[TABLE]

In the language of random permutations, the Stanley-Wilf limit is

[TABLE]

which yields the following weaker conclusion:

[TABLE]

Thus the limit can be interpreted in terms of the asymptotic behavior of $P(Z_{n}=0)$ as a local large deviation result with respect to the scaling sequence $n\log n.$ The probability $P(Z_{n}=0)$ is very small since, according to the CLT obtained by Bóna in [8], $Z_{n}$ is tightly concentrated around $E(Z_{n})=\frac{1}{k!}\binom{n}{k}.$ The following theorem extends this large deviation result to $P(Z_{n}=r)$ with an arbitrary fixed $r\in{\mathbb{N}}.$

Theorem 3.1.

For any $r\in{\mathbb{N}},$ $\lim_{n\to\infty}(f_{r}^{\xi}(n))^{\frac{1}{n}}$ exists and is equal to $\lim_{n\to\infty}(f_{0}^{\xi}(n))^{\frac{1}{n}}.$

Proof.

The proof is by induction on $r.$ By Corollary 2 in [27], $c:=\lim_{n\to\infty}\bigl{(}f_{0}^{\xi}(n)\bigr{)}^{\frac{1}{n}}$ exists and is finite. Assume that for some $m\in{\mathbb{N}}$ the claim holds for $r=0,1,\ldots,m-1.$ To complete the proof, we need to show that under this assumption it holds also for $r=m.$

To this end, let $\pi$ be an arbitrary permutation in $S_{n}$ that contains the pattern $\xi$ exactly $m$ times. By removing the leftmost letter in the leftmost occurrence of $\xi$ in $\pi$ and renaming the remaining letters, we obtain a permutation $\pi^{\prime}$ in $S_{n-1}$ that contains $\xi$ at most $m-1$ times. Thus,

[TABLE]

It follows that

[TABLE]

On the other hand, consider an arbitrary permutation $\pi\in S_{n}$ that contains $\xi$ exactly $m-1$ times and the concatenation $\pi^{\prime}=\pi\xi^{\prime}\in S_{n+k},$ where $\xi^{\prime}$ is obtained by adding $n$ to each letter in $\xi.$ For instance, if $n=5,$ $\pi=13542,$ and $\xi=12,$ then $\xi^{\prime}=67$ and $\pi^{\prime}=1354267.$ Without loss of generality, we may assume that the letter $k$ precedes $1$ in $\xi$ (the idea is borrowed from [2]). Because of this assumption, no occurrence of $\xi$ in $\pi^{\prime}$ can use letters from both $\pi$ and $\xi^{\prime},$ and hence the new permutation $\pi^{\prime}$ contains $\xi$ exactly $m$ times. We can therefore conclude that $f_{m-1}^{\xi}(n)\leq f_{m}^{\xi}(n+k).$ This inequality along with the induction hypothesis implies that

[TABLE]

In view of (37), this completes the proof of the theorem. ∎
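The concatenation step of the proof can be checked mechanically for small cases. The Python sketch below builds $\pi^{\prime}=\pi\xi^{\prime}$ and tests whether appending $\xi^{\prime}$ adds exactly one occurrence for every $\pi\in S_{n}$; the helper names are hypothetical and not taken from the paper. It also illustrates why the assumption that $k$ precedes $1$ in $\xi$ matters: without it, occurrences may straddle $\pi$ and $\xi^{\prime}.$

```python
from itertools import combinations, permutations


def occ(xi, pi):
    """Number of occurrences of the pattern xi in the permutation pi."""
    k = len(xi)
    shape = sorted(range(k), key=lambda p: xi[p])
    return sum(
        1
        for idx in combinations(range(len(pi)), k)
        if sorted(range(k), key=lambda p: pi[idx[p]]) == shape
    )


def append_shifted(pi, xi):
    """Concatenate pi with xi', where xi' is xi with n = len(pi) added to each letter."""
    n = len(pi)
    return tuple(pi) + tuple(a + n for a in xi)


def check_concatenation(xi, n):
    """True if occ(pi xi') == occ(pi) + 1 for every pi in S_n."""
    return all(
        occ(xi, append_shifted(pi, xi)) == occ(xi, pi) + 1
        for pi in permutations(range(1, n + 1))
    )
```

For example, `check_concatenation((2, 3, 1), 4)` holds because $3$ precedes $1$ in $231,$ whereas `check_concatenation((1, 2), 3)` fails: for $\xi=12$ every letter of $\pi$ forms a new occurrence with every letter of $\xi^{\prime}.$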

3.2 Weak avoidance of permutation patterns

Similarly to (11), with any pattern $\xi\in S_{k}$ one can associate a sequence of weak avoidance penalty functions $c^{\xi}_{n}:[0,1]\to[0,n!],$ $n\in{\mathbb{N}},$ by setting

[TABLE]

where

[TABLE]

Notice that $c^{\xi}_{n}(0)=n!$ and $c^{\xi}_{n}(1)=f_{0}^{\xi}(n)$ . Similarly to (13), we have

[TABLE]

For certain particular cases the polynomials $c^{\xi}_{n}(1-x),$ generating functions of the sequence $f_{r}^{\xi}(n),$ $r\in{\mathbb{N}}_{0},$ have been studied in [24, 28] through the analysis of certain recursive functional equations that they satisfy.

The analogue of the ${\mathbb{Q}}^{v,x}_{k,n}$ measure introduced in (14) is the probability measure ${\mathbb{P}}^{\xi,x}_{n}$ on $S_{n}$ defined by

[TABLE]

In the case of inversions, i.e. for $\xi=21,$ ${\mathbb{P}}^{\xi,x}_{n}$ is a Mallows distribution. Mallows permutations have been studied by several authors; see, for instance, the recent works [12, 19, 30] and references therein.
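To make the measure concrete for small $n,$ here is a Python sketch that assumes ${\mathbb{P}}^{\xi,x}_{n}$ gives each $\pi\in S_{n}$ a weight proportional to $(1-x)^{occ_{\xi}(\pi)},$ normalized by $c^{\xi}_{n}(x).$ This form is an assumption inferred from the identities $c^{\xi}_{n}(0)=n!$ and $c^{\xi}_{n}(1)=f_{0}^{\xi}(n)$ and from the Mallows case, where it corresponds to the parameter $q=1-x.$ The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations


def occ(xi, pi):
    """Number of occurrences of the pattern xi in the permutation pi."""
    k = len(xi)
    shape = sorted(range(k), key=lambda p: xi[p])
    return sum(
        1
        for idx in combinations(range(len(pi)), k)
        if sorted(range(k), key=lambda p: pi[idx[p]]) == shape
    )


def weak_avoidance_measure(xi, n, x):
    """Return {pi: probability} with weight (1 - x)**occ_xi(pi) (assumed form),
    normalized by the sum of all weights. Requires 0 <= x <= 1."""
    weights = {
        pi: (1 - x) ** occ(xi, pi) for pi in permutations(range(1, n + 1))
    }
    c = sum(weights.values())  # plays the role of c_n^xi(x)
    return {pi: w / c for pi, w in weights.items()}
```

Passing `x` as a `Fraction` keeps the arithmetic exact. For $\xi=21,$ $n=3,$ and $x=\frac{1}{2}$ the normalizing constant is $\frac{21}{8},$ so the identity permutation receives probability $\frac{8}{21}$ and the fully reversed permutation $321$ receives $\frac{1}{21}.$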

The next proposition establishes the existence of $\lim_{n\to\infty}\bigl{(}c^{\xi}_{n}(x)\bigr{)}^{1/n}.$ The proof is based on a standard sub-additivity argument, and follows the same line of argument as the one in [2]. Unfortunately, we were unable to verify that the limit is necessarily finite (cf. Proposition 2.11 together with (15) for words).

Proposition 3.2.

$\lim_{n\to\infty}\bigl{(}c^{\xi}_{n}(x)\bigr{)}^{\frac{1}{n}}$ exists for all $x\in[0,1].$

Proof.

For $\pi\in S_{n}$ and $i,j\in{\mathbb{N}}$ such that $1\leq i<j\leq n,$ let

[TABLE]

That is, $\pi_{i,j}\in[n]^{j-i+1}$ and $\pi_{i,j}(r)=\pi(i-1+r)-(i-1)$ for all $r\in[j-i+1].$ Further, for any $m,n\in{\mathbb{N}}$ such that $m\leq n$ let

[TABLE]

Note that $\pi\in S_{n}^{m}$ implies $\pi_{m+1,n}\in S_{n-m}.$ In other words,

[TABLE]

Without loss of generality, we can and will assume that $\xi^{-1}(k)<\xi^{-1}(1),$ that is $k$ appears before $1$ in $\xi.$ Under this assumption, we have

[TABLE]

In view of (41) and (42), for any $n,m\in{\mathbb{N}}$ and $x\in[0,1]$ we have

[TABLE]

Hence, $-\log c^{\xi}_{n}(x),$ $n\in{\mathbb{N}},$ is a subadditive sequence, and by Fekete’s subadditive lemma, $\lim_{n\to\infty}\bigl{(}c^{\xi}_{n}(x)\bigr{)}^{\frac{1}{n}}$ exists for all $x\in[0,1].$ ∎
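The inequality behind the subadditivity argument, $c^{\xi}_{n+m}(x)\geq c^{\xi}_{n}(x)\,c^{\xi}_{m}(x),$ can be tested numerically for small $n$ and $m.$ The sketch below assumes the penalty function has the form $c^{\xi}_{n}(x)=\sum_{\pi\in S_{n}}(1-x)^{occ_{\xi}(\pi)},$ which is consistent with $c^{\xi}_{n}(0)=n!$ and $c^{\xi}_{n}(1)=f_{0}^{\xi}(n);$ the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations


def occ(xi, pi):
    """Number of occurrences of the pattern xi in the permutation pi."""
    k = len(xi)
    shape = sorted(range(k), key=lambda p: xi[p])
    return sum(
        1
        for idx in combinations(range(len(pi)), k)
        if sorted(range(k), key=lambda p: pi[idx[p]]) == shape
    )


def penalty(xi, n, x):
    """c_n^xi(x) as the sum of (1 - x)**occ_xi(pi) over pi in S_n (assumed form)."""
    return sum((1 - x) ** occ(xi, pi) for pi in permutations(range(1, n + 1)))


def is_supermultiplicative(xi, x, max_total):
    """Check c_{n+m}(x) >= c_n(x) * c_m(x) for all n, m >= 1 with n + m <= max_total."""
    c = {n: penalty(xi, n, x) for n in range(1, max_total + 1)}
    return all(
        c[n + m] >= c[n] * c[m]
        for n in range(1, max_total)
        for m in range(1, max_total - n + 1)
    )
```

A call such as `is_supermultiplicative((2, 3, 1), Fraction(1, 3), 6)` enumerates at most $6!$ permutations and uses exact rational arithmetic, so the comparison is not affected by rounding. Such a check covers finitely many cases only and is no substitute for the proof.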

Example 3.3.

Consider $\xi=21.$ Then the number of occurrences of $\xi$ in a permutation $\pi$ is the number of inversions in $\pi,$ and $f_{r}^{21}(n)$ are the Mahonian numbers [7]. The identity in (40) together with Netto’s formula for the generating function of the sequence $\{f_{r}^{21}(n):r\geq 0\}$ (see, for instance, [7, p. 43] or [31, Seq A008302]) gives $c_{n}^{21}(x)=\prod_{j=1}^{n}\frac{1-(1-x)^{j}}{x}.$ In particular, $\lim_{n\to\infty}\bigl{(}c_{n}^{21}(x)\bigr{)}^{1/n}=x^{-1}$ for all $x\neq 0.$ Note that $f_{0}^{21}(n)=1$ for all $n\in{\mathbb{N}},$ and hence by virtue of Theorem 3.1, $\lim_{n\to\infty}\bigl{(}f_{r}^{21}(n)\bigr{)}^{1/n}=1$ for all $r\in{\mathbb{N}}.$ Interestingly enough, in contrast to Example 2.12, the asymptotic behavior of $c_{n}^{21}(x_{n})$ for a sequence $x_{n}$ such that $x_{n}\sim 1$ as $n\to\infty,$ does depend on the rate of convergence of $x_{n}.$
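The product formula of this example is easy to confirm for small $n.$ The Python sketch below compares $\prod_{j=1}^{n}\frac{1-(1-x)^{j}}{x}$ with the direct sum $\sum_{\pi\in S_{n}}(1-x)^{\mathrm{inv}(\pi)},$ where $\mathrm{inv}(\pi)$ is the number of inversions; reading $c_{n}^{21}(x)$ as this sum is an assumption consistent with Netto's formula. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def inversions(pi):
    """Number of pairs i < j with pi(i) > pi(j), i.e. occurrences of 21."""
    return sum(1 for i, j in combinations(range(len(pi)), 2) if pi[i] > pi[j])


def penalty_bruteforce(n, x):
    """Sum of (1 - x)**inv(pi) over all pi in S_n."""
    return sum((1 - x) ** inversions(pi) for pi in permutations(range(1, n + 1)))


def penalty_product(n, x):
    """Closed form prod_{j=1}^{n} (1 - (1 - x)**j) / x, valid for x != 0."""
    return prod((1 - (1 - x) ** j) / x for j in range(1, n + 1))
```

With `x = Fraction(1, 3)` the two functions agree exactly for every $n$ that is small enough to enumerate. The closed form also makes the limit visible: in floating point, `penalty_product(200, 0.5) ** (1 / 200)` is already close to $x^{-1}=2.$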

Conjecture.

$\lim_{n\to\infty}\bigl{(}c^{\xi}_{n}(x)\bigr{)}^{\frac{1}{n}}<\infty$ for all patterns $\xi\in\cup_{k}S_{k}$ and all $x\in(0,1).$

It is interesting to notice that while for words we have $c_{r}^{v}(k,n+m)\leq c_{r}^{v}(k,n)c_{r}^{v}(k,m),$ the opposite is true for permutations, namely $c_{r}^{\xi}(n+m)\geq c_{r}^{\xi}(n)c_{r}^{\xi}(m).$ The difference can be explained as follows. For words we have:

[TABLE]

and, since letters can be repeated in words, the conditional expectation is less than the unconditional one $E\bigl{[}(1-x)^{X_{m}}\bigr{]}.$ Indeed, any pattern occurrence in the first $n$ letters does not affect the last $m$ letters in $W_{n+m},$ but does increase the probability of having occurrences of the pattern spread over two intervals, $[1,n]$ and $[n+1,n+m].$ It turns out that with permutations, where letters cannot be re-used, the situation is different: the correlation between occurrences of the pattern in the beginning and in the continuation of a large permutation is negative, in contrast to words.

We conclude with a remark concerning the extension of the results in Section 2.5 to permutations. The key elements in the proofs in Section 2.5 are the specific covariance structure (the dependence graph) of the indicators $X_{n,i}$ and the asymptotic relation $\frac{\mu_{n}}{\sigma_{n}}=\Theta(\sqrt{n})$ between the expectation and variance of $X_{n}.$ Bóna’s CLT for permutations [8] asserts that the key elements are similar for words and permutations, and thus enables one to carry over the proofs of Corollaries 2.16, 2.17, and 2.20, Lemma 2.18, and Theorem 2.22 to permutations nearly verbatim. We leave the details to the reader.

Bibliography

[1] G. E. Andrews, The Theory of Partitions. Reprint of the 1976 original, Cambridge University Press, 1998.
[2] R. Arratia, On the Stanley-Wilf conjecture for the number of permutations avoiding a given pattern, Electron. J. Combin. 6 (1999), paper no. 1.
[3] R. Arratia, L. Goldstein, and L. Gordon, Two moments suffice for Poisson approximations: the Chen-Stein method, Ann. Probab. 17 (1989), 9–25.
[4] C. Banderier and M. Drmota, Formulae and asymptotics for coefficients of algebraic functions, Combin. Probab. Comput. 24 (2015), 1–53.
[5] R. Bauerschmidt, H. Duminil-Copin, J. Goodman, and G. Slade, Lectures on self-avoiding walks. In D. Ellwood, C. Newman, V. Sidoravicius, and W. Werner (Eds), Probability and Statistical Physics in Two and More Dimensions, Clay Math. Proc. 15, pp. 395–467, Amer. Math. Soc., 2012.
[6] N. Bhatnagar and R. Peled, Lengths of monotone subsequences in a Mallows permutation, Probab. Theory Related Fields 161 (2015), 719–780.
[7] M. Bóna, Combinatorics of Permutations, Chapman & Hall/CRC, Boca Raton, 2004.
[8] M. Bóna, The copies of any permutation pattern are asymptotically normal, 2007, available at https://arxiv.org/abs/0712.2792.
