Finite automata, probabilistic method, and occurrence enumeration of a pattern in words and permutations
Toufik Mansour, Reza Rastegar, Alexander Roitershtein

Toufik Mansour Department of Mathematics, University of Haifa, 199 Abba Khoushy Ave, 3498838 Haifa, Israel;
e-mail: [email protected]
Reza Rastegar Occidental Petroleum Corporation, Houston, TX 77046 and Departments of Mathematics and Petroleum Engineering, University of Tulsa, OK 74104, USA - Adjunct Professor; e-mail: [email protected]
Alexander Roitershtein Department of Statistics, Texas A&M University, College Station, TX 77843, USA;
e-mail: [email protected]
Abstract
The main theme of this paper is the enumeration of the occurrence of a pattern in words and permutations. We mainly focus on asymptotic properties of the sequence $f_r^v(k,n)$, the number of $k$-ary words of length $n$ that contain a given pattern $v$ exactly $r$ times. In addition, we study the asymptotic behavior of the random variable $X_n$, the number of pattern occurrences in a random $k$-ary word of length $n$. The two topics are closely related through the identity $P(X_n=r)=\frac{f_r^v(k,n)}{k^n}$. In particular, we show that for any $r\ge 0$ the Stanley-Wilf sequence $\bigl(f_r^v(k,n)\bigr)^{1/n}$ converges to a limit independent of $r$, and we determine the value of the limit. We then obtain several limit theorems for the distribution of $X_n$, including a CLT, large deviation estimates, and the exact growth rate of the entropy of $X_n$. Furthermore, we introduce a concept of weak avoidance and link it to a certain family of non-product measures on words that penalize pattern occurrences but do not forbid them entirely. We analyze this family of probability measures in a small parameter regime, where the distributions can be understood as a perturbation of a uniform measure. Finally, we extend some of our results for words, including the one regarding the equivalence of the limits of the Stanley-Wilf sequences, to pattern occurrences in permutations.
*MSC2010:* Primary 05A05, 05A15; Secondary 05A16, 68Q45, 60C05.
Keywords: pattern occurrences, weak avoidance, finite automata, random words, Stanley-Wilf type limits, limit theorems.
1 Introduction and main results
Pattern occurrence enumeration is a central topic in modern combinatorics; see, for instance, the monographs [8, 16, 20, 25]. In this paper we are primarily concerned with the pattern occurrence problem for words; however, we also extend certain results to permutations. We define words as finite arrays of letters from an alphabet $[k]=\{1,2,\ldots,k\}$ for some given $k\in\mathbb{N}$. A pattern is any distinguished word, and an occurrence of a pattern $v$ in a word $w$ is a subsequence of letters in $w$ (not necessarily consecutive) that are in the same relative order as the letters in $v$. For instance, the word has four occurrences of the pattern namely and See Subsection 2.1 for a more formal introduction of the concept. Occurrences of patterns in permutations are defined similarly; see the beginning of Section 3 for details.
Suppose that the alphabet $[k]$ and a pattern $v$ are given, and that exactly $d$ distinct letters are used to form the pattern $v$. For instance, if and then and Our main object of interest is the frequency sequence $f_r^v(k,n)$, namely the number of words in $[k]^n$ that contain the pattern $v$ exactly $r$ times. We also study the asymptotic behavior of the partial sums $\sum_{j=0}^{r}f_j^v(k,n)$ and of $X_n$, the number of occurrences of $v$ in a random word distributed uniformly over $[k]^n$. Note that the distribution of the random variable $X_n$ is related to these sequences through the identities
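$$P(X_n=r)=\frac{f_r^v(k,n)}{k^n},\qquad P(X_n\le r)=\frac{1}{k^n}\sum_{j=0}^{r}f_j^v(k,n),\qquad r\ge 0. \qquad (1)$$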
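As a sanity check on these identities, the following sketch enumerates all of $[k]^n$ for tiny parameters, counts pattern occurrences directly from the definition, and reads off both the counts and the law of $X_n$. It is brute force, so it is only practical for very small $k$ and $n$; the function names are ours, and letters are taken to be $1,\dots,k$.

```python
from collections import Counter
from itertools import combinations, product


def occurrences(word, pattern):
    """Count index tuples i_1 < ... < i_l whose letters are order-isomorphic to pattern."""
    l = len(pattern)
    total = 0
    for idx in combinations(range(len(word)), l):
        sub = [word[i] for i in idx]
        # same strict order and same equalities, pair by pair
        if all((sub[s] < sub[t]) == (pattern[s] < pattern[t])
               and (sub[s] == sub[t]) == (pattern[s] == pattern[t])
               for s in range(l) for t in range(l)):
            total += 1
    return total


def distribution(pattern, k, n):
    """Return f (f[r] = number of words with exactly r occurrences), the law of X_n,
    and P(X_n <= r) computed from the partial sums f_0 + ... + f_r."""
    f = Counter(occurrences(w, pattern) for w in product(range(1, k + 1), repeat=n))
    pmf = {r: f[r] / k ** n for r in sorted(f)}
    cdf, partial = {}, 0
    for r in sorted(f):
        partial += f[r]
        cdf[r] = partial / k ** n
    return f, pmf, cdf


f, pmf, cdf = distribution((1, 2), k=3, n=4)
print(f[0], sum(f.values()))   # 15 non-increasing words avoid 12, out of 3^4 = 81
```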
The starting point of our study is the celebrated Stanley-Wilf conjecture, which states that the number of permutations of size $n$ avoiding a given pattern grows at most exponentially in $n$. The conjecture was settled by Marcus and Tardos [27] in 2004; see [11, 17, 25, 34] for a review of the history and recent developments in the field. The analogue of this result for words is the convergence of the sequence $\bigl(f_0^v(k,n)\bigr)^{1/n}$. This was proved by Brändén and Mansour in [9] via a combinatorial analysis of certain finite automata that generate words avoiding a given pattern. In fact, it was shown in [9] that $\lim_{n\to\infty}\bigl(f_0^v(k,n)\bigr)^{1/n}=d-1$, where $d$ is the number of distinct letters in the pattern $v$. In Sections 2.2 and 2.3, we generalize this result to all $r\ge 0$. Specifically, we show the following (as stated in Theorems 2.9 and 2.10):
Theorem A**.**
For any integer $r\ge 0$,
$$\lim_{n\to\infty}\bigl(f_r^v(k,n)\bigr)^{1/n}=d-1,$$
where $d$ is the number of distinct letters in the pattern $v$.
Assume that Then for any there exist a positive integer and real constants and such that
[TABLE]
We remark that in various examples with we are able to verify Nevertheless, we believe that it may be zero in some cases, see the discussion in Section 2.3.
We also give the following extension of this result for permutations. Let be a given permutation pattern of size and denote the number of permutations of size that contain exactly times, We have (Theorem 3.1 below):
Theorem B**.**
For any exists and is equal to
In contrast to our results for words, we cannot describe the exact structure of the Stanley-Wilf type limits for permutations as a function of the parameters in a general form.
The next result turns out to be a direct implication of Theorem A. It is stated below as Theorem 2.13.
Theorem C**.**
If then, where is the entropy of
Loosely speaking, for a given , the entropy measures the amount of uncertainty in the value of the random variable Consequently, the entropy sequence is subadditive, namely because of the dependence of pattern occurrences on each other. The convergence of is thus ensured by Fekete’s subadditive lemma. Theorem 2.13 then gives the precise value of this limit for an arbitrary pattern
In Sections 2.4 and 2.5 we study the asymptotic behavior of the sequence In Section 2.5 we obtain a central limit theorem and several related asymptotic results for the distribution of $X_n$. The following result is an analogue of the CLT for permutations obtained by Bóna in [8]. The bulk of the proof is an estimation of the variance of $X_n$, denoted by $\sigma_n^2$. Together with general theorems of [29] and [23], this estimate also yields a Berry-Esseen type bound for the rate of convergence and large deviation estimates, stated, respectively, in Corollaries 2.16 and 2.17. The following is the content of Theorem 2.14.
Theorem D**.**
Let and Then $\sigma_n=\Theta\bigl(\mu_n/\sqrt{n}\bigr)$, and $(X_n-\mu_n)/\sigma_n$ converges in distribution, as $n\to\infty$, to a standard normal random variable.
For a pattern $v$ of length $\ell$, there are $\binom{n}{\ell}$ places in a word $w\in[k]^n$ where the pattern might occur. Enumerate them in an arbitrary way, and let be the indicator of the event that the pattern occurs at the $i$-th place in $w$. Choose a parameter and consider the following partition function penalizing the occurrences of
[TABLE]
Using this partition function, one can construct a Boltzmann distribution on as follows:
[TABLE]
The probability measure penalizes words with a non-zero with the factor but unless it doesn't forbid them completely. We refer to a random word distributed according to as weakly avoiding the pattern The construction and the terminology are inspired by their analogue in the theory of self-avoiding walks, where a similar construction is used to penalize self-intersections of the path of a random walk and to introduce weakly self-avoiding walks [5]. A similar construction for permutations is outlined in Section 3.2. In the case of permutations and the inversion pattern, the above probability measure is a Mallows distribution. Mallows permutations have been studied by many authors; see, for instance, the recent works [12, 19, 30] and references therein.
We remark that when the above results for hold under as is the uniform distribution over . One would then expect that, for a sequence decaying to zero sufficiently fast, similar limit theorems hold for . Indeed, using perturbation techniques, we prove the following (see Theorem 2.19):
Theorem E**.**
The following holds for any and a sequence of positive reals such that
$$\lim_{n\to\infty}\mathbb{E}^{v,1/\rho_n}_{k,n}\bigl(e^{tX_n/n^{\ell}}\bigr)=\exp\Bigl[\frac{t}{k^{\ell}\,\ell!}\binom{k}{d}\Bigr].$$
* where are strictly positive constants.*
Let and be the entropy of under the law Then
Note that in the context of permutations, somewhat similar perturbative regimes for Mallows permutations were recently studied in [6, 19, 33].
Another interesting result closely related to Theorem D (Theorem 2.14 below) is a limit theorem dealing with a Poisson approximation of in the case when is a rapidly increasing function of The result is an analogue for random words of [12, Theorem 3.1] for random permutations; it is stated below as Theorem 2.22.
Theorem F**.**
Suppose that sequences of natural numbers and satisfy the following condition:
- There exist constants and such that for all where
Consider an arbitrary sequence of patterns with distinct letters used to form Let where is drawn at random from Then
[TABLE]
for any integer
The paper is structured as follows. Section 2 is devoted to pattern occurrences in words. The framework is formally introduced in Section 2.1. In Section 2.2 we study the sequences and The generating functions are explicitly computed for several examples using the automata approach and the transfer-matrix method. The Stanley-Wilf limits of and are studied in Section 2.3. Section 2.4 is devoted to the study of words weakly avoiding a pattern. Section 2.5 contains various limit theorems for the distribution of the random variable Finally, Stanley-Wilf type limits and weak pattern avoidance in the framework of permutations are discussed in Section 3.
2 Pattern occurrences in words
In this section we focus on pattern occurrences in words and study the asymptotic behavior of and The section is divided into five subsections. We begin with notation. The organization of the section is discussed in more detail at the end of Section 2.1.
2.1 Notation and settings
Let and denote, respectively, the set of natural numbers and the set of non-negative integers, that is For a given set $A$, $|A|$ is the cardinality of $A$. For any given $k$ we denote the set $\{1,2,\ldots,k\}$ by $[k]$ and refer to it as an alphabet and to its elements as letters. A word of length $n$ is an element of $[k]^n$. A language is the set of all words composed of letters in an alphabet We adopt the convention that where $\epsilon$ is an empty word. For any we denote by the union For instance, and We write a word in the form $w=w_1w_2\cdots w_n$, where $w_i$ is the $i$-th letter of $w$. The concatenation of two words $u=u_1\cdots u_m$ and $w=w_1\cdots w_n$ is the word $uw=u_1\cdots u_mw_1\cdots w_n$. For instance, the concatenation of and is A pattern is any distinguished word in the underlying language
Let us now fix integers $k$ and $\ell$ and a pattern $v$ in $[k]^{\ell}$. These parameters are considered to be given and fixed throughout the rest of Section 2. An important characteristic of the pattern turns out to be the number of distinct letters used to compose it. We will denote this number by $d$. For instance, if then and
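For a word $w=w_1w_2\cdots w_n$ with $n\ge\ell$, an occurrence of the pattern $v=v_1v_2\cdots v_\ell$ in $w$ is a sequence of indices $1\le i_1<i_2<\cdots<i_\ell\le n$ such that the subword $w_{i_1}w_{i_2}\cdots w_{i_\ell}$ is order-isomorphic to the word $v$, that is
$$w_{i_s}<w_{i_t}\iff v_s<v_t\qquad\text{for all } s,t\in[\ell]$$
and
$$w_{i_s}=w_{i_t}\iff v_s=v_t\qquad\text{for all } s,t\in[\ell].$$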
For a word we denote by the number of occurrences of in For instance, if $v=21$ is the inversion and then (for the following three occurrences of pairs of letters which appear in the reverse order: and ). We say that a word contains the pattern exactly times if For we denote by and the number of words in that contain, respectively, exactly times and at most times. That is,
[TABLE]
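Order-isomorphism can be tested by reducing a word to the ranks of its letters: two words of the same length satisfy both conditions in the definition above exactly when their rank sequences coincide. The sketch below uses this reformulation (our choice of implementation, equivalent to the definition; the function names are ours) to count occurrences, including those of the inversion pattern $21$.

```python
from itertools import combinations


def reduce_word(u):
    """Replace each letter by its rank among the distinct letters of u (1-based)."""
    ranks = {a: i + 1 for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def order_isomorphic(u, v):
    """Equal-length words are order-isomorphic iff their reduced forms agree."""
    return len(u) == len(v) and reduce_word(u) == reduce_word(v)


def occ(w, v):
    """Number of occurrences of pattern v in word w (index tuples i_1 < ... < i_l)."""
    target = reduce_word(v)
    return sum(1 for idx in combinations(range(len(w)), len(v))
               if reduce_word([w[i] for i in idx]) == target)


print(reduce_word((3, 1, 3)))        # (2, 1, 2)
print(occ((3, 1, 2, 1), (2, 1)))     # 4 inversions
print(len(set((1, 2, 1))))          # d = 2 distinct letters
```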
We define their corresponding generating functions as
[TABLE]
We remark that given for is a polynomial in . Throughout this paper, and for sequences and with elements that might depend on and other parameters, means that, respectively, $\limsup_{n\to\infty}\bigl|a_n/b_n\bigr|<\infty$, and for all feasible values of the parameters when the latter are fixed. As usual, indicates that both and hold true.
The remainder of this section is divided into four subsections. In Section 2.2 we study a finite state automaton that generates words with a given value of The words are then counted through an application of the transfer-matrix method, allowing us to evaluate and subsequently in several interesting cases. The results of Section 2.2 are then used in Section 2.3 to show (see Theorem 2.9) that for any $r\ge 0$,
$$\lim_{n\to\infty}\bigl(f_r^v(k,n)\bigr)^{1/n}=d-1,$$
where $d$ is the number of distinct letters in the pattern $v$. Theorem 2.9 is the main result of this paper. A similar result for permutations is given by Theorem 3.1 in Section 3.1. We refer to $\lim_{n\to\infty}\bigl(f_r^v(k,n)\bigr)^{1/n}$ and their counterparts for permutations in Theorem 3.1 as Stanley-Wilf type limits.
Finally, Sections 2.4 and 2.5 deal with random words. Let be a word chosen at random from and In Section 2.5 we obtain a central limit theorem and several related asymptotic results for the distribution of The study of is, in principle, equivalent to the study of the sequences and in view of the identities (1). In Section 2.4 we introduce a notion of weak avoidance for an arbitrary word pattern. In Theorem 2.19 we obtain limit theorems for random words weakly avoiding a pattern. The distribution of is not uniform in this case, and we use the CLT for the uniform case and perturbation techniques to derive the results.
2.2 Finite automata and pattern occurrences
Given an integer , we define an equivalence relation on as follows. We say that two words and in are equivalent and write if the following condition holds for all
[TABLE]
For instance, if , and , then because and . On the other hand, because for any , and . We denote the equivalence class of a word by . For simplicity in notation, we drop the indices when the context is clear. We remark that:
and do not need to have the same length in order to be equivalent;
if and then
The latter observation implies that there is a unique equivalence class such that
[TABLE]
Since the empty word is an element of the language it follows from (4) that if then
[TABLE]
In particular,
[TABLE]
The following lemma shows that the equivalence of any two words can be checked in a finite number of steps.
Lemma 2.1**.**
Let and be two words in Then if and only if (4) holds for all
Proof.
Let be an equivalence relation on such that if and only if (4) holds for all Clearly, implies . On the other hand, if then there exists such that and with and Without loss of generality we may assume that . The occurrences of in can use at most letters of Thus there is a subsequence of of length at most such that and , and hence . ∎
Let be the set of all equivalence classes of Note that by Lemma 2.1 the number of equivalence classes is finite. Recall from (5), and let
[TABLE]
denote the set of equivalence classes excluding By the definition,
[TABLE]
We next introduce the key tool in our proofs in this section.
Definition 2.2**.**
Given an integer we denote by a finite automaton [21] such that
- The set of states of the automaton is
- The input alphabet is $[k]$;
- The transition function is given by the rule
- The initial state is $\langle\epsilon\rangle$, where $\epsilon$ denotes the empty word;
- All states are final states.
We identify the automaton with a (labeled) directed graph with vertices in such that there is a labeled edge from to if and only if .
Example 2.3**.**
Consider the case , and The set of equivalence classes is given by
[TABLE]
The labeled graph associated with the automaton is
[Figure: labeled directed graph of the automaton, with states $\langle\epsilon\rangle$, $\langle 1\rangle$, $\langle 11\rangle$, $\langle 112\rangle$, $\langle 12\rangle$, $\langle 123\rangle$ and edges labeled by letters.]
The automaton serves as a bridge between formal language theory and the theory of computing on one side and the asymptotic theory of algebraic functions on the other. See, for instance, [4, 16] and references therein for background.
We exploit the link between asymptotic properties of rational functions and the structure of associated regular languages to study the generating functions and of the sequences and defined in (3), and subsequently the asymptotic behavior of these sequences, as tends to infinity. The class of automata has been introduced in [9]. Our results in this subsection (Lemmas 2.7 and 2.8 below) are extensions of the corresponding results in Section 2 of [9].
It is straightforward to verify (cf. [20, p. 256]) that one can order the states of the automaton as so that if then there is no path from the state to the state . The transition matrix of is the matrix with non-negative integer entries defined by
[TABLE]
Thus counts the number of edges between and , and is triangular. The following observation reduces the study of the sequence to the analysis of the matrix
[TABLE]
where and
Example 2.4**.**
Consider again the setup of Example 2.3, namely , and The transition matrix is given by
[TABLE]
Thus the generating function for the number of -ary words of length that contain at most once is given by
[TABLE]
where is the -th standard unit vector (all coordinates are zero, except that the -th coordinate is one). Note that the generating function for the number of -ary words of length that avoid is given by (see [10]). Therefore, by virtue of (8),
[TABLE]
Applying arguments similar to those used to obtain (8), we find that
[TABLE]
and
[TABLE]
Example 2.5**.**
The equivalence classes of are given by and , where . The automaton can be graphically represented as follows:
[Figure: labeled directed graph of the automaton, with states $\langle\epsilon\rangle$, $\langle 1\rangle$, $\langle 12\rangle$, $\ldots$, $\langle 12\cdots(k-2)\rangle$, $\langle 12\cdots(k-1)\rangle$ and edges labeled $1, 2, 3, \ldots, k-2, k-1$.]
Therefore, is given by the matrix with and for all and the remaining entries equal to zero. Consequently,
[TABLE]
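The transfer-matrix count can be sketched for one concrete case. We assume the pattern is $v=12\cdots k$ over the alphabet $[k]$, which is how we read the state labels of Example 2.5, and we derive a transition matrix ourselves: state $j$ records the longest prefix $12\cdots j$ matched so far, the letter $j+1$ advances it, and the other $k-1$ letters are loops. We do not claim this is entry for entry the matrix in the text. The count $e_1^{\top}T^{n}\mathbf{1}$ is compared with brute force and with the closed form $\sum_{j=0}^{k-1}\binom{n}{j}(k-1)^{n-j}$, whose growth rate $k-1$ matches the number of loops at each state.

```python
from itertools import product
from math import comb


def transfer_matrix(k):
    """Hypothetical automaton for avoiding 12...k over [k]:
    state j = length of the longest matched prefix 1 2 ... j (0 <= j <= k-1)."""
    T = [[0] * k for _ in range(k)]
    for j in range(k):
        T[j][j] = k - 1              # the k-1 letters other than j+1 are loops
        if j + 1 < k:
            T[j][j + 1] = 1          # the letter j+1 extends the matched prefix
    return T                         # from state k-1 the letter k completes the pattern: dropped


def count_avoiders(k, n):
    """e_1^T T^n (1, ..., 1)^T: number of words in [k]^n avoiding 12...k."""
    T = transfer_matrix(k)
    row = [1] + [0] * (k - 1)        # start in the state of the empty word
    for _ in range(n):
        row = [sum(row[i] * T[i][j] for i in range(k)) for j in range(k)]
    return sum(row)                  # all surviving states are accepting


def brute_force(k, n):
    def contains(w):
        j = 0                        # greedy subsequence matching of 1, 2, ..., k
        for a in w:
            if a == j + 1:
                j += 1
                if j == k:
                    return True
        return False
    return sum(1 for w in product(range(1, k + 1), repeat=n) if not contains(w))


k = 3
for n in range(1, 8):
    closed = sum(comb(n, j) * (k - 1) ** (n - j) for j in range(k))
    print(n, count_avoiders(k, n), brute_force(k, n), closed)
```

The ratio `count_avoiders(k, n + 1) / count_avoiders(k, n)` tends to $k-1$, in line with the pole analysis of this section.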
Example 2.6**.**
It is not hard to see that the equivalence classes of are given by , for and for . The automaton can be graphically represented as follows:
[Figure: labeled directed graph of the automaton, with states $\langle\epsilon\rangle$, $\langle 1\rangle$, $\langle 12\rangle$, $\langle 11\rangle$, $\ldots$, $\langle 12\cdots(k-1)\rangle$, $\langle 112\cdots(k-2)\rangle$, $\langle 12\cdots k\rangle$, $\langle 112\cdots(k-1)\rangle$ and edges labeled by letters.]
Hence is given by the matrix with , , , , and for all , , and the remaining entries equal to zero. Let . In view of (6), we are interested in computing . First, we solve the system , where is the vector . By induction,
[TABLE]
and
[TABLE]
for . Hence,
[TABLE]
Taking into account the result in Example 2.5, we conclude that the generating function for the number of -ary words of length that contain exactly once is given by
[TABLE]
Note that . Hence the pole of smallest modulus of is and it is of order when . Thus (see, for instance, [16] or [4]), as
[TABLE]
For we have:
[TABLE]
We refer to an edge of the associated graph starting and ending at the same state as a loop at It is easy to see that the graph does not have any cycles, besides perhaps loops (cf. [20, p. 256]). Using similar arguments as in [9] (see Lemma 2.4 there), one can prove the following lemma.
Lemma 2.7**.**
Let be the number of distinct letters in Then for any the number of loops at does not exceed . Moreover, there are exactly loops at .
Recalling (3), the following lemma links the number of loops to the poles of the generating function and hence to the asymptotic behavior of the sequence as tends to infinity. The result follows directly from the identity in (6) and the transfer-matrix method [32, Theorem 4.7.2]. Given a matrix denote by the matrix with row and column deleted. We have:
Lemma 2.8**.**
Let be the number of states in . Then the generating function is given by
[TABLE]
where is the number of loops at state , and is the matrix obtained by replacing the first column in with a column of all ones.
2.3 Stanley-Wilf type limits
Throughout this section we assume that the number of distinct letters in the pattern namely is greater than one. An interesting consequence of the results in Lemma 2.7 and Lemma 2.8 is the following theorem, which is the main result of this section.
Recall and from (2).
Theorem 2.9**.**
Assume that Then for all
[TABLE]
Proof.
By Lemma 2.8, the generating function is a rational function in the complex plane By Lemma 2.7, the smallest pole of is Since the modulus of the smallest pole is the radius of convergence of the generating function, and its reciprocal is the exponential growth rate of the coefficients [16], we have
[TABLE]
Since , we conclude that
[TABLE]
On the other hand, if and a word contains exactly times, then the concatenation contains exactly times for any word such that each letter of belongs to the set
[TABLE]
where is the rightmost letter of . Therefore, there exists a constant such that for all
[TABLE]
Hence,
[TABLE]
which completes the proof of the theorem. ∎
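A rough numerical illustration of Theorem 2.9 for $v=12$: the complement map $a\mapsto k+1-a$ turns occurrences of $12$ into occurrences of $21$, so these counts coincide with $f^{21}_r(k,n)$, whose $n$-th roots tend to $1$ by Example 2.12. The dynamic program below is our own sketch; it uses the facts that appending a letter $a$ creates one new occurrence of $12$ for each earlier letter smaller than $a$, and that occurrences never disappear, so prefixes with more than $r$ occurrences can be discarded. Convergence is slow because of polynomial factors.

```python
from collections import defaultdict
from itertools import combinations, product


def f_r_pattern12(k, n, r):
    """Number of words in [k]^n with exactly r occurrences of 12 (pairs i < j, w_i < w_j)."""
    # state: (counts of the letters 1..k-1 so far, occurrences so far) -> number of prefixes
    states = {((0,) * (k - 1), 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (counts, occ), ways in states.items():
            smaller = 0                           # earlier letters smaller than a
            for a in range(1, k + 1):
                new_occ = occ + smaller
                if new_occ <= r:                  # occurrences never decrease, so prune
                    c = list(counts)
                    if a < k:
                        c[a - 1] += 1
                    nxt[(tuple(c), new_occ)] += ways
                if a < k:
                    smaller += counts[a - 1]
        states = nxt
    return sum(ways for (_, occ), ways in states.items() if occ == r)


def brute_force(k, n, r):
    """Direct count over all of [k]^n, for cross-checking small cases."""
    return sum(1 for w in product(range(1, k + 1), repeat=n)
               if sum(w[i] < w[j] for i, j in combinations(range(n), 2)) == r)


for n in (20, 40, 80, 120):
    print(n, f_r_pattern12(3, n, 1) ** (1 / n))   # approaches d - 1 = 1 slowly
```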
Note that the limit in (9) is independent of It turns out that a similar result holds for the occurrence enumeration problem in permutations; see Theorem 3.1 below. We remark that in the case of permutations, the structure of the dependence of the limit on the underlying pattern is considerably more complex than in (9) and is not yet completely understood [11, 17, 18]. The theorem has an interesting implication for the asymptotic behavior of the entropy of the random variable with a random see Theorem 2.13 below for details.
A simple path in the graph representation of is a finite sequence of states in such that and for all we have and is connected to by a directed edge. The proof of the following partial refinement of Theorem 2.9 follows that of Theorem 3.2 in [9] nearly verbatim and is therefore omitted.
Theorem 2.10**.**
Assume that Let be the maximal number of states with loops in a simple path in Then for any there exists a constant and such that
[TABLE]
Note that by Lemma 2.7. Through investigating various patterns with we observed Nevertheless, we believe that the following is true:
Conjecture**.**
There exist and a pattern such that and in (10) is equal to zero. In that case, there exists and such that
It follows from the first limit identity in (10) that is a non-decreasing function of If the previous conjecture is true, then is not always strictly increasing. We believe that the following is true:
Conjecture**.**
For any and with exists and belongs to
There exist a pattern with and an increasing sequence of integers such that
We conclude this section with a remark that Theorems 2.9 and 2.10 can be interpreted as large deviation estimates for when is chosen at random, see Section 2.5 below for details.
2.4 Weak pattern avoidance
In this section, we further investigate the asymptotic behavior of the sequence It turns out that the generating function of this sequence, as defined by (3), can be linked to a natural concept of “weak avoidance” that may be of independent interest. Weak avoidance is defined in a fashion similar to weakly self-avoiding random walks [5], namely by introducing a penalty for pattern occurrences rather than ruling them out completely.
Formally speaking, for a pattern , we associate a sequence of penalty functions as follows:
[TABLE]
where
[TABLE]
It follows from (11) that
[TABLE]
Thus According to the definition in (11), the function can be considered as a partition function counting the words in with weights penalizing occurrences of the pattern Note that is a decreasing function of counts all words without discrimination, and on the opposite extreme counts only words avoiding the pattern entirely. The parameter can therefore be interpreted as an intensity or strength of the pattern avoidance.
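The displayed definitions are not reproduced here, so the following sketch assumes the standard Boltzmann form in which a word $w$ receives weight $e^{-\beta\,\mathrm{occ}(w)}$, with $\mathrm{occ}(w)$ the number of occurrences of $v$ in $w$ and $\beta\in[0,\infty]$. The symbol $\beta$ and all function names are our own. Under this assumption the partition function equals $\sum_r f_r^v(k,n)e^{-\beta r}$, counts all $k^n$ words at $\beta=0$, and reduces to the avoiders at $\beta=\infty$.

```python
import math
from collections import Counter
from itertools import combinations, product


def occ(w, v):
    """Occurrences of v in w: index tuples whose letters are order-isomorphic to v."""
    def reduce(u):
        ranks = {a: i for i, a in enumerate(sorted(set(u)))}
        return tuple(ranks[a] for a in u)
    target = reduce(v)
    return sum(1 for idx in combinations(range(len(w)), len(v))
               if reduce([w[i] for i in idx]) == target)


def weight(r, beta):
    """Assumed Boltzmann weight exp(-beta * r); beta = inf keeps only r = 0."""
    return 1.0 if r == 0 else math.exp(-beta * r)


def partition_function(v, k, n, beta):
    """Assumed form: sum over w in [k]^n of exp(-beta occ(w, v)) = sum_r f_r exp(-beta r)."""
    f = Counter(occ(w, v) for w in product(range(1, k + 1), repeat=n))
    return sum(count * weight(r, beta) for r, count in f.items())


def boltzmann_probability(w, v, k, beta):
    """Probability of the word w under the (assumed) penalizing measure."""
    return weight(occ(w, v), beta) / partition_function(v, k, len(w), beta)


v, k, n = (2, 1), 2, 6
for beta in (0.0, 0.5, 2.0, math.inf):
    print(beta, partition_function(v, k, n, beta))
```

For $v=21$, $k=2$, $n=6$ the loop prints $64.0$ at $\beta=0$ and $7.0$ at $\beta=\infty$ (the seven non-decreasing binary words), with decreasing values in between. The explicit `r == 0` branch in `weight` avoids evaluating `inf * 0`.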
The subsequent Section 2.5 is devoted to the study of the asymptotic behavior of the sequence where and are i. i. d. random variables, each one distributed uniformly over The asymptotic behavior of random variables in the case when the sequence is drawn at random from non-product probability measures on is beyond the scope of this paper and will be studied by the authors elsewhere. The only exception in this paper is Theorem 2.19 where, following a canonical construction in the theory of self-avoiding random walks [5], we study in the case when is chosen at random according to the probability law
[TABLE]
Here is a parameter which ranges within the interval Clearly, is not uniform on it penalizes words with a non-zero by the factor which depends on the parameter This probability measure belongs to a general class of Boltzmann distributions extensively studied in statistical mechanics and combinatorics, cf. [14]. In Theorem 2.19 we study in a certain small parameter regime where decays fast, and consequently, can be considered as a perturbation of the uniform probability measure over
We conclude this section with an analogue of Theorem 2.9 for It follows from Theorem 2.9 that for all
[TABLE]
where is the number of distinct letters in the pattern We have:
Proposition 2.11**.**
Given a pattern exists and lies within the closed interval for all
Proof.
By the definition, for any and an increasing sequence of indices we have
[TABLE]
Therefore, for any and
[TABLE]
Hence is a subadditive sequence, and the claim of the proposition follows from Fekete’s subadditive lemma and the estimates in (15). ∎
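The key inequality in the proof, that occurrences inside $u$ and inside $w$ remain occurrences inside the concatenation $uw$, makes the logarithm of the partition function subadditive. A small numerical check follows, assuming the weights have the Boltzmann form $e^{-\beta\,\mathrm{occ}(w)}$ (our assumption; $\beta$ and the function names are ours).

```python
import math
from collections import Counter
from itertools import combinations, product


def occ(w, v):
    """Occurrences of v in w via rank-reduced subwords."""
    def reduce(u):
        ranks = {a: i for i, a in enumerate(sorted(set(u)))}
        return tuple(ranks[a] for a in u)
    target = reduce(v)
    return sum(1 for idx in combinations(range(len(w)), len(v))
               if reduce([w[i] for i in idx]) == target)


def log_partition(v, k, n, beta):
    """log of the assumed partition function sum_w exp(-beta * occ(w, v)) over [k]^n."""
    f = Counter(occ(w, v) for w in product(range(1, k + 1), repeat=n))
    return math.log(sum(c * math.exp(-beta * r) for r, c in f.items()))


v, k, beta = (1, 2), 2, 1.0
logs = {n: log_partition(v, k, n, beta) for n in range(1, 11)}

# occ(uw) >= occ(u) + occ(w) gives P_{n+m} <= P_n P_m, i.e. subadditivity of log P_n
subadditive = all(logs[n + m] <= logs[n] + logs[m] + 1e-12
                  for n in range(1, 6) for m in range(1, 6))
print(subadditive)
for n in (2, 5, 10):
    print(n, logs[n] / n)      # lies between log(f_0(k, n)) / n and log(k)
```

By Fekete's lemma, `logs[n] / n` then converges; here it is sandwiched between $\frac1n\log f_0^v(k,n)$ and $\log k$.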
Example 2.12**.**
Let us consider $v=21$. In order to avoid the pattern $21$, the letters of a word must be arranged in non-decreasing order. Therefore, $f^{21}_0(k,n)=\binom{n+k-1}{k-1}$ is the number of ways to write $n$ as a weak composition $n=c_1+\cdots+c_k$, where $c_i$ represents the number of occurrences of the letter $i$ in a $k$-ary word of length $n$. Furthermore, by Theorem 2.9, $\lim_{n\to\infty}\bigl(f^{21}_r(k,n)\bigr)^{1/n}=1$ for all integers $r\ge 0$. Though a simple explicit expression for is not known, a result on generating functions due to MacMahon (see, for instance, Theorem 3.6 in [1]) combined with (13) shows that for
[TABLE]
The first inequality in (16) follows readily from the fact that
[TABLE]
as long as Combining (16) with the trivial inequality we obtain that for all Remark that a straightforward improvement of the lower bound for is
[TABLE]
where denotes the integer part of Combining this lower bound with (16), we obtain that for all
[TABLE]
where is the Euler generating function Notice that the lower and upper bounds in (17) match asymptotically when
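The identity behind MacMahon's result is that the inversion generating function of all arrangements of a multiset with letter multiplicities $c_1,\dots,c_k$ is the $q$-multinomial coefficient. Summing over weak compositions of $n$ therefore yields all of the numbers $f^{21}_r(k,n)$ at once, and the constant term recovers $\binom{n+k-1}{k-1}$. The sketch below (helper names are ours) builds Gaussian binomials with the $q$-Pascal rule and cross-checks the result against brute force.

```python
from functools import lru_cache
from itertools import combinations, product
from math import comb


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    out = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += b
    return out


@lru_cache(maxsize=None)
def q_binomial(n, j):
    """Coefficients of the Gaussian binomial [n choose j]_q, 0 <= j <= n."""
    if j == 0 or j == n:
        return (1,)
    # q-Pascal rule: [n choose j]_q = [n-1 choose j-1]_q + q^j [n-1 choose j]_q
    shifted = [0] * j + list(q_binomial(n - 1, j))
    return tuple(poly_add(list(q_binomial(n - 1, j - 1)), shifted))


def weak_compositions(n, k):
    """All (c_1, ..., c_k) of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for c in range(n + 1):
        for rest in weak_compositions(n - c, k - 1):
            yield (c,) + rest


def inversion_counts(k, n):
    """List whose r-th entry is the number of words in [k]^n with exactly r inversions."""
    total = [0]
    for c in weak_compositions(n, k):
        poly, remaining = [1], n
        for part in c:                     # q-multinomial as a product of q-binomials
            poly = poly_mul(poly, list(q_binomial(remaining, part)))
            remaining -= part
        total = poly_add(total, poly)
    return total


def brute_force(k, n):
    counts = {}
    for w in product(range(1, k + 1), repeat=n):
        r = sum(w[i] > w[j] for i, j in combinations(range(n), 2))
        counts[r] = counts.get(r, 0) + 1
    return [counts.get(r, 0) for r in range(max(counts) + 1)]


k, n = 3, 6
print(inversion_counts(k, n)[:4])
print(inversion_counts(k, n)[0] == comb(n + k - 1, k - 1))   # f_0 is a binomial coefficient
```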
2.5 Random words
Let be a sequence of independent random variables, each distributed uniformly on and let be a word pattern, Denote for and let be the infinite string composed of the successive letters in the sequence. In this section we study the asymptotic behavior of the random variable . Note that for all
[TABLE]
We start with a corollary to Theorem 2.9 that is concerned with the asymptotic behavior of the information entropy of when tends to infinity. Let
[TABLE]
be the entropy of the random variable The following theorem shows that grows linearly with and gives the exact rate of growth for an arbitrary pattern with
Theorem 2.13**.**
Assume that Then,
[TABLE]
Proof.
We have
[TABLE]
Thus
[TABLE]
and the result follows from Theorem 2.9 and a discrete version of the bounded convergence theorem. ∎
Our next result is a central limit theorem for which asserts that, as tends to infinity, is highly concentrated at with standard deviation of order The fact that, exactly as in the classical case of partial sums of i. i. d. variables, typical fluctuations of are of order will often be exploited in the rest of this section. The proof closely follows that of Theorem 2 in [8], a similar CLT for pattern occurrences in permutations. It is based on an application of a general CLT for dependent variables due to [22], and hence relies on an accurate estimation of Given the variance estimate and a general result in [29], the CLT can be strengthened to a Berry-Esseen type result providing the classical rate of convergence; see Corollary 2.16 below.
Theorem 2.14**.**
Let and Then $\sigma_n=\Theta\bigl(\mu_n/\sqrt{n}\bigr)$, and $(X_n-\mu_n)/\sigma_n$ converges in distribution, as $n\to\infty$, to a standard normal random variable.
Proof.
There are $\binom{n}{\ell}$ ways to choose $\ell$ indices out of $n$ possibilities. We refer to these ordered $\ell$-tuples as $\ell$-subintervals of $[n]$. Enumerate these subintervals in an arbitrary manner, and let denote the $i$-th subinterval. Let be the indicator of the event that the pattern occurs at the $i$-th subinterval.
First, we will compute Given that
[TABLE]
and we have
$$\mu_n=E(X_n)=\binom{n}{\ell}\binom{k}{d}\,k^{-\ell}.$$
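The mean is easy to verify by hand: a word of length $\ell$ that is order-isomorphic to $v$ is determined by its set of $d$ distinct letters, so each of the $\binom{n}{\ell}$ position sets carries an occurrence with probability $\binom{k}{d}k^{-\ell}$, giving $\mu_n=\binom{n}{\ell}\binom{k}{d}k^{-\ell}$. The sketch below compares this with exact enumeration over $[k]^n$ using rational arithmetic (function names are ours).

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def reduce(u):
    """Rank sequence of u; equal rank sequences mean order-isomorphic words."""
    ranks = {a: i for i, a in enumerate(sorted(set(u)))}
    return tuple(ranks[a] for a in u)


def occ(w, v):
    target = reduce(v)
    return sum(1 for idx in combinations(range(len(w)), len(v))
               if reduce([w[i] for i in idx]) == target)


def exact_mean(v, k, n):
    """E(X_n) by enumerating all k^n equally likely words."""
    total = sum(occ(w, v) for w in product(range(1, k + 1), repeat=n))
    return Fraction(total, k ** n)


def formula_mean(v, k, n):
    """binom(n, l) * binom(k, d) / k^l with l = len(v), d = number of distinct letters."""
    l, d = len(v), len(set(v))
    return Fraction(comb(n, l) * comb(k, d), k ** l)


for v in [(1, 2), (2, 1, 1), (1, 3, 2)]:
    print(v, exact_mean(v, 3, 6), formula_mean(v, 3, 6))
```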
Next, we will estimate To that end, we rewrite as follows
[TABLE]
where
[TABLE]
In what follows, we will adopt the proof strategy of [8] and estimate separately for different values of the parameter For the exact value is
[TABLE]
where we used the fact that for two intervals and with no overlap
[TABLE]
If were the only terms contributing to the variance of , their entire contribution, combined with the term $-\bigl[E(X_n)\bigr]^2$, would amount to (cf. formulas (9) and (10) in [8])
[TABLE]
To finish the estimate on the variance, we need to provide estimates on when More specifically, when we give an accurate estimate, and for a crude estimate will suffice for our purpose. More precisely, we will show that and while is negative, which gives the necessary estimate for the variance.
Case I: . Consider the sum of the terms over the pairs of intervals that overlap exactly at one place. The sum of these terms is
[TABLE]
where
[TABLE]
with two words occupying the intervals and overlap over the -th letter of each, and being the -th highest letter (among the distinct possibilities ) in the pattern To obtain the lower bound for we will only consider the case when the common letter is the -th letter for some in both intervals. Once the joint location of and is chosen, we have in total possibilities to choose the corresponding letters. We have to fill locations before and and locations after the common letter. The term is the number of possibilities to designate of the remaining locations to be occupied by letters of the interval Assuming that for given and the common letter for and is we observe that we have possibilities to choose distinct letters from
We remark that
[TABLE]
where the inequality is obtained by enumerating the terms with and only.
Furthermore,
[TABLE]
where we used the Cauchy-Schwarz inequality in the first inequality and a variation of the Chu-Vandermonde identity stated as
[TABLE]
This identity can be justified as follows: in order to choose distinct letters from we can first choose the -th largest element among those letters, call it from the interval then letters from the interval and letters from the interval Collecting all the estimates together, we obtain that
[TABLE]
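The counting argument behind the Chu-Vandermonde variant can be checked numerically. We assume it takes the following form, in our own notation. Fix $1\le i\le k\le n$ and choose k distinct numbers from $\{1,\dots,n\}$. First fix the value m of the i-th largest chosen number. Then choose the i-1 larger numbers from $\{m+1,\dots,n\}$ and the k-i smaller ones from $\{1,\dots,m-1\}$. This gives $\binom{n}{k}=\sum_{m=1}^{n}\binom{n-m}{i-1}\binom{m-1}{k-i}$.

```python
from math import comb


def split_at_ith_largest(n, k, i):
    """Count k-subsets of {1,...,n} by the value m of their i-th largest element.

    For fixed m, the i - 1 larger elements come from {m+1,...,n} and the
    k - i smaller ones from {1,...,m-1}; math.comb returns 0 when a block is too small.
    """
    return sum(comb(n - m, i - 1) * comb(m - 1, k - i) for m in range(1, n + 1))


for n in range(1, 12):
    for k in range(1, n + 1):
        for i in range(1, k + 1):
            assert split_at_ith_largest(n, k, i) == comb(n, k)
```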
Case II: . Furthermore, extending (20) to
[TABLE]
where are strictly positive constants whose values depend only on and (but not on ).
Having in hand the above estimates for we can now evaluate the variance of Taking into account (19), (21), and (22), we obtain that
[TABLE]
where
[TABLE]
Finally, by virtue of (18), the following limit exists and is strictly positive:
[TABLE]
and therefore the remainder of the proof is a straightforward application of Theorem 2 in [22] to the random variables and can be carried out verbatim as in [8]. ∎
Remark 2.15.
A central limit theorem for multisets closely related to Theorem 2.14 can be found in [15]; see also the references therein for earlier versions. Let represent the number of occurrences of the letter in the random word and denote by the random vector The CLT for in [15] can be stated as a limit theorem for the random variable under the conditional measure The main difference from Theorem 2.14 is that the scaling factors and are random in that they depend on the vector The relation of Theorem 2.14 to the CLT in [15] thus resembles the one between the so-called annealed (average) and quenched limit theorems in the theory of random motion in random media; see, for instance, [35]. In particular, $\sigma_n^2=E\bigl(\widetilde{\sigma}_n^2\bigr)+\widehat{\sigma}_n^2$, where is the “annealed” variance that appears in the statement of Theorem 2.14, whereas the term describes fluctuations of the “random environment”.
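The decomposition of $\sigma_n^2$ in the remark is an instance of the law of total variance, $\operatorname{Var}(X)=E[\operatorname{Var}(X\mid Y)]+\operatorname{Var}(E[X\mid Y])$, with Y the vector of letter counts. We read $\widetilde\sigma_n^2$ as the conditional variance given the letter counts and $\widehat\sigma_n^2$ as the variance of the conditional mean; this is our interpretation. The sketch verifies the identity exactly for short words by grouping all words by their letter counts. The function names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def count_occurrences(word, pattern):
    target = standardize(pattern)
    return sum(
        standardize([word[i] for i in idx]) == target
        for idx in combinations(range(len(word)), len(pattern))
    )


def mean(xs):
    return Fraction(sum(xs), len(xs))


def variance(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])


def variance_decomposition(n, k, pattern):
    """Return (total variance, mean conditional variance, variance of conditional means)."""
    groups = defaultdict(list)  # letter-count vector -> occurrence counts of those words
    for w in product(range(1, k + 1), repeat=n):
        counts = tuple(w.count(a) for a in range(1, k + 1))
        groups[counts].append(count_occurrences(w, pattern))
    everything = [x for xs in groups.values() for x in xs]
    overall = mean(everything)
    weight = {c: Fraction(len(xs), len(everything)) for c, xs in groups.items()}
    quenched = sum(weight[c] * variance(xs) for c, xs in groups.items())
    environment = sum(weight[c] * (mean(xs) - overall) ** 2 for c, xs in groups.items())
    return variance(everything), quenched, environment
```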
Our next result is a Berry-Esseen type bound for the convergence rate of the above CLT. The bound is a direct implication of Theorem 2.2 in [29], along with the estimates in (23), (24), and the following modification of (19):
[TABLE]
Here is the number of random indicators that are independent of an indicator with a given index Let denote the distribution function of the standard normal variable. We have:
Corollary 2.16.
In the notation of Theorem 2.14,
[TABLE]
Note that the classical Berry-Esseen bound for the rate of convergence in the CLT for partial sums of i.i.d. random variables is of order thus the above bound is asymptotically optimal up to a constant.
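A Monte Carlo experiment illustrates the normal approximation that underlies the Berry-Esseen bound. For the pattern 12, the count is the number of pairs i < j with w_i < w_j, and it can be computed in O(nk) time by keeping running letter counts. The sketch standardizes a sample by its empirical mean and standard deviation, not by the exact $E(X_n)$ and $\sigma_n$. It then reports the Kolmogorov distance to the standard normal. The sample size, n, k, and the seed are arbitrary choices of ours. Monte Carlo noise of order (sample size)^{-1/2} limits what such an experiment can say about the rate.

```python
import math
import random


def count_12(word, k):
    """Occurrences of the pattern 12: pairs i < j with word[i] < word[j]."""
    seen = [0] * (k + 1)  # seen[a] = number of earlier letters equal to a
    total = 0
    for letter in word:
        total += sum(seen[1:letter])  # earlier letters strictly smaller than `letter`
        seen[letter] += 1
    return total


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_distance_to_normal(n, k, samples, seed=0):
    rng = random.Random(seed)
    xs = [count_12([rng.randint(1, k) for _ in range(n)], k) for _ in range(samples)]
    mu = sum(xs) / samples
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / samples)
    zs = sorted((x - mu) / sd for x in xs)
    # sup_z |empirical CDF(z) - Phi(z)|, checked just after and just before each jump
    return max(
        max((i + 1) / samples - normal_cdf(z), normal_cdf(z) - i / samples)
        for i, z in enumerate(zs)
    )


print(kolmogorov_distance_to_normal(n=200, k=3, samples=2000, seed=1))
```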
Theorem 2.14 implies a weak law of large numbers for and asserts that a typical deviation of from is of order The main purpose of the following Chernoff-type bounds is to estimate the probability of large deviations, namely deviations of order The result is merely an instance of Corollary 2.6 in [23] formulated using the notation of Theorem 2.14.
Corollary 2.17.
For any
[TABLE]
and
[TABLE]
where is introduced in (26) and
We will now state a direct consequence of Theorem 2.14 in terms of the weak avoidance penalty function Our main motivation for including this result is the subsequent Theorem 2.19. Recall the notation of Theorem 2.14.
Lemma 2.18.
(a) Let be a sequence of positive reals such that and for some Then, the following holds for any constant
[TABLE]
(b) The following holds for any constant
[TABLE]
where are strictly positive constants introduced in (25).
Proof.
Observe that all the expectations in the statement of the lemma are well-defined for all because Let We will use the parameter so defined in both parts, (a) and (b), of the proof.
We will consider separately two cases, and
Case I: Using the second-order Taylor series with the remainder in the Lagrange form
[TABLE]
we obtain:
[TABLE]
for some random (because of the dependence on ) Note that in view of (18) and the condition with probability one,
[TABLE]
for some (deterministic) constant which depends on the parameters and Furthermore, by Theorem 2.14, Therefore,
[TABLE]
Recall the constant in (30). For any we have
[TABLE]
where we used the mean-value theorem applied to the function in the first step and (29) in the second one. Since
[TABLE]
we get (27) for by utilizing (31).
Case II: In this case, (31) follows directly from the law of large numbers in probability, as which is implied by Theorem 2.14. The rest of the proof of (27) is the same as in Case I.
By Theorem 2.14, for any we have:
[TABLE]
The convergence of the moment generating functions of can be verified using, for instance, the general Theorem 3 in [26]; it is also transparent from the proofs in [22]. It follows that
[TABLE]
and hence
[TABLE]
The last formula is an analogue of (31) in part (a) and plays a similar role; the remainder of the argument is similar to its counterpart in (a). ∎
Recall from (14) and let denote the expectation with respect to Then for any and we have
[TABLE]
Two interesting regimes in this model arise when it is assumed that depends on and either or Both regimes can be considered as perturbations of a uniform distribution, over in the former case and over the pattern-avoiding set in the latter. In the context of permutations, similar regimes for the particular case when the pattern is the inversion were recently studied in [6, 19, 33]. In view of (32), Lemma 2.18 implies the following:
Theorem 2.19.
(a) Let and be two sequences of positive reals such that for some and for some Then the following holds for any
[TABLE]
In particular, by virtue of (18),
[TABLE]
if
(b) The following holds for any and a sequence of positive reals such that for some
[TABLE]
where are strictly positive constants introduced in (25).
(c) The following holds for any and a sequence of positive reals such that
[TABLE]
for some
- (i) We have:
[TABLE]
- (ii) Let and
[TABLE]
be the entropy of under the law Then
[TABLE]
Proof.
For part (a), plug and into (32) and use (27). For part (b), substitute and use (28). Part (i) in (c) then follows from the bounded convergence theorem and (33), which implies that the distribution of under the law converges to the degenerate distribution at Finally,
[TABLE]
which implies the claim in (ii) of part (c). Indeed, converges to by Theorem 2.9 and a discrete version of the bounded convergence theorem, by (33), and converges to by virtue of (27). The proof of the theorem is complete. ∎
The results in Theorem 2.19 shed some light on the asymptotic behavior of under for More specifically, the corollary suggests that the intensity sequence with which is at least yields a perturbative “light avoidance regime” in that the results in Lemma 2.18 and Theorem 2.19 formally correspond to their counterparts in the corollary with In particular, (33) shows that remains the proper scaling for for any in this regime, namely the distribution of under converges to that of the constant one as Furthermore, by the Gärtner-Ellis theorem [13], the result in (34) for moment generating functions implies Corollary 2.20 given below.
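For the reader's convenience, here is the one-dimensional form of the Gärtner-Ellis theorem used in this step, as stated, for example, in [13]. The notation $Z_n$, $a_n$, $\Lambda$ below is generic and is not that of Theorem 2.19.

```latex
Let $a_n\to\infty$ and suppose that, for every $\lambda\in\mathbb{R}$, the limit
\[
  \Lambda(\lambda) := \lim_{n\to\infty} \frac{1}{a_n}\log E\bigl[e^{\lambda a_n Z_n}\bigr]
\]
exists as an extended real number, and that $0$ lies in the interior of
$\mathcal{D}_\Lambda=\{\lambda:\Lambda(\lambda)<\infty\}$. Let
$\Lambda^*(z)=\sup_{\lambda\in\mathbb{R}}\{\lambda z-\Lambda(\lambda)\}$. Then
\[
  \limsup_{n\to\infty}\frac{1}{a_n}\log P(Z_n\in F)\le-\inf_{z\in F}\Lambda^*(z)
  \quad\text{for every closed } F\subset\mathbb{R},
\]
\[
  \liminf_{n\to\infty}\frac{1}{a_n}\log P(Z_n\in G)\ge-\inf_{z\in G\cap\mathcal{E}}\Lambda^*(z)
  \quad\text{for every open } G\subset\mathbb{R},
\]
where $\mathcal{E}$ is the set of exposed points of $\Lambda^*$ with an exposing slope in
the interior of $\mathcal{D}_\Lambda$. If, in addition, $\Lambda$ is essentially smooth and
lower semicontinuous, the lower bound holds with $\mathcal{E}$ replaced by $\mathbb{R}$,
i.e., $Z_n$ satisfies a full large deviation principle with rate function $\Lambda^*$.
```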
Corollary 2.20.
Let be as defined in the statement of part (b) of Theorem 2.19. Then the following holds for any Borel set
[TABLE]
It is reasonable to expect that a large deviation principle for under holds with a finite rate function and with respect to the usual scaling sequence rather than (in our context, cf. Corollary 2.17 where ). However, proving such a result is beyond the reach of the methods employed in this section.
We conclude the section with another corollary to Theorem 2.14, a limit theorem concerning a Poisson approximation of in the case when is a sufficiently rapidly increasing function of The result is an analogue for random words of [12, Theorem 3.1] for random permutations. The proof of the theorem relies on a Poisson approximation of the sum of random indicators via a modification of the Chen-Stein method due to [3], and follows the bulk of the argument in [12]. Recall that the total variation distance between two -valued random variables and is defined as
[TABLE]
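To get a feel for the total variation distance, the sketch below computes it between a binomial law and the Poisson law with the same mean. This is the independent special case of the Poisson approximation discussed here. We assume the convention $d_{TV}(X,Y)=\sup_A|P(X\in A)-P(Y\in A)|=\tfrac12\sum_j|P(X=j)-P(Y=j)|$; the displayed definition above may differ by a factor of 2. The computed distances are compared with Le Cam's classical bound $d_{TV}\le mp^2$ for a sum of m independent Bernoulli(p) variables, not with Theorem 2.21 itself.

```python
import math


def binomial_pmf(m, p, j):
    """P(S = j) for S ~ Binomial(m, p), 0 < p < 1, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
               + j * math.log(p) + (m - j) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(lam, j):
    """P(Z = j) for Z ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def tv_binomial_poisson(m, p):
    """(1/2) * sum_j |P(S=j) - P(Z=j)| with S ~ Binomial(m, p) and Z ~ Poisson(m p)."""
    lam = m * p
    head = sum(abs(binomial_pmf(m, p, j) - poisson_pmf(lam, j)) for j in range(m + 1))
    tail = max(0.0, 1.0 - sum(poisson_pmf(lam, j) for j in range(m + 1)))  # P(Z > m)
    return 0.5 * (head + tail)


for m in (10, 100, 1000):
    p = 2 / m
    print(m, tv_binomial_poisson(m, p), "Le Cam bound:", m * p * p)
```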
The following summary of results in [3] suffices for our purpose (cf. Theorem 4.2 in [12]):
Theorem 2.21 ([3]).
Let and be a collection of identically distributed (but possibly dependent) Bernoulli variables with and For let Set and For any let be a set of indices such that
[TABLE]
where is the -algebra generated by and define
[TABLE]
Let be a Poisson random variable with parameter that is Then,
[TABLE]
We will apply Theorem 2.21 with where are indicators introduced in the course of the proof of Theorem 2.14 assuming that and Note that under the conditions we impose,
[TABLE]
goes to zero as tends to infinity. We have:
Theorem 2.22.
Suppose that three sequences of natural numbers and satisfy the following conditions:
- (i) and for all
- (ii)
- (iii) There exist constants and such that for all
Consider an arbitrary sequence of patterns with distinct letters used to form Let where is drawn at random from Then
[TABLE]
where is a Poisson random variable with parameter In particular,
[TABLE]
for any integer
Remark 2.23.
We believe that the lower bound for in the statement of the theorem is an artifact of the proof and can be improved. In the case most favorable to us, the conditions of the theorem require This should be compared with the lower bound obtained in [12] for permutations.
Proof of Theorem 2.22.
Fix any and let and for this particular value of Note that Recall the intervals from the proof of Theorem 2.14, assuming that and define for
[TABLE]
Let denote Observe that if then
[TABLE]
Therefore, for and introduced in (36) we have:
[TABLE]
where is defined in (26), and
[TABLE]
Therefore,
[TABLE]
Since
[TABLE]
we obtain that
[TABLE]
where we used Vandermonde’s identity for the second term and a change of variables for the third one. Since
[TABLE]
we obtain that
[TABLE]
where is a random variable with hypergeometric distribution, for By Hoeffding’s inequality for partial sums of bounded random variables,
[TABLE]
for any Thus for any given and large enough,
[TABLE]
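The Hoeffding step relies on the fact, due to Hoeffding (1963), that his inequality for sums of independent bounded variables also holds for sampling without replacement, that is, for the hypergeometric distribution. The sketch compares the exact hypergeometric upper tail with the bound $\exp(-2mt^2)$. The parameters are arbitrary, and the notation N, K, m, h is ours rather than that of the proof.

```python
import math
from fractions import Fraction


def hypergeometric_upper_tail(N, K, m, h):
    """P(H >= h), where H counts marked items among m drawn without replacement
    from N items of which K are marked."""
    favourable = sum(math.comb(K, j) * math.comb(N - K, m - j) for j in range(h, min(K, m) + 1))
    return Fraction(favourable, math.comb(N, m))


def hoeffding_bound(N, K, m, h):
    """exp(-2 m t^2) with t = h/m - K/N, valid for t > 0."""
    t = h / m - K / N
    return math.exp(-2 * m * t * t)


N, K, m = 60, 24, 20  # mean m*K/N = 8
for h in range(9, m + 1):
    assert float(hypergeometric_upper_tail(N, K, m, h)) <= hoeffding_bound(N, K, m, h)
```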
Therefore, for all an arbitrary and all large enough,
[TABLE]
where is the gamma function. Finally, using Stirling’s formula we obtain that
[TABLE]
By the conditions of the theorem, Therefore, for any and large enough we have:
[TABLE]
The proof of the theorem is complete. ∎
3 Permutation patterns
In this section, we discuss an extension of some of our results about counting occurrences of a pattern in words to permutations. The section is divided into two subsections. Subsection 3.1 is devoted to Stanley-Wilf type limits for permutations, and Subsection 3.2 adapts the concept of weak avoidance to permutations. The main results of this section are Theorem 3.1 and Proposition 3.2. The latter is a counterpart of Proposition 2.11, and the former is a modification of Theorem 2.9 for permutations. Extensions of the CLT-related results in Section 2.5 to random permutations are readily available due to the CLT for permutations proved by Bóna in [8]. This is briefly discussed in the concluding paragraph of Subsection 3.2; the details are left to the reader.
We begin with notation. Permutations are bijections from a set to itself. For let denote the symmetric group of order the group of permutations of the integers in . Occasionally, when confusion is not likely to occur, we will identify permutations in with the words representing the image of the permutation. For instance, for permutations and we refer to the permutation
[TABLE]
as the concatenation of the permutations and
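We assume here, as the proof of Theorem 3.1 indicates, that the second permutation is shifted up by the length of the first. Under that assumption, the concatenation is what is often called the direct sum. The sketch also checks one easy consequence for the pattern 21. Every letter of the shifted block is larger than, and to the right of, every letter of the first block, so no inversion straddles the two blocks and inversion counts add up.

```python
from itertools import permutations


def concatenate(sigma, pi):
    """sigma followed by pi with every letter of pi increased by len(sigma)."""
    n = len(sigma)
    return tuple(sigma) + tuple(x + n for x in pi)


def inversions(perm):
    """Number of occurrences of the pattern 21."""
    return sum(perm[i] > perm[j] for i in range(len(perm)) for j in range(i + 1, len(perm)))


print(concatenate((2, 1, 3), (2, 1)))  # (2, 1, 3, 5, 4)

# Inversions never straddle the two blocks, so they add up.
for s in permutations(range(1, 4)):
    for p in permutations(range(1, 3)):
        assert inversions(concatenate(s, p)) == inversions(s) + inversions(p)
```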
Fix any and We refer to as a pattern; it remains fixed throughout the rest of the paper. For a permutation with an occurrence of the pattern in is a sequence of indices such that the word is order-isomorphic to the word that is
[TABLE]
For a permutation with we denote by the number of occurrences of the pattern in For example, if and then and are order-isomorphic to and If we say that contains (exactly) times. For a given let denote the number of permutations in that contain exactly times. That is,
[TABLE]
For example, if then (only counts), ( and count), ( and count), and (only counts).
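For small n, the numbers of permutations containing a pattern exactly r times can be tabulated by brute force. The sketch below does so for an arbitrary pattern given as a tuple. The function names are ours. The running time grows like n! times the number of position sets, so the sketch is meant only for checking small cases such as the example above.

```python
from collections import Counter
from itertools import combinations, permutations


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def occurrences(perm, pattern):
    target = standardize(pattern)
    return sum(
        standardize([perm[i] for i in idx]) == target
        for idx in combinations(range(len(perm)), len(pattern))
    )


def occurrence_profile(n, pattern):
    """List f with f[r] = number of permutations of [n] containing `pattern` exactly r times."""
    counts = Counter(occurrences(p, pattern) for p in permutations(range(1, n + 1)))
    return [counts[r] for r in range(max(counts) + 1)]


print(occurrence_profile(3, (2, 1)))  # [1, 2, 2, 1]
```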
As in Section 3, and for sequences and with elements that might depend on and other parameters, means that, respectively, $\limsup_{n\to\infty}\bigl|\frac{a_n}{b_n}\bigr|<\infty$, and for all feasible values of the parameters when the latter are fixed. The notation is used to indicate that both and hold true.
3.1 Stanley-Wilf type limits
The celebrated Stanley-Wilf conjecture proved in [27] states that exists and belongs to For let where is a permutation chosen uniformly at random from Notice that
[TABLE]
In the language of random permutations, the Stanley-Wilf limit is
[TABLE]
which yields the following weaker conclusion:
[TABLE]
Thus the limit can be interpreted in terms of the asymptotic behavior of as a local large deviation result with respect to the scaling sequence The probability is very small since, according to the CLT obtained by Bóna in [8], is tightly concentrated around The following theorem extends this large deviation result to with an arbitrary fixed
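For patterns of length 3, the Stanley-Wilf limit is classical. The number of permutations of [n] avoiding 132, or any other pattern of length 3, is the n-th Catalan number $\frac{1}{n+1}\binom{2n}{n}$, so the limit equals 4. The brute-force sketch below confirms the Catalan counts for small n. The n-th roots it prints are still far from 4. This shows how slowly such limits are approached, since the Catalan numbers carry a subexponential factor of order $n^{-3/2}$.

```python
from itertools import combinations, permutations
from math import comb


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def contains(perm, pattern):
    target = standardize(pattern)
    return any(
        standardize([perm[i] for i in idx]) == target
        for idx in combinations(range(len(perm)), len(pattern))
    )


def avoiders(n, pattern):
    """Number of permutations of [n] with zero occurrences of `pattern`."""
    return sum(not contains(p, pattern) for p in permutations(range(1, n + 1)))


def catalan(n):
    return comb(2 * n, n) // (n + 1)


for n in range(1, 8):
    f0 = avoiders(n, (1, 3, 2))
    print(n, f0, catalan(n), round(f0 ** (1 / n), 3))
```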
Theorem 3.1.
For any exists and is equal to
Proof.
The proof is by induction on . By Corollary 2 in [27], $c:=\lim_{n\to\infty}\bigl(f_0^{\xi}(n)\bigr)^{1/n}$ exists and is finite. Assume that for some the claim holds for To complete the proof, we need to show that under this assumption it holds also for
To this end, let be an arbitrary permutation in that contains the pattern exactly times. By removing the leftmost letter in the leftmost occurrence of in and renaming the remaining letters, we obtain a permutation in that contains at most times. Thus,
[TABLE]
It follows that
[TABLE]
On the other hand, consider an arbitrary permutation that contains exactly times and the concatenation where is obtained by adding to each letter in For instance, if and then and Without loss of generality, we may assume that the letter precedes in (the idea is borrowed from [2]). Because of this assumption, the new permutation contains exactly times. We can therefore conclude that This inequality, together with the induction hypothesis, implies that
[TABLE]
In view of (37), this completes the proof of the theorem. ∎
3.2 Weak avoidance of permutation patterns
Similarly to (11), with any pattern one can associate a sequence of weak avoidance penalty functions by setting
[TABLE]
where
[TABLE]
Notice that and . Similarly to (13), we have
[TABLE]
For certain particular cases, the polynomial generating functions of the sequence have been studied in [24, 28] through the analysis of recursive functional equations that they satisfy.
The analogue of the measure introduced in (14) is the probability measure on defined by
[TABLE]
In the case of inversions, i.e., for is a Mallows distribution. Mallows permutations have been studied by several authors; see, for instance, the recent works [12, 19, 30] and the references therein.
The next proposition establishes the existence of $\lim_{n\to\infty}\bigl(c_n^{\xi}(x)\bigr)^{1/n}$. The proof is based on a standard subadditivity argument and follows the same lines as the one in [2]. Unfortunately, we were unable to verify that the limit is necessarily finite (cf. Proposition 2.11 together with (15) for words).
Proposition 3.2.
$\lim_{n\to\infty}\bigl(c^{\xi}_{n}(x)\bigr)^{1/n}$ exists for all
Proof.
For and such that let
[TABLE]
That is and for all Further, for any such that let
[TABLE]
Note that implies In other words,
[TABLE]
Without loss of generality, we can and will assume that that is appears before in Under this assumption, we have
[TABLE]
In view of (41) and (42), for any and we have
[TABLE]
Hence, is a subadditive sequence, and by Fekete’s subadditive lemma, $\lim_{n\to\infty}\bigl(c^{\xi}_{n}(x)\bigr)^{1/n}$ exists for all ∎
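The proof rests on a supermultiplicativity property of the penalty functions. The displays (41) and (42) are not reproduced here, so the construction below is our own choice and serves only as an illustration. We assume, consistently with Example 3.3, that $c^\xi_n(x)=\sum_{r}f^\xi_r(n)(1-x)^r$. We take ξ = 132, in which 1 precedes the largest letter, and place a block of larger letters before a block of smaller ones (a skew sum). An occurrence of such a pattern then cannot use letters from both blocks, so occurrence counts add up. Injectivity of this map gives $c_{n+m}(x)\ge c_n(x)\,c_m(x)$. Hence $\log c_n(x)$ is superadditive, and by Fekete's lemma $\lim_n \log c_n(x)/n=\sup_n \log c_n(x)/n$, which may be $+\infty$.

```python
from fractions import Fraction
from itertools import combinations, permutations


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def occurrences(perm, pattern):
    target = standardize(pattern)
    return sum(
        standardize([perm[i] for i in idx]) == target
        for idx in combinations(range(len(perm)), len(pattern))
    )


def skew_sum(sigma, pi):
    """sigma shifted up by len(pi), followed by pi: every letter of the first block
    is larger than, and to the left of, every letter of the second block."""
    m = len(pi)
    return tuple(x + m for x in sigma) + tuple(pi)


def penalty(n, pattern, x):
    """Assumed form of c_n(x): sum over permutations of [n] of (1 - x) ** occurrences."""
    q = 1 - x
    return sum(q ** occurrences(p, pattern) for p in permutations(range(1, n + 1)))


PATTERN = (1, 3, 2)  # 1 precedes the largest letter
x = Fraction(1, 2)
c = {n: penalty(n, PATTERN, x) for n in range(1, 7)}
for n in range(1, 6):
    for m in range(1, 7 - n):
        assert c[n + m] >= c[n] * c[m]  # supermultiplicativity
```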
Example 3.3.
Consider Then the number of occurrences of in a permutation is the number of inversions in and are Mahonian numbers [7]. The identity in (40), together with Netto’s formula for the generating function of the sequence (see, for instance, [7, p. 43] or [31, Seq A008302]), gives In particular, $\lim_{n\to\infty}\bigl(c_n^{21}(x)\bigr)^{1/n}=x^{-1}$ for all . Note that for all and hence, by virtue of Theorem 3.1, $\lim_{n\to\infty}\bigl(f_r^{21}(n)\bigr)^{1/n}=1$ for all Interestingly enough, in contrast to Example 2.12, the asymptotic behavior of for a sequence such that as does depend on the rate of convergence of
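We assume, as the computation in the example indicates, that $c_n^{21}(x)=\sum_{\sigma\in S_n}(1-x)^{\mathrm{inv}(\sigma)}$. With $q=1-x$, Netto's formula then reads $c_n^{21}(x)=\prod_{j=1}^{n}(1+q+\dots+q^{j-1})$. For $0<x\le 1$, each factor tends to $1/(1-q)=1/x$ as $j\to\infty$, so the n-th root tends to $x^{-1}$. The sketch checks the product formula against brute force with exact rationals and evaluates the n-th root for a large n in log space. The function names are ours.

```python
import math
from fractions import Fraction
from itertools import permutations


def inversions(perm):
    return sum(perm[i] > perm[j] for i in range(len(perm)) for j in range(i + 1, len(perm)))


def penalty_brute_force(n, x):
    """Sum over all permutations of [n] of (1 - x) ** (number of inversions)."""
    q = 1 - x
    return sum(q ** inversions(p) for p in permutations(range(1, n + 1)))


def penalty_netto(n, x):
    """prod_{j=1}^{n} (1 + q + ... + q^(j-1)) with q = 1 - x."""
    q = 1 - x
    result = Fraction(1)
    for j in range(1, n + 1):
        result *= sum(q ** i for i in range(j))
    return result


def nth_root_netto(n, x):
    """(c_n(x))^(1/n) computed as exp((1/n) * sum_j log((1 - q^j) / x)), 0 < x <= 1."""
    q = 1.0 - x
    return math.exp(sum(math.log((1.0 - q ** j) / x) for j in range(1, n + 1)) / n)


assert all(penalty_brute_force(n, Fraction(1, 3)) == penalty_netto(n, Fraction(1, 3)) for n in range(1, 7))
print(nth_root_netto(2000, 0.5))  # close to 1 / 0.5 = 2
```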
Conjecture.
$\lim_{n\to\infty}\bigl(c^{\xi}_{n}(x)\bigr)^{1/n}<\infty$ for all patterns and all
It is interesting to notice that while for words we have the opposite is true for permutations, namely The differences can be explained as follows. For words we have:
[TABLE]
and, since letters can be repeated in words, the conditional expectation is less than the unconditional one, $E\bigl[(1-x)^{X_m}\bigr]$. Indeed, any pattern occurrence in the first letters does not affect the last letters in but does increase the probability of having occurrences of the pattern spread over two intervals, and It turns out that for permutations, where letters cannot be reused, the situation is different: in contrast to words, the correlation between occurrences of the pattern at the beginning and in the continuation of a large permutation is negative.
We conclude with a remark concerning the extension of the results in Section 2.5 to permutations. The key ingredients of the proofs in Section 2.5 are the specific covariance structure (the dependence graph) of the indicators and the asymptotic relation between the expectation and the variance of Bóna’s CLT for permutations [8] shows that these ingredients are similar for words and permutations, and thus enables one to carry over the proofs of Corollaries 2.16, 2.17, and 2.20, Lemma 2.18, and Theorem 2.22 to permutations nearly verbatim. We leave the details to the reader.
- [1] G. E. Andrews, The Theory of Partitions. Reprint of the 1976 original, Cambridge University Press, 1998.
- [2] R. Arratia, On the Stanley-Wilf conjecture for the number of permutations avoiding a given pattern, Electron. J. Combin. 6 (1999), paper no. 1.
- [3] R. Arratia, L. Goldstein, and L. Gordon, Two moments suffice for Poisson approximations: the Chen-Stein method, Ann. Probab. 17 (1989), 9–25.
- [4] C. Banderier and M. Drmota, Formulae and asymptotics for coefficients of algebraic functions, Combin. Probab. Comput. 24 (2015), 1–53.
- [5] R. Bauerschmidt, H. Duminil-Copin, J. Goodman, and G. Slade, Lectures on self-avoiding walks. In D. Ellwood, C. Newman, V. Sidoravicius, and W. Werner (Eds), Probability and Statistical Physics in Two and More Dimensions, Clay Math. Proc. 15, pp. 395–467, Amer. Math. Soc., 2012.
- [6] N. Bhatnagar and R. Peled, Lengths of monotone subsequences in a Mallow’s permutation, Probab. Theory Related Fields 161 (2015), 719–780.
- [7] M. Bóna, Combinatorics of Permutations, Chapman & Hall/CRC, Boca Raton, 2004.
- [8] M. Bóna, The copies of any permutation pattern are asymptotically normal, 2007, available at https://arxiv.org/abs/0712.2792.
