Normal Approximation for $U$- and $V$-statistics of a Stationary Absolutely Regular Sequence
Vladimir G. Mikhailov, Natalia M. Mezhennaya

TL;DR
This paper establishes conditions under which $U$- and $V$-statistics derived from stationary absolutely regular sequences are asymptotically normally distributed, extending classical results to dependent data with evolving distributions.
Contribution
It provides new sufficient conditions for the asymptotic normality of $U$- and $V$-statistics of dependent sequences whose distributions depend on $n$, using a refinement of Janson's dependency-graph approach.
Findings
Sufficient conditions for asymptotic normality of $U$-statistics.
Extension of normal approximation results to dependent sequences.
Application of dependency approach by Janson, Mikhailov, Tikhomirova, and Chistyakov.
Abstract
Consider a stationary absolutely regular sequence of real random variables whose distribution depends on a parameter $n$. The paper presents sufficient conditions for the asymptotic normality (as $n \to \infty$, under the usual centering and normalization) of the nonhomogeneous $U$-statistic of a given order built on this sequence, with a kernel that also depends on $n$. The same results hold for $V$-statistics. To handle sums of dependent random variables in which strong dependencies are rare, the proof uses the approach proposed by S. Janson in 1988 and refined by V. Mikhailov in 1991 and by M. Tikhomirova and V. Chistyakov in 2015.
Vladimir G. Mikhailov
Steklov Mathematical Institute of Russian Academy of Sciences, Moscow, Russia
Natalia M. Mezhennaya
Bauman Moscow State Technical University, Moscow, Russia
AMS Subject Classification: 60F05, 05C90, 94C15
Key words: absolute regularity condition, characterizing graph, central limit theorem, dependency graph, $U$-statistic, $V$-statistic, stationary sequence
Introduction
The study of a special class of functionals of a sequence of random variables of the form
[TABLE]
which were called $U$-statistics, began in the middle of the last century with the investigation of the properties of sample characteristics (see [1] and the references therein). The number of arguments of the kernel is called the order of the $U$-statistic, and the symmetric function being averaged is called its kernel. Examples of $U$-statistics include sample moments, Gini's mean difference, and Spearman's rank correlation.
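For reference, the classical normalized $U$-statistic of order $m$ with a symmetric kernel $\Phi$, built on a sample $X_1, \ldots, X_n$, and the corresponding $V$-statistic can be written as follows. The symbols $n$, $m$, and $\Phi$ are our own notation and need not match the paper's.

```latex
U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} \Phi(X_{i_1}, \ldots, X_{i_m}),
\qquad
V_n = n^{-m} \sum_{i_1 = 1}^{n} \cdots \sum_{i_m = 1}^{n} \Phi(X_{i_1}, \ldots, X_{i_m}).
```

For instance, $m = 2$ with $\Phi(x, y) = |x - y|$ gives Gini's mean difference, and $\Phi(x, y) = (x - y)^2 / 2$ gives the unbiased sample variance.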
It is known that quantities such as the number of repetitions and the number of repetitions of tuples [2]–[5], the number of pairs of $H$-equivalent tuples [6]–[11], etc., in a random discrete sequence belong (up to the factor in front of the sum) to the class of quantities of the form (1).
In the following decades, many papers were devoted to the asymptotic properties of $U$-statistics of sequences of independent identically distributed random variables (see, e.g., the bibliography in [12]). To prove asymptotic normality in the scheme of increasing sums, W. Hoeffding [1] proposed approximating the distribution of a $U$-statistic by the distribution of a sum of specially constructed independent random variables. In various forms, this approach is also used to study $U$-statistics of sequences of random variables under weak or other dependence conditions in the scheme of increasing sums (see, e.g., [13]–[16]).
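Hoeffding's idea of approximating a $U$-statistic by a sum of independent terms can be checked numerically. The sketch below assumes i.i.d. Uniform(0, 1) observations and the kernel $|x - y|$ (Gini's mean difference); in this case $\theta = \mathbf{E}|X - Y| = 1/3$ and the first-order projection is explicit, $h_1(x) = \mathbf{E}|x - Y| = x^2 - x + 1/2$. The function names and parameters are our own, and this is an illustration of the projection idea only, not the construction used in [1] or in this paper.

```python
import random

THETA = 1.0 / 3.0  # E|X - Y| for independent X, Y ~ Uniform(0, 1)


def gini_u(xs):
    """Classical U-statistic with kernel |x - y| (Gini's mean difference)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(xs[i] - xs[j])
    return total / (n * (n - 1) / 2)


def hoeffding_projection(xs):
    """First-order Hoeffding projection of gini_u for Uniform(0, 1) data.

    h1(x) = E|x - Y| = x**2 - x + 1/2, and the projection is
    THETA + (2 / n) * sum(h1(x_i) - THETA): a sum of independent terms.
    """
    n = len(xs)
    return THETA + 2.0 / n * sum(x * x - x + 0.5 - THETA for x in xs)


def remainder_ratio(n=100, reps=300, seed=1):
    """Mean squared (U - projection), divided by the sample variance of U."""
    rng = random.Random(seed)
    us, rems = [], []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        u = gini_u(xs)
        us.append(u)
        rems.append(u - hoeffding_projection(xs))
    mean_u = sum(us) / reps
    var_u = sum((u - mean_u) ** 2 for u in us) / (reps - 1)
    return sum(r * r for r in rems) / reps / var_u


if __name__ == "__main__":
    print(f"remainder / Var(U) = {remainder_ratio():.3f}")
```

For $n = 100$, Hoeffding's variance formula with $\zeta_1 = 1/180$ and $\zeta_2 = 1/18$ gives an exact ratio of about 0.04, so the projection carries roughly 96% of the variance of the $U$-statistic; the simulated value fluctuates around that figure.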
The flavor of these results is conveyed by the following theorem of K. Yoshihara [13], which we state in a simplified form.
Let the sequence of real random variables be strictly stationary and satisfy the absolute regularity condition [17, p. 3]
[TABLE]
where each $\sigma$-algebra in (2) is the $\sigma$-algebra of events generated by the indicated random variables.
Let
[TABLE]
where are independent copies of ,
[TABLE]
Theorem 1 (Theorem 1 of [13]).
Let and there is a number such that
[TABLE]
Then, if holds, the distribution function of the random variable converges to the distribution function of the standard normal law.
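Condition (2) can be made concrete for a stationary finite-state Markov chain. For such a chain, the absolute regularity coefficient at lag $k$ is known to reduce to $\beta(k) = \sum_i \pi_i \,\| P^k(i, \cdot) - \pi \|_{TV}$, where $\pi$ is the stationary law and $\|\cdot\|_{TV}$ is the total-variation distance $\sup_A |\mu(A) - \nu(A)|$. The sketch below evaluates this formula; the two-state chain and the function names are our own illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(p: Matrix, k: int) -> Matrix:
    """k-step transition matrix P^k by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = mat_mul(result, p)
    return result


def beta_markov(p: Matrix, pi: List[float], k: int) -> float:
    """beta(k) = sum_i pi_i * TV(P^k(i, .), pi) for a stationary chain.

    TV is sup_A |mu(A) - nu(A)|, i.e. half the L1 distance between
    the two probability vectors.
    """
    pk = mat_pow(p, k)
    return sum(
        pi[i] * 0.5 * sum(abs(pk[i][j] - pi[j]) for j in range(len(pi)))
        for i in range(len(pi))
    )


# Two-state chain with switching probabilities a = 0.2 and b = 0.3.
P = [[0.8, 0.2], [0.3, 0.7]]
PI = [0.6, 0.4]  # stationary law: (b, a) / (a + b)

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, beta_markov(P, PI, k))
```

For this chain the coefficient has the closed form $\beta(k) = \frac{2ab}{(a+b)^2}\,|1 - a - b|^k = 0.48 \cdot 0.5^k$, so it decays geometrically, in particular faster than any negative power of $k$.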
Paper [18] (see also [19, 20]) adapted the results of K. Yoshihara to triangular array schemes: Sh. Khashimov considered second-order $U$-statistics whose kernel may change as the sample size grows. Again, W. Hoeffding's method [1] was used.
The method of moments proved no less fruitful for studying $U$-statistics of dependent random variables in the triangular array scheme. Back in 1975, V. Mikhailov [21] applied this method directly to derive sufficient conditions for the asymptotic normality of a special class of $U$-statistics of finitely dependent sequences in a triangular array scheme in which, as the sample size grows, both the kernel and the distribution of the sequence may change. We call this the wide triangular array scheme; in this scheme, the notation must indicate the dependence of the kernel and of the distribution on the sample size.
A modern variant of the method of moments, proposed by Svante Janson [22, 23] and refined by V. Mikhailov [23] and by M. Tikhomirova and V. Chistyakov [24], yields simpler and substantially more general sufficient conditions for the asymptotic normality of $U$- and $V$-statistics of any order of a sequence of random variables satisfying the absolute regularity condition in the wide triangular array scheme. These conditions are presented in this paper; they complement the results of K. Yoshihara [13] and V. Mikhailov [21].
Note also that, for problems related to tuples in a discrete random sequence (see, e.g., [8, 9, 25, 26]), the present results make it possible to treat the case in which the length of the random sequence and the length of the tuple grow to infinity simultaneously and consistently. These applications will be the subject of a separate paper.
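As a concrete instance of the tuple problems mentioned above, the sketch below counts pairs of positions $i < j$ whose overlapping length-$s$ windows coincide. This is one natural version of "the number of repetitions of $s$-tuples"; the exact statistics studied in [2]–[5] may be defined differently, and the function names are our own. For i.i.d. letters, the sequence of overlapping $s$-tuples is stationary and $(s-1)$-dependent, hence absolutely regular with $\beta(k) = 0$ for $k \ge s$; letting $s$ grow with the sequence length changes both the kernel and the distribution of the tuple sequence.

```python
import random
from collections import Counter
from itertools import combinations


def count_tuple_repetitions(seq, s):
    """Number of pairs i < j with seq[i:i+s] == seq[j:j+s] (overlapping windows).

    Up to normalization, this is a second-order U-type sum over the stationary
    sequence of s-tuples Y_i = seq[i:i+s] with kernel 1{Y_i == Y_j}.
    """
    windows = Counter(tuple(seq[i:i + s]) for i in range(len(seq) - s + 1))
    # A window value that occurs c times contributes c * (c - 1) / 2 pairs.
    return sum(c * (c - 1) // 2 for c in windows.values())


def count_tuple_repetitions_brute(seq, s):
    """Direct O(N^2) evaluation of the same double sum, for cross-checking."""
    starts = range(len(seq) - s + 1)
    return sum(1 for i, j in combinations(starts, 2)
               if seq[i:i + s] == seq[j:j + s])


if __name__ == "__main__":
    rng = random.Random(0)
    letters = [rng.randrange(4) for _ in range(2000)]  # i.i.d. uniform, 4 letters
    print(count_tuple_repetitions(letters, 6))
```

Grouping equal windows with a `Counter` evaluates the double sum in linear time, while the brute-force version mirrors the $U$-statistic definition term by term.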
1 Limit Theorems
For every value of the series parameter, let the sequence of real random variables be strictly stationary (i.e., the joint distribution function of any finite block of consecutive variables does not change when all indices are shifted by the same amount) and satisfy the absolute regularity condition (2).
Let be a bounded measurable function for every and
[TABLE]
The nonhomogeneous $U$-statistic and $V$-statistic with this kernel are defined by the formulas (see [1] or [12] for the definitions)
[TABLE]
respectively (unlike the traditional definition (1), the normalizing factors in front of the sums are omitted).
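The unnormalized sums above can be sketched by brute force. Here we read "nonhomogeneous" as allowing the kernel to depend on the index tuple as well as on the values; that reading, the 0-based indexing, and all function names are our assumptions for illustration. The $U$-sum runs over strictly increasing index tuples, while the $V$-sum runs over all index tuples, repetitions included.

```python
from itertools import combinations, product
from typing import Callable, Sequence, Tuple

# Hypothetical kernel signature: it sees both the index tuple and the values,
# so it may change from one index tuple to another (the nonhomogeneous case).
Kernel = Callable[[Tuple[int, ...], Tuple[float, ...]], float]


def u_sum(xs: Sequence[float], m: int, kernel: Kernel) -> float:
    """Unnormalized U-sum over strictly increasing index tuples i_1 < ... < i_m."""
    return sum(kernel(idx, tuple(xs[i] for i in idx))
               for idx in combinations(range(len(xs)), m))


def v_sum(xs: Sequence[float], m: int, kernel: Kernel) -> float:
    """Unnormalized V-sum over all index tuples, repetitions allowed."""
    return sum(kernel(idx, tuple(xs[i] for i in idx))
               for idx in product(range(len(xs)), repeat=m))


def equal(idx, vals):
    """Homogeneous kernel: ignores the indices."""
    return float(vals[0] == vals[1])


def equal_far_apart(idx, vals):
    """Index-dependent kernel: counts equal values only at distance >= 2."""
    return float(vals[0] == vals[1] and abs(idx[0] - idx[1]) >= 2)


if __name__ == "__main__":
    xs = [1, 2, 1, 1]
    print(u_sum(xs, 2, equal), v_sum(xs, 2, equal))
```

For `xs = [1, 2, 1, 1]` the homogeneous kernel gives 3.0 for the $U$-sum and 10.0 for the $V$-sum (the diagonal and both orders of each pair are counted), while the index-dependent kernel gives 2.0 and 4.0.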
Let
[TABLE]
Theorem 2.
Let for every be a strictly stationary sequence of real random variables satisfying the absolute regularity condition (2). For let the distribution of , the measurable for every function family , the number and the other parameters marked by index vary so that exists such that for every natural number and all
[TABLE]
Then the moments and the distribution function of the centered and normalized $U$-statistic converge to the moments and the distribution function of the standard normal law.
A similar statement holds for $V$-statistics.
Theorem 3.
Let for every be a strictly stationary sequence of real random variables satisfying the absolute regularity condition (2). For let the distribution of , the measurable for every function family , the number and the other parameters marked by index vary so that exists such that for every natural number and all
[TABLE]
Then the moments and the distribution function of the centered and normalized $V$-statistic converge to the moments and the distribution function of the standard normal law.
We now consider a special case in which the absolute regularity coefficient in (2) decays faster than any negative power of the lag as the lag tends to infinity.
Theorem 4.
Let for every be a strictly stationary sequence of real random variables satisfying the absolute regularity condition (2). Let the number be fixed starting at some value of and the joint distribution of the random variables and the measurable function vary so that
[TABLE]
where the positive function for . Then the moments and the distribution function of the centered and normalized $U$-statistic converge to the moments and the distribution function of the standard normal law.
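A small Monte Carlo sketch can illustrate the kind of normal behaviour described in Theorem 4. It uses a stationary Gaussian AR(1) sequence, which is known to be absolutely regular with geometrically decaying coefficients, and the bounded kernel $\mathbf{1}\{x + y > 0\}$. The kernel, $\rho$, the sample size, and the empirical standardization are our own choices; the sketch keeps the kernel fixed, does not verify the conditions of Theorem 4, and proves nothing.

```python
import math
import random


def ar1_path(n, rho, rng):
    """Stationary Gaussian AR(1): X_t = rho * X_{t-1} + sqrt(1 - rho^2) * eps_t."""
    x = rng.gauss(0.0, 1.0)  # X_1 drawn from the stationary N(0, 1) law
    path = []
    for _ in range(n):
        path.append(x)
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return path


def u_sum_sign(xs):
    """Unnormalized second-order U-sum with bounded kernel 1{x + y > 0}."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n) if xs[i] + xs[j] > 0)


def standardized_replicates(n=60, rho=0.5, reps=300, seed=7):
    """Monte Carlo replicates of the U-sum, centered and scaled empirically."""
    rng = random.Random(seed)
    values = [u_sum_sign(ar1_path(n, rho, rng)) for _ in range(reps)]
    mean = sum(values) / reps
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (reps - 1))
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    z = standardized_replicates()
    inside = sum(abs(t) <= 1.96 for t in z) / len(z)
    print(f"share within +-1.96: {inside:.3f} (about 0.95 for a normal law)")
```

Since the kernel is nondegenerate here (its first projection is the standard normal distribution function), one expects roughly 95% of the standardized replicates to fall within $\pm 1.96$.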
Theorem 5.
Let for every be a strictly stationary sequence of real random variables satisfying the absolute regularity condition (2). Let the number be fixed starting at some value of and the joint distribution of the random variables and the measurable function vary so that
[TABLE]
where the positive function for . Then the moments and the distribution function of the centered and normalized $V$-statistic converge to the moments and the distribution function of the standard normal law.
2 On the Method of Proving the Limit Theorems
In 1988, Svante Janson [22] proposed a simple technique for deriving sufficient conditions for the asymptotic normality of sums of bounded random variables whose joint distribution is described by a dependency graph. Each random variable corresponds to exactly one vertex of the dependency graph, and the vertices are joined by edges so that the following condition holds:
if $A$ and $B$ are two disjoint sets of vertices such that no edge of the graph has one endpoint in $A$ and the other in $B$, then the families of random variables indexed by $A$ and by $B$ are independent.
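For orientation, Janson's criterion from [22] reads roughly as follows, as we recall it; the exact formulation, including the convention for counting degrees, should be checked in [22]. Let $X_{n,1}, \ldots, X_{n,N_n}$ satisfy $|X_{n,i}| \le A_n$, let the family have a dependency graph of maximal degree $M_n$ (with $M_n = 1$ if the graph has no edges), and put $S_n = \sum_i X_{n,i}$ and $\sigma_n^2 = \mathrm{Var}\, S_n$. Then

```latex
\exists\, r \in \mathbb{N}:\quad
\Bigl(\frac{N_n}{M_n}\Bigr)^{1/r} \frac{M_n A_n}{\sigma_n} \longrightarrow 0
\quad\Longrightarrow\quad
\frac{S_n - \mathbf{E}\, S_n}{\sigma_n} \xrightarrow{\;d\;} \mathcal{N}(0, 1).
```

The degree $M_n$ may grow, provided it remains small compared with the number of variables and with the standard deviation of the sum. Roughly speaking, the modification in [24] replaces exact independence of non-adjacent sets by the weaker property (7) of a characterizing graph.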
In [23], this approach was presented in a more general form; it was subsequently used in numerous studies of the asymptotic distributions of functionals of sequences of independent random variables (e.g., [4, 5, 10, 11]). Finally, in 2015, M. Tikhomirova and V. Chistyakov [24] proposed a modification of the method of [22] and [23] that applies to families of random variables whose dependency graph is complete but in which most dependencies between the variables are weak.
Remark 1.
A year after [24], the paper [27] appeared on arXiv.org; it extends the approach of [22] and [23] to the case in which the dependency graph is complete but most dependencies are weak. In [27], the joint distribution of a family of random variables is described by a weighted dependency graph in which each edge carries a numerical weight that measures, in a certain sense, the strength of dependence between the adjacent variables. The asymptotic normality conditions in [27] resemble those of [23], but the quantities involved are now determined by the weighted dependency graph. Applications of the conditions of [27] to specific problems are presented in [27] and [28].
We now present the main result of M. Tikhomirova and V. Chistyakov [24]. We assume that the joint distribution of the variables is determined by a characterizing graph, that is, an undirected graph whose vertices correspond to the variables and which has the following properties:
- if two random variables are dependent, then the corresponding vertices are connected by an edge (in particular, the graph contains loops at all vertices);
- for any natural number, any subset of vertices of the corresponding size, and any partition of this subset into two parts such that no edge connects a vertex of one part with a vertex of the other, there is a number such that
[TABLE]
For any subset of vertices, we define its set of strong dependencies as the set of all vertices connected by an edge with at least one vertex of this subset. We also use the $\sigma$-algebras generated by the random variables indexed by subsets of vertices, and let
[TABLE]
We put
[TABLE]
We suppose that the joint distribution of the random variables depends on a natural-number parameter; all the characteristics introduced above depend on this parameter as well.
Theorem 6 (Theorem 1 from [24]).
For let the numbers the joint distribution of the random variables and the other parameters marked by index vary so that exists such that for all and any natural number
[TABLE]
Then the moments and the distribution function of the centered and normalized sum of the random variables converge to the moments and the distribution function of the standard normal law.
3 Proofs of Theorems 2 and 4 for $U$-statistics
The number of summands in (3) is a binomial coefficient, namely the number of strictly increasing index tuples. To construct a characterizing graph for the family of summands, we fix a positive integer parameter; the graph depends on this parameter.
Each summand is assigned one-to-one to its own vertex of the graph, so the total number of vertices equals the number of summands in (3).
The set of edges is defined as follows:
- the graph contains a loop at every vertex;
- two distinct vertices are connected by an edge if and only if at least one of the following inequalities holds:
[TABLE]
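The inequalities (10) are not legible in this copy. Assuming, purely for illustration, that they say "some index of one tuple lies within distance $s$ of some index of the other", the graph and the strong-dependency sets can be sketched as follows. The edge rule, the parameter `s`, the 1-based indices, and the function names are our hypotheses, not the paper's definitions.

```python
import math
from itertools import combinations


def connected(i, j, s):
    """Assumed edge rule: some index of i is within distance s of some index of j."""
    return any(abs(a - b) <= s for a in i for b in j)


def neighbourhood(i, n, m, s):
    """Vertices adjacent to the index tuple i; i itself is included (its loop)."""
    return [j for j in combinations(range(1, n + 1), m) if connected(i, j, s)]


def strong_dependencies(vertex_set, n, m, s):
    """Strong-dependency set of a set of vertices: union of their neighbourhoods."""
    result = set()
    for i in vertex_set:
        result.update(neighbourhood(i, n, m, s))
    return result


if __name__ == "__main__":
    n, m, s = 10, 2, 1
    nb = neighbourhood((1, 2), n, m, s)
    # Union bound over the m * m position pairs (k, l): at most 2s + 1 values of
    # j_l, and at most comb(n - 1, m - 1) increasing tuples with j_l fixed.
    print(len(nb), m * m * (2 * s + 1) * math.comb(n - 1, m - 1))
```

Under this assumption, the neighbourhood of $(1, 2)$ for $n = 10$, $s = 1$ consists of the 24 pairs that meet $\{1, 2, 3\}$, well below the crude union bound of 108, which is the kind of estimate that (25) records.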
We show that this graph satisfies property (7).
Lemma 1.
For every value of the series parameter, let the sequence of real random variables be strictly stationary and satisfy the absolute regularity condition (2). Then for any set of vertices and any partition of it into two parts such that no edge has one endpoint in one part and the other endpoint in the other part,
[TABLE]
The inequality (11) shows that the condition (7) is satisfied for the graph for every and
[TABLE]
We need the following statement to derive Lemma 1.
Let , , and , , be the sets of natural numbers, and and be measurable functions of and real variables, respectively.
Lemma 2.
Let the strictly stationary sequence of random variables satisfy the absolute regularity condition (2) with the coefficient , , and
[TABLE]
Then
[TABLE]
Proof of Lemma 2.
The sets and can be one-to-one split into disjoint subsets and , where , , satisfying one of the following two conditions:
[TABLE]
We put
[TABLE]
Then, we can write
[TABLE]
According to the order (15), (16), we denote and by combined notation . In the first case , and in the second case Therefore, we use the notation , . It follows from the definitions that
[TABLE]
Let , , be mutually independent copies of the random variables , , , , , .
We use the notation
[TABLE]
and
[TABLE]
where is a permutation of the numbers such that
[TABLE]
We use a result of K. Yoshihara ([13], Lemma 1). It covers a more general case; in particular, instead of boundedness of the function, it requires only a uniform moment bound with some positive exponent.
In our case, for every . Thus, Yoshihara’s lemma gives that
[TABLE]
Analogously, we derive the inequality
[TABLE]
for the function $h_{\sigma}\bigl(x(H_{1}), X(H_{2}), X(H_{3}), \ldots, X(H_{s_{1}+s_{2}})\bigr)$, where . It follows that
[TABLE]
We carry out similar estimates for other differences. The last among them is
[TABLE]
Summarizing the obtained estimates and using the triangle inequality, we obtain
[TABLE]
[TABLE]
It follows from (18), (19), (20), (21) that
[TABLE]
Analogously to (21), we derive the inequalities
[TABLE]
It follows from these inequalities, the inequalities
[TABLE]
and the triangle inequality that
[TABLE]
The formulas (22), (23) and the triangle inequality give (14). ∎
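In the bounded case, the decoupling estimate of the type used in this proof can be checked exactly on a small Markov chain: for $|f| \le 1$, $|\mathbf{E} f(X_0, X_k) - \mathbf{E} f(X_0', X_k')| \le 2\beta(k)$, where $X_0'$ and $X_k'$ are independent copies of $X_0$ and $X_k$. The constant 2 comes from the elementary total-variation bound and is not claimed to be the constant in (14) or in [13]. The three-state chain and the test functions are arbitrary choices of ours.

```python
from typing import List

Matrix = List[List[float]]


def mat_pow(p: Matrix, k: int) -> Matrix:
    """k-step transition matrix P^k by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = [[sum(result[i][t] * p[t][j] for t in range(size))
                   for j in range(size)] for i in range(size)]
    return result


def beta_markov(p: Matrix, pi: List[float], k: int) -> float:
    """beta(k) = sum_i pi_i * TV(P^k(i, .), pi) for a stationary chain."""
    pk = mat_pow(p, k)
    return sum(pi[i] * 0.5 * sum(abs(pk[i][j] - pi[j]) for j in range(len(pi)))
               for i in range(len(pi)))


def decoupling_gap(f, p: Matrix, pi: List[float], k: int) -> float:
    """|E f(X_0, X_k) - E f(X_0', X_k')| with X_0', X_k' independent copies."""
    pk = mat_pow(p, k)
    states = range(len(pi))
    joint = sum(pi[i] * pk[i][j] * f(i, j) for i in states for j in states)
    independent = sum(pi[i] * pi[j] * f(i, j) for i in states for j in states)
    return abs(joint - independent)


P3 = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
PI3 = [9 / 28, 12 / 28, 7 / 28]  # solves pi P3 = pi

if __name__ == "__main__":
    same = lambda i, j: float(i == j)
    for k in range(1, 5):
        print(k, decoupling_gap(same, P3, PI3, k), 2 * beta_markov(P3, PI3, k))
```

The bound follows by writing the gap as $\sum_i \pi_i \sum_j f(i, j)\,(P^k(i, j) - \pi_j)$ and bounding each inner sum by $\|f\|_\infty$ times twice the total-variation distance.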
Proof of Lemma 1.
Lemma 1 follows immediately from Lemma 2 if we take into account that and . Thus, (14) implies (11). ∎
Proof of Theorem 2.
We estimate the quantities in the condition (9) for our case. We begin with
[TABLE]
where we recall .
The set of strong dependencies of a vertex in the graph consists of the vertices for which at least one of the inequalities (10) holds. The number of elements of this set satisfies the inequality
[TABLE]
The set of strong dependencies for the set is given by the formula
[TABLE]
and satisfies the inequality
[TABLE]
From (24) and (26), we have
[TABLE]
We note that, in the case under consideration,
[TABLE]
Applying (11), (27), (28), equality (12), and Theorem 1 of [24] cited above, we obtain the following condition for the asymptotic normality of the $U$-statistic in the triangular array scheme: for every natural number and all
[TABLE]
∎
Proof of Theorem 4.
Let the number be independent of , , where (), and , where . In this case (5) follows from the formula:
[TABLE]
Let us verify this. We put
[TABLE]
then the expression on the left-hand side of (29) for and can be bounded by . ∎
4 Remarks About Proofs of Theorems 3 and 5 for $V$-Statistics
The proofs for $V$-statistics are the same as those of Theorems 2 and 4, the only difference being that the graph contains more vertices (one for every index tuple, repetitions allowed) and that in formulas (25), (26), (27), and (28) the binomial coefficients must be replaced by the corresponding powers. This replacement does not affect the form of condition (5).
For example, consider formula (25). The number of vertices for which at least one of the inequalities (10) holds can be estimated as follows. First, we choose a pair of positions, one in each index tuple; for each such pair, the number of values of the corresponding index satisfying (10) is bounded, and the remaining indices can be arbitrary. Thus,
[TABLE]
The remaining calculations are carried out similarly.
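The counting argument above can be checked by brute force on small cases. The sketch again assumes, for illustration only, that (10) means "some index of one tuple lies within distance $s$ of some index of the other"; under that assumption, the argument gives at most $m^2 (2s+1) n^{m-1}$ tuples in $\{1, \ldots, n\}^m$ close to a given tuple. The names `close`, `v_neighbourhood_size`, and `union_bound` are hypothetical.

```python
from itertools import product


def close(i, j, s):
    """Assumed form of (10): some index of i is within distance s of some index of j."""
    return any(abs(a - b) <= s for a in i for b in j)


def v_neighbourhood_size(i, n, s):
    """Number of tuples j in {1, ..., n}^m (repetitions allowed) close to i."""
    return sum(1 for j in product(range(1, n + 1), repeat=len(i)) if close(i, j, s))


def union_bound(n, m, s):
    """m * m position pairs (k, l), at most 2s + 1 values of j_l, n^(m-1) for the rest."""
    return m * m * (2 * s + 1) * n ** (m - 1)


if __name__ == "__main__":
    n, m, s = 7, 2, 1
    worst = max(v_neighbourhood_size(i, n, s) for i in product(range(1, n + 1), repeat=m))
    print(worst, union_bound(n, m, s))
```

The bound is crude (it counts some tuples several times), but it has the polynomial form in $n$ that the proof needs.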
References
- [1] W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:3 (1948), 293–325.
- [2] A. M. Zubkov, V. G. Mikhailov. Limit distributions of random variables connected with long duplications in a sequence of independent trials. Teor. Veroyatnost. i Primenen., 19:1 (1974), 173–181; Theory Probab. Appl., 19:1 (1974), 172–179.
- [3] A. M. Zubkov, V. G. Mikhailov. On the repetitions of $s$-tuples in a sequence of independent trials. Teor. Veroyatnost. i Primenen., 24:2 (1979), 267–279; Theory Probab. Appl., 24:2 (1979), 269–282.
- [4] M. I. Tikhomirova. Limit distributions of the number of absent chains of identical outcomes. Diskr. Mat., 20:3 (2008), 40–46; Discrete Math. Appl., 18:3 (2008), 293–300.
- [5] M. I. Tikhomirova. Asymptotic normality of the number of absent noncontinuous chains of outcomes of independent trials. Diskr. Mat., 21:2 (2009), 112–125; Discrete Math. Appl., 19:3 (2009), 293–308.
- [6] V. G. Mikhailov. On the asymptotic properties of the distribution of the number of pairs of $H$-connected chains. Diskr. Mat., 14:3 (2002), 122–129; Discrete Math. Appl., 12:4 (2002), 393–400.
- [7] A. M. Shoitov. Limit distributions of the number of sets of $H$-equivalent segments in an equiprobable polynomial scheme of arrays. Diskr. Mat., 14:1 (2002), 82–98; Discrete Math. Appl., 12:2 (2002), 165–181.
- [8] V. G. Mikhailov, A. M. Shoitov. Structural equivalence of $s$-tuples in random discrete sequences. Diskr. Mat., 15:4 (2003), 7–34; Discrete Math. Appl., 13:6 (2003), 541–568.
