A Bernstein Inequality For Exponentially Growing Graphs
Johannes T. N. Krebs

TL;DR
This paper introduces a Bernstein inequality tailored for sums of random variables on exponentially growing graphs, enabling better concentration bounds in highly-connected network structures.
Contribution
It provides a novel Bernstein inequality applicable to graphs with exponential node growth, aiding in statistical analysis of complex networks.
Findings
Derived a Bernstein inequality for exponential graphs
Facilitates concentration inequalities in highly-connected networks
Supports consistency analysis of nonparametric estimators
Abstract
In this article we present a Bernstein inequality for sums of random variables which are defined on a graphical network whose nodes grow at an exponential rate. The inequality can be used to derive concentration inequalities in highly-connected networks. It can be useful to obtain consistency properties for nonparametric estimators of conditional expectation functions which are derived from such networks.
Acknowledgment: This research was supported by the Fraunhofer ITWM, 67663 Kaiserslautern, Germany, which is part of the Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. The author thanks Hannes Christiansen for proofreading parts of the article.
Author affiliation: Department of Mathematics, University of Kaiserslautern, 67653 Kaiserslautern, Germany, email: [email protected]
Keywords: Asymptotic inference; asymptotic inequalities; Bernstein inequality; Concentration inequality; Graphs; Highly-connected graphical networks; Mixing; Nonparametric statistics; Random fields; Stochastic processes
MSC 2010: Primary: 62G20, 62M40, 90B15; Secondary: 62G07, 62G08, 91D30
Inequalities of the Bernstein type are an important tool for asymptotic analysis in probability theory and statistics. The original inequality derived by Bernstein (1927) gives bounds on $\mathbb{P}(|S_n| > \varepsilon)$, where $S_n = \sum_{k=1}^n X_k$ for bounded random variables $X_1, \ldots, X_n$ which are i.i.d. and have expectation zero. There are various versions of Bernstein’s inequality, e.g., Hoeffding (1963). In particular, generalizations to different kinds of stochastic processes have gained importance: Carbon (1983), Collomb (1984), Bryc and Dembo (1996) and Merlevède et al. (2009) provide extensions to time series which are weakly dependent. Valenzuela-Domínguez et al. (2017) give a further generalization to strong mixing random fields which are defined on the regular lattice $\mathbb{Z}^N$ for some lattice dimension $N \in \mathbb{N}_+$. The corresponding definitions of dependence are given in Doukhan (1994) and in Bradley (2005).
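To make the classical i.i.d. case concrete, the following minimal sketch compares a simulated tail probability with the textbook two-sided Bernstein bound $2\exp\bigl(-\varepsilon^2 / (2(n\sigma^2 + M\varepsilon/3))\bigr)$ for independent, centered variables bounded by $M$ with variance $\sigma^2$. The uniform distribution on $[-1, 1]$, the sample sizes and the function names are assumptions chosen for illustration; none of them come from the article.

```python
import math
import random

def bernstein_bound(n, eps, sigma2, M):
    """Classical two-sided Bernstein bound for P(|S_n| >= eps).

    Assumes n independent, centered summands with variance sigma2
    that are bounded in absolute value by M.
    """
    return 2.0 * math.exp(-eps**2 / (2.0 * (n * sigma2 + M * eps / 3.0)))

def empirical_tail(n, eps, trials, seed=0):
    """Monte Carlo estimate of P(|S_n| >= eps) for Uniform(-1, 1) summands."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(s) >= eps:
            hits += 1
    return hits / trials

# Illustrative parameters (assumptions): Uniform(-1, 1) has variance 1/3 and bound M = 1.
n, eps = 200, 25.0
bound = bernstein_bound(n, eps, sigma2=1.0 / 3.0, M=1.0)
freq = empirical_tail(n, eps, trials=2000)
print(f"empirical tail: {freq:.4f}, Bernstein bound: {bound:.4f}")
```

The bound is not tight for this choice of parameters, which is typical: Bernstein-type inequalities trade sharpness for an explicit exponential rate.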
Bernstein inequalities in particular find their applications when deriving large deviation results in nonparametric regression and density estimation, compare Györfi et al. (1989) and Györfi et al. (2002).
In this article we derive a new Bernstein inequality which adapts to highly-connected networks where the number of nodes grows at an exponential rate. A well-known example for such a graph is the internet map which tries to represent the internet with visual graphics. Another application may be nested simulations which are used in insurance mathematics to simulate the outcome of an insurance contract. Based on this new Bernstein inequality, we derive a concentration inequality which ensures that in simulations the nonparametric regression or density function estimator is consistent. It turns out that we need a somewhat stricter decay in the $\alpha$-mixing coefficients than is usually assumed for time series. Due to the special geometric structure of the underlying data, many technical aspects in the proofs of these new inequalities are much more involved than is the case for time series data or for data which is defined on a lattice.
This paper is organized as follows: we give the motivation and the definitions in Section 1. Section 2, the main part of this article, contains the new Bernstein inequality and concentration inequalities for exponentially growing graphs. Appendix A contains a useful result of Davydov (1968).
1 Introduction
In this section we consider a general graph $G = (V, E)$ with a countable set of nodes $V$ and a set of edges $E$. We define the natural metric $d_G$ on $G$ as the minimal number of edges between two nodes
[TABLE]
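The natural metric, i.e., the minimal number of edges between two nodes, can be computed by breadth-first search. The following is a minimal sketch under the assumption that the graph is undirected and given as an adjacency dictionary; the function name `graph_distance` and the example graph are hypothetical and only serve as an illustration.

```python
from collections import deque

def graph_distance(adj, v, w):
    """Return the minimal number of edges on a path from v to w.

    adj maps each node to an iterable of its neighbors (undirected graph).
    Returns float('inf') if w cannot be reached from v.
    """
    if v == w:
        return 0
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                if x == w:
                    return dist[x]
                queue.append(x)
    return float("inf")

# Hypothetical example: a path 0-1-2-3 plus an isolated node 4.
example_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(graph_distance(example_adj, 0, 3))
print(graph_distance(example_adj, 0, 4))
```

Breadth-first search visits nodes in order of increasing distance, so the first time the target is reached the recorded value is the minimum.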
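The metric is extended to sets $I, J \subseteq V$ in the usual way: $d_G(I, J) = \inf\{ d_G(v, w) : v \in I, w \in J \}$. We denote by $\mathrm{Ne}(v)$ the set of neighbors of $v$ w.r.t. $G$ for a node $v$ of a graph $G = (V, E)$. Furthermore, we assume that there is a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ which is endowed with a real-valued random field $Z$. The latter is indexed by the set of nodes $V$, i.e., $Z = \{ Z(v) : v \in V \}$ is a family of random variables such that $Z(v)$ is $\mathcal{A}$-measurable for each $v \in V$. We denote the indicator function by $\mathbb{1}$ and we define the $\alpha$-mixing coefficient of the random field $Z$ on the graph $G$ by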
[TABLE]
The random field $Z$ is strong mixing w.r.t. $d_G$ if and only if $\alpha(k) \to 0$ for $k \to \infty$. In the sequel, we investigate random fields which are defined on the following class of graphs:
Definition 1.1 (Trees growing at an exponential rate).
Let . A tree is growing at an exponential rate if is a rooted tree and each node has exactly children. The nodes in the tree are labeled according to the following scheme: the distinguished root (which has no parent) is labeled by and the children of the node are labeled by . Hence, the set of nodes and the set of edges are given by
[TABLE]
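The construction in Definition 1.1 can be sketched in code. The labels used below are an assumption made for illustration: the root is the tuple `(0,)` and the children of a node are obtained by appending an index from 1 to `A`, where `A` stands for the number of children per node. The function name `build_tree` and the truncation after `n` generations are likewise hypothetical.

```python
from collections import Counter

def build_tree(A, n):
    """Nodes and edges of a rooted tree in which every node has exactly A
    children, truncated after generation n (the root is generation 0).

    Labeling (an assumption for this sketch): the root is (0,), and the
    children of a node `parent` are parent + (1,), ..., parent + (A,).
    """
    root = (0,)
    nodes = [root]
    edges = []
    frontier = [root]
    for _ in range(n):
        next_frontier = []
        for parent in frontier:
            for i in range(1, A + 1):
                child = parent + (i,)
                nodes.append(child)
                edges.append((parent, child))
                next_frontier.append(child)
        frontier = next_frontier
    return nodes, edges

nodes, edges = build_tree(A=3, n=4)
# The generation of a node is the length of its label minus one.
generation_sizes = Counter(len(label) - 1 for label in nodes)
print(sorted(generation_sizes.items()))
```

Generation `k` contains `A**k` nodes, which is the exponential growth that the definition refers to.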
A rooted graph is growing at an exponential rate if the edges can be decomposed into two disjoint sets as such that is a tree growing at an exponential rate and the set of additional edges has the property that it does not connect nodes of arbitrary length in , i.e.,
[TABLE]
We come to the definition of a mixing embedding of a graph. It is worth mentioning that, especially in graph theory, there are different definitions of graph embeddings: the common definition of an embedding of a graph requires, loosely speaking, that the edges of the embedded graph may only intersect at their endpoints, i.e., at the nodes. It is well known that any graph with countably many nodes can be embedded into $\mathbb{R}^3$ via placing the $i$-th node at the point $(i, i^2, i^3)$, compare Cohen et al. (1994). Furthermore, one can characterize the finite graphs which are embeddable into the plane (the planar graphs) with the help of the theorems of Kuratowski (1930) and of Wagner (1937). Here, we slightly change this graph-theoretic definition such that it is tailored to our needs: we can omit the restriction that edges may not intersect at an interior point. However, since we shall usually be dealing with infinite graphs, we have to add a requirement that is essential when it comes to mixing random fields which are defined on the graph which is to be embedded. We need this definition to show what is intuitively clear: the Bernstein inequalities for regular lattices are not applicable in the context of graphs which grow at an exponential rate. We give the definition:
Definition 1.2 (Mixing embedding of a graph).
Let be a graph with countably many nodes and denote by the Euclidean -norm on the -dimensional lattice , for and . There is a mixing embedding of in if there is a dimension such that is isomorphic to a graph with and for each sequence with image it is true that
[TABLE]
In the following, when speaking of the lattice as a graph, we shall always understand the graph with nodes and edges where is the -th standard basis vector which is one in the -th coordinate and zero otherwise. Note that in this case, we have and . We have a practical lemma which gives equivalent formulations of this definition
Lemma 1.3.
Let be a graph. Then the following are equivalent:
1. There is a mixing embedding of in .
2. is isomorphic to a graph with nodes and there is a constant such that for any with image it is true that .
3. is isomorphic to a graph with nodes and
[TABLE]
In particular, let be a random field on , denote by the same random field under the graph isomorphism. Then the mixing coefficients satisfy asymptotically which means that strong mixing is inherited when switching between and .
Proof.
(1) ⇒ (2) and (3): assume that there is a mixing embedding of in . Then obviously is countable; thus, the number
[TABLE]
is meaningful and finite by assumption. Consequently, we have for two connected nodes and in that . If and are not connected then . Hence, is the proper constant. The converse implications (2) ⇒ (1) (resp. (3) ⇒ (1)) are immediate.
We come to the amendment of the lemma. Let be given and consider a random field on and its graph-isomorphic counterpart on . We infer for two sets with preimage and that , i.e., we have using the graph isomorphism for
[TABLE]
Thus, for . This means that asymptotically or rather . ∎
The following class of graphs does not allow for a mixing embedding in
Proposition 1.4.
Let be a graph with root . Put and recursively
[TABLE]
for . If the map grows faster than any polynomial function of degree defined on , there is no mixing embedding of in .
Proof.
Let the map grow faster than any polynomial of degree and assume that there is a mixing embedding of in for some which satisfies as stated in Lemma 1.3. First, observe that for and both in the distance in the graph is at most . By assumption there is a such that for all we have . Thus, for there are with the property that which implies for these two nodes that
[TABLE]
Hence, . In the same way, there is a such that for all , we have . In particular, there are , with the property that which implies for
[TABLE]
which in turn implies . This contradicts the assumption that there is a mixing embedding of in . ∎
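The obstruction behind Proposition 1.4 can be illustrated by a simple counting heuristic, which is not the argument of the proof itself: balls in a lattice contain only polynomially many points, whereas the generations of an exponentially growing tree eventually outnumber them. The sketch below counts lattice points in the 1-norm ball and finds the first generation whose size exceeds it. The names `lattice_ball_size` and `first_generation_exceeding_ball`, the parameters `A`, `N`, `C`, and the use of the 1-norm are assumptions made for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def lattice_ball_size(N, k):
    """Number of points z in Z^N with |z_1| + ... + |z_N| <= k."""
    if k < 0:
        return 0
    if N == 0:
        return 1
    # Fix the last coordinate j and count the remaining N - 1 coordinates.
    return sum(lattice_ball_size(N - 1, k - abs(j)) for j in range(-k, k + 1))

def first_generation_exceeding_ball(A, N, C=1, k_max=200):
    """Smallest k in 1..k_max with A**k > lattice_ball_size(N, C * k), else None.

    A**k is the size of generation k of a tree in which every node has A
    children; C * k is a hypothetical radius within which those nodes would
    have to be placed if neighbors were at lattice distance at most C.
    """
    for k in range(1, k_max + 1):
        if A**k > lattice_ball_size(N, C * k):
            return k
    return None

for N in (1, 2, 3):
    print(N, first_generation_exceeding_ball(A=2, N=N))
```

For every fixed `N` and `C` such a generation exists, since `lattice_ball_size(N, C * k)` is a polynomial of degree `N` in `k` while `A**k` grows exponentially.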
This implies that we cannot use the above-mentioned Bernstein inequalities for data which is defined on a lattice to derive concentration inequalities for random fields that are defined on graphs which grow at an exponential rate. Instead, in the next section we give a new Bernstein inequality which can deal with this class of random fields.
2 A Bernstein inequality for exponentially growing graphs
In this section we derive inequalities of the Bernstein type for random fields which are highly-connected and whose index set grows at an exponential rate. We need the following important lemma:
Lemma 2.1.
Let be a tree growing at an exponential rate . Denote by
[TABLE]
the set of nodes of the subtree of which has its root at the node and consists of generations. Consider the graph which is induced by the set of nodes . Then the number of pairs in this graph which are separated by exactly edges for is given by
[TABLE]
for a suitable constant which does not depend on , and .
Proof of Lemma 2.1.
The minimal distance in this subtree clearly is 1, whereas the maximal distance is . Let now a length be fixed, . We distinguish two cases for a pair which is separated by edges: in the first case (resp. ) is a descendant of (resp. ). In the second case and have a common parent which we call and, plainly, .
The first case is only possible for , for such an there are exactly such pairs in this subtree. The second case is possible for . Depending on the parent is located between generation zero and generation , denote its generation by . Having fixed a parent in generation the distance from to the first node is at least and at most , denote this distance by . Hence, there are exactly nodes in question for . In this case that the node is separated generations from . Since and their graph distance is , this yields possibilities for . All in all, we give the number of pairs with the formula from equation (2.2). ∎
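The counting in Lemma 2.1 can be checked numerically on small trees. The following brute-force sketch enumerates all unordered pairs of nodes in a complete tree and tabulates their graph distances; it does not reproduce the closed-form expression of the lemma. It assumes a tree with generations 0 to `n` in which every node has `A` children, labeled by tuples starting with the root `(0,)`; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def tree_nodes(A, n):
    """Tuple labels of all nodes in generations 0..n; the root is (0,)."""
    nodes = [(0,)]
    frontier = [(0,)]
    for _ in range(n):
        frontier = [p + (i,) for p in frontier for i in range(1, A + 1)]
        nodes.extend(frontier)
    return nodes

def tree_distance(u, v):
    """Graph distance in the tree: go up to the deepest common ancestor,
    which corresponds to the longest common prefix of the two labels."""
    common = 0
    for a, b in zip(u, v):
        if a != b:
            break
        common += 1
    return len(u) + len(v) - 2 * common

def pair_counts(A, n):
    """Map each distance L to the number of unordered node pairs at distance L."""
    return Counter(tree_distance(u, v) for u, v in combinations(tree_nodes(A, n), 2))

counts = pair_counts(A=2, n=2)
print(sorted(counts.items()))
```

In this setting the smallest distance is 1 and the largest is `2 * n`, attained by two leaves whose only common ancestor is the root. The brute-force enumeration is quadratic in the number of nodes and therefore only suitable for small `A` and `n`.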
We now state the Bernstein inequality. Here we do not consider the full set of nodes; instead, we focus on a strip of which is defined with the help of the from Lemma 2.1.
Theorem 2.2 (Bernstein inequality).
Let be a tree growing at an exponential rate . Let be a real-valued random variable for each with , and , for some . Let , and consider the subtree induced by the set of nodes
[TABLE]
with as in the definition given in (2.1). Then
[TABLE]
where such that and as well as
[TABLE]
Proof of Theorem 2.2.
We have to partition suitably. We use the abbreviations and as well as,
[TABLE]
for . Note that the and are the union of the disjoint sets and that some and might be empty. Furthermore, we define
[TABLE]
Then, we have with Markov’s inequality and the well-known AM-GM inequality that
[TABLE]
Hence, it suffices to examine the sum more closely. We write
[TABLE]
We compute the expectations of the random variables , for sufficiently small. Note that the distance w.r.t. between and , , is at least . Since , we infer from Davydov’s inequality given in Proposition A.1 that
[TABLE]
for Hölder conjugate and . Furthermore, we have if that
[TABLE]
Now the random variables are essentially bounded by . Let and define . Then, we have
[TABLE]
Note that in the subgraph induced by the there are exactly pairs of nodes with , where is given in Lemma 2.1. For the next two lines we use the inequality for real numbers , , . Consequently, we get
[TABLE]
with Davydov’s inequality from Proposition A.1.
Furthermore, we find with the Hölder inequality that . Thus, equation (2.5) can be bounded by
[TABLE]
Especially, for the case successive iteration of (2.6) yields for the choice and (as in Valenzuela-Domínguez et al. (2017))
[TABLE]
Next, since and , we arrive at
[TABLE]
The computations for are similar and one achieves the same bounds for this term. This finishes the proof. ∎
We are now in a position to derive a concentration inequality. We consider an infinite tree which grows at an exponential rate and which is endowed with a random field . We assume that the random field on the tree is strong mixing such that
[TABLE]
where is defined in Lemma 2.1. We say that the mixing coefficients decay at a super-exponential (or hyper-exponential) rate if there is a positive increasing function with such that
[TABLE]
In this case, equation (2.7) follows from Lemma 2.1 with the bound and the following concentration inequality is true
Theorem 2.3 (Concentration inequality for exponentially growing trees).
Let be a tree growing at an exponential rate and let be a random field on as in Theorem 2.2. Let the random field be strong mixing w.r.t. the graph metric with -mixing coefficients which fulfill (2.7), e.g., the mixing coefficients decay at a super-exponential rate as in (2.8). Consider the subgraph which consists of the first generations of for
[TABLE]
Then there are constants such that for all and
[TABLE]
This means the probability decays asymptotically at a rate which is approximately linear in the size of the sample .
Proof of Theorem 2.3.
Let for some . We partition in the following way: first we define the wedge which consists of the first generations
[TABLE]
The remaining generations are collected in
[TABLE]
The sums which correspond to this partitioning are and . Then we split the probability as follows,
[TABLE]
The first probability in (2.9) is negligible because we find
[TABLE]
Thus, we can focus on the second probability in (2.9). We use Theorem 2.2. We make the following definitions
[TABLE]
Consider the exponent of the first factor given in (2.4): one finds that there is a constant which depends neither on nor on nor on the such that
[TABLE]
The second factor in (2.4) is given by
[TABLE]
We can derive the following bound for the mixing coefficient and the exponent inside the -function in (2.11)
[TABLE]
Consider the second factor inside the -function in (2.11), it is . In particular, the second factor in (2.11) is uniformly bounded for all if is sufficiently large. Consider the third factor in (2.4). Since the mixing coefficients decay sufficiently fast, we can derive the following inequality
[TABLE]
for a suitable constant . In particular, this expression is uniformly bounded over all . All in all, we have shown that there are constants such that for the second probability in (2.9) is bounded as
[TABLE]
where the asymptotic speed is determined by (2.10). This completes the proof. ∎
The previous theorem can be applied to exponentially growing graphs as well; we have the following useful corollary:
Corollary 2.4 (Concentration inequality for exponentially growing graphs).
Let be a graph growing at an exponential rate endowed with a random field as in Theorem 2.3. Then there are constants such that for all and
[TABLE]
Proof of Corollary 2.4.
We only need to show that the mixing conditions for the tree are fulfilled. The condition that implies that
[TABLE]
In particular, the mixing rates w.r.t. the tree and the whole graph structure satisfy asymptotically the inequality relations . Thus, we can conclude the statement from Theorem 2.3. ∎
Appendix A
Proposition A.1 (Davydov (1968)).
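Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and let $\mathcal{G}, \mathcal{H} \subseteq \mathcal{A}$ be sub-$\sigma$-algebras. Denote by $\alpha$ the $\alpha$-mixing coefficient between $\mathcal{G}$ and $\mathcal{H}$. Let $p, q, r \in [1, \infty]$ be Hölder conjugate, i.e., $1/p + 1/q + 1/r = 1$. Let $\xi$ (resp. $\eta$) be in $L^p(\mathbb{P})$ and $\mathcal{G}$-measurable (resp. in $L^q(\mathbb{P})$ and $\mathcal{H}$-measurable). Then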
[TABLE]
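For orientation, Davydov's covariance inequality is commonly quoted in the form below. This is a sketch based on the general literature under stated assumptions: the symbols $\xi$, $\eta$, $\mathcal{G}$, $\mathcal{H}$ are chosen here for illustration, $\alpha(\mathcal{G}, \mathcal{H}) = \sup\{ |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| : A \in \mathcal{G}, B \in \mathcal{H} \}$, and the absolute constant $C$ is left unspecified because its stated value differs between references.

```latex
\[
  \bigl| \mathbb{E}[\xi \eta] - \mathbb{E}[\xi]\,\mathbb{E}[\eta] \bigr|
  \;\le\; C \, \alpha(\mathcal{G}, \mathcal{H})^{1/r} \,
  \|\xi\|_{L^p(\mathbb{P})} \, \|\eta\|_{L^q(\mathbb{P})},
  \qquad \frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1 .
\]
```

In the proof of Theorem 2.2 an inequality of this type is what turns a lower bound on the graph distance between two blocks of nodes into an upper bound on the covariance of the corresponding random variables.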
References
- Bernstein (1927): S. Bernstein. Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Mathematische Annalen, 97(1):1–59, 1927.
- Bradley (2005): R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys, 2(2):107–144, 2005.
- Bryc and Dembo (1996): W. Bryc and A. Dembo. Large deviations and strong mixing. In Annales de l’IHP Probabilités et statistiques, volume 32, pages 549–569, 1996.
- Carbon (1983): M. Carbon. Inégalité de Bernstein pour les processus fortement mélangeants non nécessairement stationnaires. C.R. Acad. Sc. Paris I, 297:303–306, 1983.
- Cohen et al. (1994): R. F. Cohen, P. Eades, T. Lin, and F. Ruskey. Three-dimensional graph drawing. In International Symposium on Graph Drawing, pages 1–11. Springer, 1994.
- Collomb (1984): G. Collomb. Propriétés de convergence presque complète du prédicteur à noyau. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 66(3):441–460, 1984.
- Davydov (1968): Y. A. Davydov. Convergence of distributions generated by stationary stochastic processes. Theory of Probability & Its Applications, 13(4):691–696, 1968.
- Doukhan (1994): P. Doukhan. Mixing, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994.
