Testing Graphs against an Unknown Distribution
Lior Gishboliner, Asaf Shapira

TL;DR
This paper characterizes which graph properties remain testable when the vertex distribution is unknown and arbitrary, extending classical graph property testing to a more general and realistic setting.
Contribution
The paper provides a complete characterization of testable graph properties under unknown vertex distributions, including a new removal lemma for vertex-weighted graphs.
Findings
Characterization of testable properties under unknown distributions
A new removal lemma for vertex-weighted graphs
Extension of classical graph testing models
Testing Graphs against an Unknown Distribution [Footnote 1: A preliminary version of this paper has appeared in the Proceedings of STOC ’19.]
Lior Gishboliner School of Mathematics, Tel Aviv University, Tel Aviv 69978, Israel. Email: [email protected]. Supported in part by ERC Starting Grant 633509.
Asaf Shapira
School of Mathematics, Tel Aviv University, Tel Aviv 69978, Israel. Email: asafico@tau.ac.il. Supported in part by ISF Grant 1028/16 and ERC Starting Grant 633509.
Abstract
The area of graph property testing seeks to understand the relation between the global properties of a graph and its local statistics. In the classical model, the local statistics of a graph is defined relative to a uniform distribution over the graph’s vertex set. A graph property is said to be testable if the local statistics of a graph can allow one to distinguish between graphs satisfying and those that are far from satisfying it.
Goldreich recently introduced a generalization of this model in which one endows the vertex set of the input graph with an arbitrary and unknown distribution, and asked which of the properties that can be tested in the classical model can also be tested in this more general setting. We completely resolve this problem by giving a (surprisingly “clean”) characterization of these properties. To this end, we prove a removal lemma for vertex weighted graphs which is of independent interest.
1 Introduction
1.1 Background and the main result
Property testers are fast randomized algorithms whose goal is to distinguish (with high probability, say, $2/3$) between objects satisfying some fixed property $\mathcal{P}$ and those that are $\varepsilon$-far from satisfying it. Here, $\varepsilon$-far means that an $\varepsilon$-fraction of the input object should be modified in order to obtain an object satisfying $\mathcal{P}$. The study of such problems originated in the seminal papers of Rubinfeld and Sudan [28], Blum, Luby and Rubinfeld [9], and Goldreich, Goldwasser and Ron [20]. Problems of this nature have been studied in so many areas that it will be impossible to survey them here. Instead, the reader is referred to the recent monograph [18] for more background and references. While this area studies questions in theoretical computer science, it has several strong connections with central problems in extremal combinatorics, most notably to the regularity method and the removal lemma, see Subsection 1.2.
The classical property testing model assumes that one can uniformly sample entries of the input. In distribution-free testing, one assumes that the input is endowed with some arbitrary and unknown distribution $\mathcal{D}$, which also affects the way one defines the distance to satisfying a property. As discussed in [19], one motivation for this model is that it can handle settings in which one cannot produce uniformly distributed entries from the input. Another motivation is that the distribution can assign higher weight/importance to parts of the input which we want to have higher impact on the distance to satisfying the given property. Until very recently, problems of this type were studied almost exclusively in the setting of testing properties of functions, see [10, 11, 15, 17, 24]. Let us mention that distribution-free testing is similar in spirit to the celebrated PAC learning model of Valiant [31], see also the discussion in [27].
Our investigation here concerns a distribution-free variant of the adjacency matrix model, also known as the dense graph model. The adjacency matrix model was first defined and studied in [20], where the area of property testing was first introduced. This model has been extensively studied in the past two decades, see Chapter of [18]. For a selected (but certainly not comprehensive) list of works on the dense graph model of property testing, see [2, 21, 23].
Instead of defining the adjacency matrix model of [20], let us directly define its distribution-free variant which was introduced recently by Goldreich [19]. Since the distribution in this model is over the input’s vertices, it is called the Vertex-Distribution-Free (VDF) model [Footnote 2: Goldreich suggested to study variants of this model in other settings (such as bounded degree graphs [22]) as well. For brevity, we will use the term “VDF model” to refer to the “VDF variant of the adjacency matrix model”.]. The input to the algorithm is a graph $G$ and some arbitrary and unknown distribution $\mathcal{D}$ on $V(G)$. We will thus usually refer to the input as the pair $(G,\mathcal{D})$. For a pair of graphs $G_1, G_2$ on the same vertex-set $V$, and for a distribution $\mathcal{D}$ on $V$, the (edit) distance between $G_1$ and $G_2$ with respect to $\mathcal{D}$ is defined as $\sum_{\{x,y\} \in E(G_1) \triangle E(G_2)} \mathcal{D}(x)\mathcal{D}(y)$. We say that $(G,\mathcal{D})$ is $\varepsilon$-far from satisfying a graph property [Footnote 3: A graph property is simply a family of graphs closed under isomorphism.] $\mathcal{P}$ if for every $G' \in \mathcal{P}$ on $V(G)$, the distance between $G$ and $G'$ with respect to $\mathcal{D}$ is at least $\varepsilon$. A tester for a graph property $\mathcal{P}$ is an algorithm that receives as input a pair $(G,\mathcal{D})$ and a proximity parameter $\varepsilon$, and distinguishes with high probability (say $2/3$) between the case that $G$ satisfies $\mathcal{P}$ and the case that $(G,\mathcal{D})$ is $\varepsilon$-far from $\mathcal{P}$. The algorithm has access to a device that produces random vertices from $G$ distributed according to $\mathcal{D}$. The only [Footnote 4: Note that the algorithm does not receive as part of the input.] other way the algorithm can access $G$ is by performing “edge queries” of the form “is $(x,y)$ an edge of $G$?”. We say that property $\mathcal{P}$ is testable in the VDF model if there is a function $q(\varepsilon)$ and a tester for $\mathcal{P}$ that always performs a total number of at most $q(\varepsilon)$ vertex samples and edge queries to the input. We stress again that $\mathcal{D}$ is unknown to the tester, so (in particular) $q(\varepsilon)$ should be independent of $\mathcal{D}$. The function $q$ is sometimes referred to as the sample (or query) complexity of the tester. A tester has 1-sided error if it always accepts an input satisfying $\mathcal{P}$. Otherwise it has 2-sided error.
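To make the distance notion concrete, here is a minimal sketch of the weighted edit distance between two graphs on the same vertex set. It assumes that the distance is the total weight D(x)·D(y) of the vertex pairs on which the two graphs differ, which matches the description of distance as the weight of the modified pairs; the function names and the toy graphs are illustrative and not taken from the paper.

```python
from fractions import Fraction

def edge_set(edges):
    """Normalize an iterable of vertex pairs into a set of unordered pairs."""
    return {frozenset(e) for e in edges}

def weighted_distance(edges1, edges2, dist):
    """Total weight D(x)*D(y) of the pairs on which the two simple graphs differ
    (assumed form of the distance; dist maps each vertex to its probability)."""
    total = Fraction(0)
    for pair in edge_set(edges1) ^ edge_set(edges2):
        x, y = tuple(pair)
        total += dist[x] * dist[y]
    return total

# Toy example: a triangle versus a 2-edge path on the vertex set {0, 1, 2}.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
uniform = {v: Fraction(1, 3) for v in range(3)}
skewed = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

d_uniform = weighted_distance(triangle, path, uniform)  # only the pair {0, 2} differs
d_skewed = weighted_distance(triangle, path, skewed)
```

Under the uniform distribution the value is the number of differing pairs divided by the square of the number of vertices, while a skewed distribution makes the same single edit more or less expensive depending on which vertices it touches.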
Suppose we assume that in the VDF model, the distribution $\mathcal{D}$ is restricted to be the uniform distribution; in particular, the distance between $n$-vertex graphs $G_1, G_2$ (on the same vertex-set) is $|E(G_1) \triangle E(G_2)|/n^2$, and $G$ is $\varepsilon$-far from $\mathcal{P}$ if one needs to change at least $\varepsilon n^2$ edges to turn $G$ into a graph satisfying $\mathcal{P}$. In this paper we will refer to this model as the standard model. This model is “basically” equivalent to the adjacency matrix model, which was introduced in [20]. We refer the reader to [19] for a discussion on the subtle differences between the adjacency matrix model and the above defined standard model [Footnote 5: Just as an example, in [20] the tester “knows” while in the VDF model (and thus also in the standard model) it does not.].
A very elegant result proved in [19] states that if $\mathcal{P}$ is testable in the VDF model then it is testable in the standard model with one-sided error. A natural follow-up question, raised by Goldreich in [19], asks whether every property which is testable with one-sided error in the standard model is also testable in the VDF model. A characterization of the properties testable with one-sided error in the standard model was given in [5], where it was shown that these are precisely the semi-hereditary properties (see [5] for the definition of this term). We show (see Proposition 4.2) that if $\mathcal{P}$ is testable in the VDF model then $\mathcal{P}$ is hereditary [Footnote 6: A graph property is hereditary if it is closed under removal of vertices.]. Since there are properties which are semi-hereditary but not hereditary, this implies a negative answer to Goldreich’s question. Thus, it is natural to ask the following revised version of Goldreich’s question:
Problem 1.1.
Are all hereditary graph properties testable in the VDF model?
It might be natural to guess [Footnote 7: This was at least our initial guess.] that every hereditary property is testable in the VDF model, the justification being that all lemmas that were used in [5] should also hold for weighted graphs. As it turns out, this is indeed the case. However, putting all these lemmas together does not seem to work in the VDF model. As our main result, Theorem 1 below, shows, it is no coincidence that the proof technique of [5] does not carry over as is to the weighted setting.
We start with an important definition. Let us say that a graph property $\mathcal{P}$ is extendable if for every graph $G$ satisfying $\mathcal{P}$ there is a graph $G'$ on $|V(G)|+1$ vertices which satisfies $\mathcal{P}$ and contains $G$ as an induced subgraph. In other words, $\mathcal{P}$ is extendable if whenever $G$ is a graph satisfying $\mathcal{P}$ and $v$ is a “new” vertex (i.e. $v \notin V(G)$), one can connect $v$ to $G$ in such a way that this larger graph will also satisfy $\mathcal{P}$. Note that if $\mathcal{P}$ is extendable then in fact for every graph $G \in \mathcal{P}$ and for every $n > |V(G)|$, there is an $n$-vertex graph satisfying $\mathcal{P}$ which contains $G$ as an induced subgraph. Our main result in this paper is the following:
Theorem 1.
A graph property is testable in the VDF model if and only if it is hereditary and extendable.
It is interesting to compare the above (rather) simple characterization of the properties that are testable in the VDF model, with the (very) complicated characterization of [2] of the properties that are testable in the standard model.
Let us mention some immediate consequences of Theorem 1. Since a graph cannot contain both an isolated vertex and a vertex connected to all other vertices, we infer that for every fixed $H$ the (hereditary) property of being induced $H$-free is extendable. We thus infer that:
Corollary 2.
The property of being induced $H$-free is testable in the VDF model for every fixed $H$.
It is also clear that the property of being $H$-free is extendable if and only if $H$ has no isolated vertices. We thus infer that:
Corollary 3.
The property of being $H$-free is testable in the VDF model if and only if $H$ has no isolated vertices.
It is easy to see that most (natural) hereditary graph properties are extendable, so Theorem 1 immediately implies that they are all testable in the VDF model. These include the properties of being Perfect, Interval, Chordal and $k$-Colorable. In the other direction, Theorem 1 implies that if $H$ has an isolated vertex then $H$-freeness is not testable in the VDF model. If one is interested in a more “natural” non-extendable hereditary property, then it is not hard to see that another such example is the property of being induced -free, where (resp. ) is the graph obtained from the -edge path by adding a new vertex which is adjacent to all vertices of (resp. not adjacent to any vertex of ). It is easy to see that satisfies but is not extendable. It was proved in [19] that the properties of being Hamiltonian, Eulerian and Connected are not testable in the VDF model. Those three results follow immediately from our Theorem 1 since these properties are not hereditary.
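The role of isolated vertices in extendability can be checked by brute force on small graphs. The sketch below is illustrative only: it tests whether one specific graph can be extended by a single new vertex while staying inside a property, whereas extendability requires this for every graph in the property. The helper names and the choice of forbidden graphs (a triangle, and an edge together with an isolated vertex) are assumptions made for the example.

```python
from itertools import combinations, permutations

def contains_subgraph(n, edges, h_n, h_edges):
    """True if the graph on range(n) contains H (on range(h_n)) as a
    not necessarily induced subgraph; brute force over injective maps."""
    eset = {frozenset(e) for e in edges}
    for image in permutations(range(n), h_n):
        if all(frozenset((image[a], image[b])) in eset for a, b in h_edges):
            return True
    return False

def one_vertex_extensions(n, edges):
    """Yield every edge list obtained by adding vertex n and joining it
    to some subset of the old vertices range(n)."""
    for r in range(n + 1):
        for nbrs in combinations(range(n), r):
            yield list(edges) + [(v, n) for v in nbrs]

def can_extend(n, edges, satisfies):
    """True if some one-vertex extension of the given graph satisfies the property."""
    return any(satisfies(n + 1, ext) for ext in one_vertex_extensions(n, edges))

def triangle_free(n, edges):
    return not contains_subgraph(n, edges, 3, [(0, 1), (1, 2), (0, 2)])

def edge_plus_isolated_free(n, edges):
    # H has vertices {0, 1, 2} and the single edge {0, 1}; vertex 2 is isolated.
    return not contains_subgraph(n, edges, 3, [(0, 1)])

single_edge = [(0, 1)]  # a graph on 2 vertices
triangle_case = can_extend(2, single_edge, triangle_free)
isolated_case = can_extend(2, single_edge, edge_plus_isolated_free)
```

A single edge satisfies both properties, but every 3-vertex graph containing it also contains an edge plus a third vertex, so it cannot be extended within the second property; adding an isolated vertex keeps it triangle-free.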
1.2 The combinatorial interpretation of Theorem 1
Let us discuss the combinatorial implications of Theorem 1 and its relation to other results in the area of extremal combinatorics. The famous triangle removal lemma of Ruzsa and Szemerédi [29] states that if a graph is -far from being triangle free (with respect to the uniform distribution), then a (uniform) sample of vertices from contains a triangle with probability at least . We refer the reader to [13] for more background on this lemma and its variants. The result of [5] mentioned above, can be thought of as a generalization of this lemma to arbitrary hereditary properties. It can be stated as saying that for every hereditary graph property there is a function such that the following holds for every . If a graph is -far from (with respect to the uniform distribution) then a (uniform) sample of vertices from induces a graph not satisfying with probability at least .
To prove (the “if” direction of) Theorem 1, we will actually prove the following combinatorial statement, which can be thought of as a vertex-weighted version of the graph removal lemma.
Theorem 4.
For every hereditary and extendable graph property there is a function such that the following holds for every and for every vertex-weighted graph which is -far from . Let , , be a sequence of random vertices of , sampled according to and independently. Then does not satisfy with probability at least .
The following similar-looking result [Footnote 8: We note that the results of [7] and [26] are more general. The authors of [26] actually prove that the conclusion of Theorem 5 holds for all graphons. The authors of [7] prove extensions of Theorem 5 in several directions, including a version for uniform hypergraphs, and a strengthening in which the notion of testability is replaced with the stronger notion of repairability.] was (implicitly) proved by Austin and Tao [7] and Lovász and Szegedy [26].
Theorem 5 ([7, 26]).
For every hereditary graph property there is a function such that the following holds for every and for every vertex-weighted graph which is -far from . Let , , be a sequence of random vertices of , sampled according to and independently. Construct a graph on by letting if and only if . Then does not satisfy with probability at least .
Note that Theorem 5 holds for all hereditary properties, while Theorem 4 only holds for hereditary properties which are extendable. Observe that the graph in Theorem 5 is a blowup of the graph , where . Thus, the difference between Theorems 4 and 5 is that Theorem 5 only guarantees that a blowup of does not satisfy w.h.p., while Theorem 4 guarantees the stronger assertion that itself does not satisfy w.h.p. This is an important difference: while Theorem 4 immediately implies the existence of a VDF-tester for every hereditary and extendable property (see Subsection 3.3), we do not know of any way of using Theorem 5 to prove the existence of such a tester. One natural candidate for a tester derived from Theorem 5 would be the algorithm which rejects if and only if the graph (defined in Theorem 5) does not satisfy . It turns out, however, that this algorithm often fails to be a valid tester [Footnote 9: For example, if -freeness then this tester will reject w.h.p if the input graph is a triangle with uniform vertex distribution (as the graph will typically contain the 2-blowup of a triangle, and thus contain a copy of ), even though this input graph clearly satisfies .].
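The gap between the two sampled graphs can be seen on the triangle example from the footnote. The sketch below builds both objects from a fixed sample with repetitions: the graph on sample positions described in Theorem 5, and the subgraph induced by the distinct sampled vertices as in Theorem 4. The forbidden graph is taken to be the 4-cycle purely as an assumption for illustration (the 2-blowup of a triangle does contain one); all function names are hypothetical.

```python
from itertools import permutations

def blowup_sample_graph(edges, sample):
    """Graph on positions 0..s-1 where {i, j} is an edge iff {x_i, x_j}
    is an edge of G (repeated vertices become non-adjacent twins)."""
    eset = {frozenset(e) for e in edges}
    s = len(sample)
    return s, [(i, j) for i in range(s) for j in range(i + 1, s)
               if frozenset((sample[i], sample[j])) in eset]

def induced_sample_graph(edges, sample):
    """Subgraph of G induced by the distinct sampled vertices, relabelled 0..m-1."""
    distinct = sorted(set(sample))
    index = {v: i for i, v in enumerate(distinct)}
    eset = {frozenset(e) for e in edges}
    return len(distinct), [(index[u], index[v]) for u in distinct for v in distinct
                           if u < v and frozenset((u, v)) in eset]

def contains_c4(n, edges):
    """True if the graph on range(n) has a (not necessarily induced) 4-cycle."""
    eset = {frozenset(e) for e in edges}
    for a, b, c, d in permutations(range(n), 4):
        if all(frozenset(p) in eset for p in ((a, b), (b, c), (c, d), (d, a))):
            return True
    return False

triangle = [(0, 1), (1, 2), (0, 2)]
sample = [0, 1, 2, 0, 1, 2]  # every vertex of the triangle is drawn twice

blowup_has_c4 = contains_c4(*blowup_sample_graph(triangle, sample))
induced_has_c4 = contains_c4(*induced_sample_graph(triangle, sample))
```

With this sample the position graph is the complete tripartite graph with parts of size 2, which contains a 4-cycle, while the induced subgraph is just the triangle. A rule that rejects based on the position graph would therefore reject an input that satisfies the property.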
It is worth noting that Theorem 5 can be deduced from the “unweighted” case, i.e. the result of [5], via a simple argument, see Lemma 5.5 and the discussion following it. On the other hand, the proof of Theorem 4 requires several new ideas on top of those used in [5].
1.3 Variants of the VDF model
The proof of the “only if” part of Theorem 1, showing that if is either non-extendable or non-hereditary then is not testable in the VDF model, relies on allowing the input graph to have only vertices (where the constant is independent of ); on excluding from the input fed to the tester; and on having distributions that assign to some vertices weight and to some vertices weight . This raises the natural question of what happens if we only require the tester to work on sufficiently large graphs; or if the tester receives as part of the input; or if we forbid from assigning very low or very high weights (as above). As the following four theorems show, either one of these variations has a dramatic effect on the model, since it then allows all hereditary properties to be testable.
We start with the setting in which the input graph is guaranteed to be large enough. In a revised version of [19], Goldreich asked whether every hereditary property is testable (in the VDF model) on graphs of order at least , for which is independent of . As we show in Proposition 5.2, this turns out to be false. On the positive side, we show that under the stronger assumption that the input size is at least (where is a function dependent on ), all hereditary properties are testable.
Theorem 6.
Under the promise that , every hereditary property is testable with one-sided error in the VDF model.
D. Ron (personal communication) asked what happens if we allow testers to receive (i.e., the number of vertices in the input graph) as part of the input [Footnote 10: We note that in the VDF model as defined in [19], the number of vertices in the input graph is not known to the tester.]. Our following theorem answers this question.
Theorem 7.
If testers can receive as part of the input, then every hereditary property is testable with one-sided error in the VDF model.
Finally, we consider settings in which restrictions are posed on the weights that the distribution can assign.
Theorem 8.
Under the promise that , every hereditary property is testable with one-sided error in the VDF model.
Theorem 9.
Under the promise that , every hereditary property is testable with one-sided error in the VDF model.
We note that the implied constant in the -notation in Theorem 9 is allowed to depend on . We refer the reader to Section 5 for the precise statements of Theorems 6–9. Let us mention that the proofs of Theorems 6, 7 and 9 rely on reductions to our main result in this paper, Theorem 1. The proof of Theorem 8 proceeds by a reduction to the standard model (i.e. to the result of [5]). As part of this proof, we solve another problem raised in [19].
1.4 Paper overview
The rest of the paper is organized as follows. Section 2 is devoted to proving vertex-weighted analogues of several lemmas that were used in prior works (most notably regularity and counting lemmas, and corollaries thereof). Some more routine parts of these proofs are deferred to the appendix. In Section 3 we prove the “if” direction of Theorem 1 (i.e. Theorem 4). This is by far the most challenging (and interesting) part of this paper. The main step towards proving Theorem 1 is establishing Lemma 3.1, which is the key lemma of this paper. For the reader’s convenience, we give in Subsection 3.1 an overview of the key ideas of the proof. As the proofs in Section 2 are somewhat routine, we encourage readers who are familiar with the regularity method to skip Section 2 (at least on their first read), and go directly to Section 3.
The “only if” direction of Theorem 1 is proved in Section 4. In Section 5 we prove Theorems 6, 7, 8 and 9. We also raise two additional problems related to the VDF model; one is to what extent can one extend the results of Theorems 6-9 beyond hereditary properties, and the other asks if the sample complexity in the VDF model is the same as in the standard model (for properties that are testable in the VDF model), see Subsection 5.3. Along the way we resolve another open problem raised in [19] (see Lemma 5.5). Throughout the paper, when we say that a function is increasing/decreasing we mean weakly increasing/decreasing (i.e. non-decreasing/non-increasing).
2 Preliminary Lemmas
In this section we introduce vertex-weighted analogues of some key tools of the regularity method, most notably Szemerédi’s regularity lemma [30], the strong regularity lemma [1], and the counting lemma, as well as some standard corollaries thereof. We also prove some other auxiliary lemmas needed for the proof of Theorem 1.
We start with two simple lemmas regarding probability distributions [Footnote 11: Throughout the paper, we will simply write “distribution” to mean “probability distribution”.] on a finite set. Given a distribution on a set and a subset , we use the notation , and call the weight of . We denote by the distribution conditioned on , namely for every .
Lemma 2.1.
For every set , for every and for every distribution on , there is a partition of into parts such that .
Proof. Let be a random partition of into parts, where each element is assigned to one of the parts uniformly at random and independently of all other elements. Then for every pair of distinct elements , the probability that and belong to the same part is exactly . By linearity of expectation we have
[TABLE]
so there is a choice of with the required property.
Lemma 2.2.
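The averaging step in this proof can be verified exactly on a small instance. The sketch below does not reproduce the bound displayed in the lemma; it only checks the identity behind it, under the assumption that the quantity being averaged is the total weight D(x)·D(y) of unordered pairs that land in the same part. With k parts, each pair collides with probability 1/k, so the expectation equals 1/k times the total pair weight, and some partition does at least as well. All names and the toy distribution are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def same_part_weight(assignment, dist):
    """Sum of D(x)*D(y) over unordered pairs assigned to the same part."""
    return sum((dist[x] * dist[y] for x, y in combinations(list(dist), 2)
                if assignment[x] == assignment[y]), Fraction(0))

def all_assignments(dist, k):
    """Enumerate every assignment of the elements to parts 0..k-1."""
    elems = list(dist)
    for labels in product(range(k), repeat=len(elems)):
        yield dict(zip(elems, labels))

def expected_same_part_weight(dist, k):
    """Exact expectation over a uniformly random assignment into k parts."""
    total = sum((same_part_weight(a, dist) for a in all_assignments(dist, k)),
                Fraction(0))
    return total / (k ** len(dist))

def total_pair_weight(dist):
    return sum((dist[x] * dist[y] for x, y in combinations(list(dist), 2)),
               Fraction(0))

dist = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
k = 3
expected = expected_same_part_weight(dist, k)
best = min(same_part_weight(a, dist) for a in all_assignments(dist, k))
```

Exhaustive enumeration is only feasible for tiny sets; it stands in here for the probabilistic argument, which needs no enumeration at all.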
Let be an integer, let be a finite set and let be a distribution on such that for every . Then there is a partition such that for every .
Proof. The proof is by induction on . The base case is trivial, so we assume from now on that . Let be a set of minimal size satisfying . Then , because otherwise we could remove an arbitrary element of (whose weight by assumption is at most ) and thus get a proper subset of having weight at least , in contradiction to the minimality of . Now set , noting that . Then every satisfies
[TABLE]
So by the induction hypothesis for , there is a partition such that
[TABLE]
for every . This completes the proof.
We consider vertex-weighted graphs, i.e. pairs such that is a graph and is a distribution on . For a set , the subgraph of induced by is defined to be , where is the distribution conditioned on . The weight of an edge/non-edge (with respect to ) is defined as . For a pair of disjoint sets with , the density of is denoted by and defined to be , where is the set of edges with one endpoint in and one endpoint in . If or then define . A pair of disjoint vertex-sets is called -regular if for every and with and , it holds that . The following lemma describes some basic properties of -regular pairs.
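As a small illustration of these definitions, the sketch below computes the density of a pair of disjoint vertex sets in a vertex-weighted graph. It assumes that the weight of a pair {x, y} is D(x)·D(y), that the density is the total weight of the edges between the two sets divided by the product of the weights of the sets, and that the density is taken to be 0 when either set has weight 0. These formulas and all names are assumptions made for the example.

```python
from fractions import Fraction

def set_weight(vertices, dist):
    """Weight of a vertex set: the sum of the probabilities of its members."""
    return sum((dist[v] for v in vertices), Fraction(0))

def density(edges, dist, xs, ys):
    """Assumed weighted density of the pair (X, Y): weight of the X-Y edges
    divided by D(X) * D(Y), and 0 if either side has weight 0."""
    xs, ys = set(xs), set(ys)
    wx, wy = set_weight(xs, dist), set_weight(ys, dist)
    if wx == 0 or wy == 0:
        return Fraction(0)
    eset = {frozenset(e) for e in edges}
    cross = sum((dist[x] * dist[y] for x in xs for y in ys
                 if frozenset((x, y)) in eset), Fraction(0))
    return cross / (wx * wy)

# Bipartite toy graph between X = {0, 1} and Y = {2, 3} with 3 of the 4 possible edges.
edges = [(0, 2), (0, 3), (1, 2)]
uniform = {v: Fraction(1, 4) for v in range(4)}
skewed = {0: Fraction(1, 2), 1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(1, 6)}

d_uniform = density(edges, uniform, [0, 1], [2, 3])
d_skewed = density(edges, skewed, [0, 1], [2, 3])
```

With uniform weights this reduces to the usual edge density, while the skewed weights raise the density because the heavy vertex 0 is adjacent to all of Y.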
Lemma 2.3.
Let be a vertex-weighted graph, and let be disjoint vertex-sets such that , and such that the pair is -regular with density . Then the following holds.
1. For every and , with and , the pair has density at least and at most , and is -regular with .
2. The set of vertices which satisfy has weight less than .
Proof. Starting with Item 1, let and be such that and . Since , the -regularity of implies that . Now let us show that is -regular with . Let and be such that and . Then and similarly . So by the -regularity of we have and hence , as required.
We now prove Item 2. Let (resp. ) be the set of all satisfying (resp. ). We have
[TABLE]
So unless , we get a contradiction to the -regularity of . Similarly, we must have . The assertion follows.
The following is a vertex-weighted counting lemma.
Lemma 2.4 (Counting lemma for vertex-weighted graphs).
For every integer and there is such that the following holds. Let be a graph on and let be pairwise-disjoint vertex-sets in a vertex-weighted graph , such that the following holds.
1. For every , if then , and if then .
2. For every , the pair is -regular.
Let be the set of all such that induce a copy of in which plays the role of for every . Then .
Proof. If for some then there is nothing to prove, so suppose that for every . The proof is by induction on . The base case trivially holds with . So from now on we assume that , and set
[TABLE]
For each , let be the set of all vertices for which . By Item 2 of Lemma 2.3, we have . Hence, the set satisfies , where in the last inequality we used our choice of . Now fix any . We define sets as follows: for , if then set , and if then set . By using Item 1 and the fact that , we get that for every . By Item 1 of Lemma 2.3, and by Conditions 1-2 of the current lemma, we get that for every , the pair is -regular with , and that if then and if then .
We now see that the sets satisfy the requirements of the lemma with respect to the graph and with in place of . Let be the set of all such that induce a copy of with playing the role of for every . By the induction hypothesis, we have
[TABLE]
For every , the tuple induces a copy of with playing the role of for every . Hence, for every we have (where is defined in the statement of the lemma). Since this is true for every , we get that
[TABLE]
as required.
A partition of the vertex-set of a vertex-weighted graph is called -regular if the sum of over all pairs for which is not -regular, is at most . We now state vertex-weighted versions [Footnote 12: We note that a weighted version of Szemerédi’s regularity lemma, where both vertex-weights and edge-weights are allowed, was proved in [14], but only under the assumption that all vertex-weights are . Hence this lemma is unsuitable in our setting.] of Szemerédi’s regularity lemma [30] and of the strong regularity lemma [1]. The proofs of these lemmas appear in the appendix.
Lemma 2.5 (Szemerédi’s regularity lemma for vertex-weighted graphs).
For every and there is such that for every vertex-weighted graph and for every partition of of size not larger than , there is an -regular partition of which has at most parts and refines .
Lemma 2.6 (Strong regularity lemma for vertex-weighted graphs).
For every function and for every integer , there is such that for every vertex-weighted graph and for every partition of of size at most , there is a refinement of , and a refinement of , such that the following holds.
1. .
2. The partition is -regular.
3. . Here the outer sum is over all unordered pairs of distinct , and the inner sum is over all such that for .
Our last two lemmas are vertex-weighted analogues of well-known corollaries to Szemerédi’s regularity lemma and the strong regularity lemma, respectively. The “unweighted” versions of these corollaries were used in [5] in order to prove that every hereditary property is testable in the standard model.
Lemma 2.7.
For every integer and for every there is , such that the following holds. Let be a vertex-weighted graph such that every vertex in has weight less than . Then there are pairwise-disjoint vertex-sets with the following properties.
1. for every .
2. is -regular for every .
3. Either all pairs have density at least , or all pairs have density less than .
Proof. Setting and , we will prove the lemma with
[TABLE]
Let satisfying for every . Apply Lemma 2.2 with , with the distribution , and with as defined above. Lemma 2.2 supplies a partition such that for every . Now apply Lemma 2.5 to with parameter and with the partition , to obtain an -regular partition which refines . For each , put , and sample with probability proportional to the weight of the parts, i.e. with probability for every . We claim that with positive probability, for every , and all pairs are -regular. For every , the probability that is less than , where in the first inequality we used the guarantees of Lemma 2.5. By the union bound, with probability at least we have for every . Next, observe that since is -regular and as , the probability that is not -regular (for some specific ) is at most . So by taking the union bound over all pairs , we get that with probability at least , all pairs are -regular. This proves our assertion.
We thus showed that there is a choice of such that for every and such that is -regular for every . Now consider an auxiliary graph on in which is an edge if and is a non-edge if . As , a well-known bound on Ramsey numbers implies that this graph contains either a clique or an independent set . Then satisfy the requirements of the lemma.
Lemma 2.8.
For every function and for every integer , there is such that for every vertex-weighted graph and for every partition of having size at most , there is a partition of and vertex-sets for , such that the following holds:
1. .
2. For every , is contained in some part of .
3. for every . In particular, .
4. For every , the pair is -regular.
5. .
Proof. We may and will assume is monotone decreasing [Footnote 13: Indeed, we can replace with , which is clearly monotone decreasing.]. For convenience, put . Let be the function . We will show that one can choose , where . Apply Lemma 2.6 to with parameter and with the given partition , to obtain partitions and such that refines , refines , and Items 1-3 in Lemma 2.6 hold. Let be the union of all parts of of weight less than , and let be the parts of of weight at least . Then we have , establishing Item 1. Now set . It is evident that Item 2 holds.
For each , denote , and sample with probability proportional to the weight of the parts; in other words, for each , the probability that is . We will show that with positive probability, satisfy Items 3-5. For each , the probability that is less than . By the union bound, the probability that there is for which is less than . So with probability larger than , for every we have
[TABLE]
where the last inequality is due to our choice of via Lemma 2.6.
We now prove that Item 4 holds with probability greater than . Fix any . Since is -regular with , and since (by the monotonicity of ), the probability that the pair is not -regular is at most , where the first inequality holds because . By the union bound over all pairs , the probability that there is for which is not -regular is at most .
It remains to show that Item 5 holds with probability at least . Observe that
[TABLE]
where in the inequality we used Item 3 of Lemma 2.6, our choice of , and the fact that . So by Markov’s inequality, the probability that Item 5 fails is at most , as required.
3 The Main Proof
In this section we prove the “if” direction of Theorem 1. In Subsection 3.1 we give a high-level overview of the main obstacle one needs to overcome in proving Theorem 1, and the main idea behind the way we overcome it. In Subsection 3.2 we state and prove Lemma 3.1, which constitutes the main ingredient in the proof of Theorem 1. Finally, we prove (the “if” direction of) Theorem 1 in Subsection 3.3.
3.1 Proof overview
The main difficulty:
Suppose is an extendable hereditary graph property. We are given a graph and a distribution so that is -far from with respect to . Our goal is to show that a sample of vertices [Footnote 14: Throughout this subsection, and mean positive quantities that depend only on and not on or .] from finds with high probability (whp) an induced subgraph of which does not satisfy . There are two ways one can try to tackle this problem. First, one can take a blowup of , in which a vertex is replaced by a cluster of vertices whose size is proportional to the vertex’s weight under , and thus (try to) “reduce” the problem to the non-weighted case. While this approach can allow one to handle some properties [Footnote 15: Indeed, this is the approach used in [19].], it seems that the main bottleneck is that a copy of in does not correspond necessarily to a copy of in , since might contain several of the vertices that replaced a vertex of . Moreover, if this vertex has weight then even a sample of size will very likely contain several of the vertices of that replaced .
A second approach would be to just reprove the result of [5], while replacing the regularity lemmas used there with regularity lemmas for vertex-weighted graphs. While such lemmas are indeed not hard to prove (see e.g. Lemmas 2.4-2.8), the main problem is again vertices of high weight. Now the issue is that clusters of the regular partition might contain only a single vertex of high weight, a situation in which one would not be able to embed graphs that need to use more than one vertex from the same cluster.
The key new idea:
The main idea is then to prove a lemma that allows one to partition into three sets with the following properties: will have total weight at most , all vertices in will have weight at least , will have a highly regular Szemerédi partition, that is, there will be a partition of the vertices of into sets so that the bipartite graphs between all pairs are pseudo-random (or regular in the sense of the regularity lemma), each of the clusters will have “enough” vertices, and for each and set , either will be connected to all vertices of or to none of them. We will now see how a partition with properties – can allow one to test . Let us note that the actual structure we will use is much more complicated than what is described in the above five properties (cf. Lemma 3.1), and that in the present discussion we intentionally oversimplify some technical aspects in order to highlight our main new idea. For example, we will not actually be able to guarantee that all pairs are pseudo-random (or that the measure of pseudo-randomness of these pairs is sufficient for our purposes); instead, as is common in proofs of this type, we will have “representative sets” such that all pairs are pseudo-random and most have roughly the same density as .
We first claim that (i.e. the graph induced by ) is -far from satisfying . Indeed, if this is not the case, then we can first turn the graph induced by these sets into a graph satisfying by making changes of total weight less than , and then use the fact that is extendable and the fact that the total weight of is at most in order to reconnect the vertices of to (and amongst themselves) so that the resulting graph will be in . The total weight of edges we thus change is less than , a contradiction.
We now examine the partition of and perform a “cleaning” procedure analogous to the one performed in applications of the regularity lemma. By this we mean that we make (only!) within changes of total weight less than so that if after these changes the set contains an induced copy of some (bounded-size) graph , then in the original graph, a sample of vertices from finds one such copy whp. Here we will also rely on property of the partition. The fact that is -far from satisfying and that we made changes of total weight less than when cleaning , means that (after the cleaning) indeed has an induced copy of a graph that does not satisfy . We now claim that a sample of size from (before the cleaning) finds a copy of whp. First, since the total weight of is small, then sampling from is (effectively) like sampling from . Let now (resp. ) be the subgraph of induced by (resp. ). By the above discussion, a sample of size finds a copy of whp. Now, and this is the first crucial point, property mentioned above guarantees that the vertices of which form the copy of , form a copy of with every set of vertices in which forms a copy of . Now, and this is the second crucial point, property above guarantees that a sample of vertices finds the (Footnote 16) copy of contained in whp. Altogether, the algorithm finds an induced copy of using queries.
Footnote 16: By “the” we mean that might contain only a single copy of , but this copy has to be of weight . This is in sharp contrast to the situation within , where each copy of might have very small weight, but the total weight of such copies must be .
The new regularity lemma:
As it turns out, one cannot hope to partition as described in the first paragraph above, and instead we will have to define a partition with a much more complicated set of features. This is stated in Lemma 3.1 in the next subsection. One of the main difficulties is making sure that parts of the partition of will not consist of only a few vertices (or even a single vertex) of high weight (i.e. we want to guarantee property stated above). This is done by making sure that the weight of the vertices in is very small compared to the weight of the parts . This in itself is challenging, because at the same time we need to have many parts in order to satisfy property above. The proof of Lemma 3.1 will use some of the lemmas of Section 2, most notably Lemma 2.8, which we will need to iterate (at least implicitly) in order to find the sought-after partition in the statement of Lemma 3.1.
3.2 The Key Lemma
In this subsection we state and prove Lemma 3.1, which is the main ingredient in the proof of the “if” direction of Theorem 1.
Lemma 3.1.
For every function and there is such that for every vertex-weighted graph there is a partition , a partition of , vertex-sets , and pairwise-disjoint vertex-sets , where , such that the following holds:
1. [math].
2. Every vertex in has weight at least .
3. For every and for every , either is adjacent to all vertices of , or to none of the vertices of .
4. [math].
5. [math].
6. For every , all pairs are -regular, and either all pairs have density at least , or all pairs have density less than .
7. For every and , the pair is -regular and .
8. For every and , .
Note that Items 2 and 8 in Lemma 3.1 together imply that . The following lemma constitutes the main part of the proof of Lemma 3.1. After proving Lemma 3.2, we deduce Lemma 3.1 from Lemmas 3.2 and 2.7.
Lemma 3.2.
For every function and there is such that for every vertex-weighted graph there is a partition , a partition of and vertex-sets (for ) such that Items 1-5 in Lemma 3.1 hold (with respect to ), and such that the following two conditions are satisfied.
- (a) For every , the pair is -regular.
- (b) For every the following holds: , and all vertices in have weight less than .
Proof. We may and will assume that the function is monotone increasing (Footnote 17: to guarantee that is monotone increasing, we can simply replace with the function .), and that the function , whose existence is guaranteed by Lemma 2.8, is monotone decreasing in and monotone increasing in . Here, being monotone decreasing in means that if a pair of functions satisfy for every , then for every . For each , define the function by
[TABLE]
Now define the functions by setting:
[TABLE]
Note that for every , and that and are monotone increasing. We define a monotone increasing sequence as follows: , and for each , . We will show that the lemma holds with
[TABLE]
Let be a vertex-weighted graph. We iteratively define a sequence of pairwise-disjoint vertex-sets as follows: let be the set of all vertices of of weight at least ; for each , let be the set of all vertices in having weight at least . Since are pairwise-disjoint, there must be for which . We now set , and . Note that . Setting , note that every vertex in has weight at least (so in particular ), while every vertex in has weight less than .
If then , so the assertion of the lemma holds for and , and we are done. So we may and will assume from now on that . Let be a partition of into parts such that , as guaranteed by Lemma 2.1. For every , consider the partition of . Let be the common refinement of the partitions and . Then for every and , either is adjacent to every vertex of , or is not adjacent to any vertex of . Moreover, we have .
Now apply Lemma 2.8 to with parameters and , and with the partition (noting that ), to obtain a partition of and vertex-sets (for ), with the properties stated in that lemma. Note that in particular we have
[TABLE]
Set and , noting that , and hence , as required by Item 1 in Lemma 3.1. Items 3 and 4 in Lemma 3.1 hold because each of the sets is contained in some part of , and hence also in some part of . Item 2 of Lemma 3.1 was already verified above, and Item 5 of Lemma 3.1 is guaranteed by Lemma 2.8. Item (a) holds because Lemma 2.8 guarantees that all pairs are -regular, and because (here we used our choice of , the fact that , and the monotonicity of ). It remains to prove Item (b). For each , we have
[TABLE]
where in the second inequality we used the guarantees of Lemma 2.8, and later we used our choice of and , the monotonicity of , and the fact that . Next, fix and recall that all vertices in have weight less than
[TABLE]
where in the first inequality we used our choice of , in the last two inequalities we used the monotonicity of , and in the second inequality we also used (1) and an intermediate step in (2). This shows that for every and , as required.
Proof of Lemma 3.1. Define the functions
[TABLE]
and
[TABLE]
We may and will assume that the function is monotone decreasing in and monotone increasing in . This assumption implies that the function defined above is monotone decreasing. We prove the lemma with
[TABLE]
Let be a vertex-weighted graph. Apply Lemma 3.2 to with parameters and , to obtain a partition , a partition of , and subsets (for ) such that Items 1-5 of Lemma 3.1 hold (with respect to ), and so do Items (a) and (b) of Lemma 3.2.
Let us now prove that Items 6-8 (in Lemma 3.1) hold. It will be convenient to put . By Item (b) in Lemma 3.2 and by our choice of , we have
[TABLE]
for every and . Recalling our choice of , we see that Lemma 2.7 is applicable to with parameters and . Applying Lemma 2.7 with this input, we obtain pairwise-disjoint vertex-sets satisfying the properties stated in that lemma. The guarantees of Lemma 2.7 immediately establish Item 6, and also imply that for every we have
[TABLE]
where in the second and third inequalities we used the fact that , as guaranteed by Item 2 of Lemma 3.1 and Item (b) of Lemma 3.2; in the third inequality we also used the monotonicity of . This establishes Item 8. It remains to prove Item 7. By Item (a) of Lemma 3.2, the pair is -regular for every . Fix any . Recalling that and that , we apply Item 1 of Lemma 2.3 to with parameter , to conclude that , and that the pair is -regular, as required.
3.3 Proof of the Main Result
In this subsection we prove (the “if” direction of) Theorem 1. For a hereditary and extendable graph property , our tester for will work as follows: given an input and a proximity parameter , the tester samples a sequence of vertices independently and with distribution , where is as in Theorem 4; the tester then accepts if and only if satisfies . Since is hereditary, this tester accepts with probability if the input graph satisfies . In the other direction, Theorem 4 immediately implies that if the input is -far from then the tester rejects with probability at least . So we see that the “if” direction of Theorem 1 follows from Theorem 4.
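The tester just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction from Theorem 4: the sample size `s` is a free (hypothetical) parameter rather than the bound supplied by the theorem, the dictionary `weights` merely simulates the sampling oracle (a real tester never sees the distribution), and triangle-freeness is used as a stand-in for a hereditary and extendable property. All function names are ours.

```python
import random
from itertools import combinations

def sample_vertices(weights, s, rng):
    """Draw s vertices independently from the vertex distribution.
    `weights` only simulates the sampling oracle; the tester itself
    never inspects the distribution."""
    vertices = list(weights)
    return rng.choices(vertices, weights=[weights[v] for v in vertices], k=s)

def induced_edges(edges, sample):
    """Return the distinct sampled vertices and the edges they induce."""
    seen = set(sample)
    return seen, {e for e in edges if e <= seen}

def is_triangle_free(vertices, edges):
    """Example property: no three vertices are pairwise adjacent."""
    return not any(
        all(frozenset(p) in edges for p in combinations(triple, 2))
        for triple in combinations(sorted(vertices), 3)
    )

def canonical_tester(edges, weights, s, satisfies, rng):
    """Accept if and only if the subgraph induced by the sample satisfies the property."""
    seen, sub_edges = induced_edges(edges, sample_vertices(weights, s, rng))
    return satisfies(seen, sub_edges)

# Edges are stored as frozensets of endpoints.
triangle = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2)]}
path = {frozenset(p) for p in [(0, 1), (1, 2)]}
uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
rng = random.Random(0)

# A graph satisfying a hereditary property is always accepted (one-sided error).
accepts_path = canonical_tester(path, uniform, 50, is_triangle_free, rng)
# The triangle is rejected as soon as all three vertices appear in the sample.
accepts_triangle = canonical_tester(triangle, uniform, 50, is_triangle_free, rng)
```

Because the property is hereditary, every induced subgraph of a satisfying input also satisfies it, which is why acceptance in the first case does not depend on the random sample.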
From now on our goal is to prove Theorem 4. We start by introducing variants of some definitions from [5]. An embedding scheme is a complete graph with a vertex partition , such that every vertex in is colored black or white, every edge with an endpoint in is colored black or white, and every edge contained in is colored black, white or grey. Note that one of may be empty; that the vertices of are not colored; and that the edges with at least one endpoint in cannot be colored grey. An embedding from a graph to an embedding scheme is a map such that the following holds:
1. For every we have .
2. For every , if is colored black then induces a complete graph, and if is colored white then induces an empty graph.
3. For every , if is colored black then the bipartite graph between and is complete, and if is colored white then the bipartite graph between and is empty (note that there are no restrictions in the case that is colored grey).
Note that Condition 3 implies that for every and , the bipartite graph between and is either complete or empty. We use the notation to mean that there is an embedding from to . For a graph-family and an integer , let be the family of all embedding schemes on at most vertices, such that there is an embedding from some to . We now introduce a variant of the function defined in [5].
Definition 3.3.
For a graph-family and an integer for which , define
[TABLE]
If then define .
We are now ready to prove Theorem 4 (and thus also the “if” direction of Theorem 1).
Proof of Theorem 4. Let be a hereditary and extendable graph property. Let be the family of graphs which do not satisfy . Fix , and let be the function
[TABLE]
where is defined in Definition 3.3. We may and will assume that the function is monotone decreasing in and monotone increasing in . Set . We prove the theorem with
[TABLE]
Let be a vertex-weighted graph which is -far from . Apply Lemma 3.1 to with parameter and with as above, to obtain a partition , a partition of , subsets (for ), and pairwise-disjoint subsets , such that and Items 1-8 in Lemma 3.1 hold.
We claim that is -far from any graph on which satisfies . So suppose by contradiction that there is a graph on such that satisfies and such that is -close to . Since is extendable, there is a graph on such that and such that satisfies . In order to turn into , we only need to add/delete edges which are incident to vertices of . Therefore, the total weight of edge-changes needed to turn into is at most , as guaranteed by Item 1 of Lemma 3.1. So we see that can be turned into , which satisfies , by adding/deleting edges whose total weight is less than , in contradiction to the assumption that is -far from .
We thus proved that is -far from any graph satisfying . Now, let be the graph obtained from by doing the following changes:
1. For every , if for every then turn into a clique, and if for every , then turn into an independent set. By Item 6 in Lemma 3.1, one of these options has to hold. The total weight of edge-changes needed in this item is at most by Item 4 of Lemma 3.1.
2. For every , if then add all edges between and , and if then remove all edges between and (note that if then no changes are made in the bipartite graph between and ). The total weight of edge-changes needed in this item is less than by Item 5 of Lemma 3.1. Indeed, observe that the total weight of changes between is less than by the triangle inequality. Hence, the total weight of changes is less than
[TABLE]
Note that no edge with an endpoint in was added/deleted in Items 1-2, so and agree on all edges that are incident to vertices of .
We see that the total weight of edge-changes made in Items 1-2 is less than . So cannot satisfy , implying that . Note that by definition (see Items 1-2 above), the graph has the following properties:
- (a) For every , is either a clique or an independent set in . Moreover, if is a clique in then for every , and if is an independent set in then for every .
- (b) For every pair , if there is an edge in between and then . Then by Item 7 of Lemma 3.1 we have that for every . Analogously, if there is a non-edge in between and then , which implies (by Item 7 of Lemma 3.1) that for every .
Now let be the following embedding scheme: and ; for each , vertex is colored black if is a clique in and white if is an independent set in ; for each , edge is colored black if and white if ; for each , , edge is colored black if the bipartite graph between and is complete and white if this bipartite graph is empty (Item 3 in Lemma 3.1 implies that one of these options must hold); finally, for every , edge is colored black if the bipartite graph between and is complete in , white if the bipartite graph between and is empty in , and grey otherwise.
Observe that the map which maps to itself (for every ) and to (for every ), is an embedding from to . Since , we have for . By the definition of the function (see Definition 3.3), there is such that and .
Now, fixing an embedding from to , write for . Put and . We claim that the sets satisfy the requirements 1-2 in Lemma 2.4 with respect to , and as above, in the graph . In other words, we show that one can apply Lemma 2.4 with the sets being , and with as the host graph. We actually already proved that Item 1 in Lemma 2.4 holds; indeed, this follows from the fact that , the definition of the embedding scheme , and Items (a)-(b) above. Item 2 of Lemma 2.4 follows from Items 6-7 of Lemma 3.1, which together imply that for every and (with the exception of = ), the pair is -regular with , as required.
We thus showed that Lemma 2.4 is applicable to the tuple of sets and the graph (with the parameters defined above). Let be the set of all tuples , where , which induce (in ) a copy of in which plays the role of for every and . By Lemma 2.4, we have
[TABLE]
where in the last inequality we used the guarantees of Item 8 in Lemma 3.1 and the monotonicity of the function . Observe that for every , the subgraph of induced by the vertex-set contains an induced copy of . Indeed, this follows from the definition of , the fact that , and the definition of the embedding scheme . Now sample an -tuple of vertices from according to the distribution and independently. Note that if every vertex in appears in the first vertices of the sample, and if the tuple of the last vertices of the sample belongs to , then the subgraph induced by the sample contains an induced copy of and hence does not satisfy (as ). The probability of this event is at least
[TABLE]
Here we used (5) and Item 2 in Lemma 3.1. Next, note that , where in the last inequality we used Items 2 and 8 of Lemma 3.1. Similarly, . So we see that a sample of random vertices induces a graph which does not satisfy with probability at least . Therefore, a sample of vertices (see (4)) induces a graph not satisfying with probability at least
[TABLE]
as required. This completes the proof.
It is natural to ask about the dependence on of the sample complexity of the tester supplied by Theorem 1. One answer is that one cannot prove any upper bound on the sample complexity which holds uniformly for all properties , because it was shown in [6] that no such bound exists even in the standard model. Suppose then that one is interested only in “simple” properties such as induced -freeness (for some fixed ). In this case, it is not too hard to see that although we are iterating Lemma 2.8, which has wowzer-type (that is, iterated-tower) bounds (Footnote 18) in this setting even for unweighted graphs (see [12, 25]), we are still getting “only” a wowzer-type bound. We should also point out that it might be possible to use the ideas in [12], together with those presented here, in order to get tower-type bounds on the sample complexity of testing induced -freeness in the VDF model.
Footnote 18: To be precise, we mean here that the “standard” way of establishing Lemma 2.8 (which is also the way we prove this lemma in this paper) is via the strong regularity lemma (see Lemma 2.6), which is known to only give wowzer-type bounds [12, 25]. In [12], (an unweighted variant of) Lemma 2.8 was proved without the use of the strong regularity lemma, thus giving better, tower-type, bounds. This is alluded to in the following sentence.
4 VDF-Testable Properties are Extendable and Hereditary
In this section we prove the “only if” direction of Theorem 1. The proof is divided between Propositions 4.1 and 4.2. As shown in [19], we can (and will) always assume that a VDF tester only queries the input graph on pairs of vertices which it has sampled.
Proposition 4.1.
If a graph property is not extendable, then is not testable in the VDF model.
Proof. Since is not extendable, there is a graph , such that no -vertex graph satisfying contains as an induced subgraph. Let be a graph obtained from by adding a “new” vertex (and putting an arbitrary bipartite graph between and ), let be the uniform distribution on , and let be the distribution on which assigns weight to each and weight 0 (Footnote 19) to .
Footnote 19: Evidently, if one does not wish to allow vertices of weight 0, then one can instead assign to a weight tending to 0; or, more accurately, a weight that is small enough with respect to (the inverse of) the sample complexity of an alleged tester for (in a proof by contradiction that such a tester does not exist).
It is clear that for every integer , a sample of vertices from according to is indistinguishable from a sample of vertices from according to . Observe that satisfies while is -far from . To see that the latter statement is true, observe that by our choice of , no matter how we change the bipartite graph between and , we will always get a graph that does not satisfy . Hence, in order to make satisfy , one must change the adjacency relation between a pair of vertices from , whose weight (under ) is .
Now, the fact that and are indistinguishable implies that is not testable (Footnote 20) in the VDF model.
Footnote 20: We note that if is non-extendable but hereditary, then one can easily obtain infinitely many examples showing that is not testable (rather than just the one example given in the proof of Proposition 4.1). Indeed, instead of adding just one vertex to , one can add to any number of vertices (for a large ), and give these new vertices weight , while distributing the remaining weight uniformly among the vertices of (note that such an assignment is precisely what the setting of Theorem 9 forbids). The assumption that is hereditary implies that every graph obtained in this way is -far from satisfying . Also, if the weight given to the “new” vertices is small enough, then these two weighted graphs are indistinguishable by a sample of any prescribed size.
Proposition 4.2.
If a graph property is not hereditary, then is not testable in the VDF model.
Proof. Since is not hereditary, there is a graph and an induced subgraph of , such that satisfies but does not. Let be the uniform distribution on , and let be the distribution on which is supported on and uniform when conditioned on , i.e. if and if . Clearly, for every integer , a sample of vertices from according to is indistinguishable from a sample of vertices from according to . Also, satisfies , whereas is -far from because . Thus, is not testable (Footnote 21) in the VDF model.
Footnote 21: In analogy to Footnote 20, we note that if is non-hereditary but extendable then one can obtain infinitely many examples showing that is not testable (rather than just the one given in the proof of Proposition 4.2). Indeed, the extendability of implies that there are arbitrarily large graphs which satisfy and contain (and hence also ) as an induced subgraph. Each of these graphs (together with an appropriate distribution, as in the proof of Proposition 4.2) is a witness to the non-testability of .
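The indistinguishability argument in the proof of Proposition 4.2 can be made concrete with a toy example. The sketch below is an illustration of our own choosing, not taken from the paper: the non-hereditary property is "the graph has at least one edge", the larger graph is an edge plus an isolated vertex, and the induced subgraph consists of two non-adjacent vertices. All names (`view`, `d_on_g`, and so on) are hypothetical.

```python
import random

def view(edges, weights, s, seed):
    """What a tester sees: s sampled vertices and the edges they induce."""
    rng = random.Random(seed)
    vertices = list(weights)
    picks = rng.choices(vertices, weights=[weights[v] for v in vertices], k=s)
    seen = set(picks)
    return picks, {e for e in edges if e <= seen}

def has_an_edge(edges):
    """Toy non-hereditary property: the graph contains at least one edge."""
    return len(edges) > 0

# The larger graph: vertices 0, 1, 2 with the single edge {0, 1}; it satisfies the property.
g_edges = {frozenset((0, 1))}
# Its induced subgraph on {1, 2} has no edges; it does not satisfy the property.
h_edges = set()

# Distribution on the larger graph, supported on {1, 2} and uniform there.
d_on_g = {0: 0.0, 1: 0.5, 2: 0.5}
# Uniform distribution on the vertices of the induced subgraph.
uniform_on_h = {1: 0.5, 2: 0.5}

views_g = [view(g_edges, d_on_g, 20, seed) for seed in range(100)]
views_h = [view(h_edges, uniform_on_h, 20, seed) for seed in range(100)]

# Vertex 0 has weight zero, so it is never sampled and the edge {0, 1} is never revealed.
vertex0_seen = any(0 in picks for picks, _ in views_g)
edge_seen = any(es for _, es in views_g + views_h)
```

In both experiments the tester only ever sees vertices 1 and 2 with no edge between them, so no decision rule based on the sample can accept the first input and reject the second.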
5 On Variations of the VDF Model and Related Problems
In the following two subsections we prove Theorems 6, 7, 8 and 9. We then consider two additional problems related to the VDF model; one problem asks if the query complexity in the VDF model is the same as in the standard model (for that are testable in the VDF model), and the other asks for a characterization of the properties that are testable in variants of the VDF model (as in Theorems 6-9). We start by giving the precise definitions of the settings considered in Theorems 6-9.
The “large inputs” model
In this model, a property is testable if there exists a function such that for every , is -testable with sample complexity depending only on under the promise that inputs always satisfy .
The “size-aware” model
In this model, testers are allowed to receive, as part of the input, the number of vertices of the input graph.
The “no heavy-weights” (NHW) model
In this model, a property is testable if there exists a function such that for every , is -testable with sample complexity depending only on under the promise that inputs always satisfy .
The “no light-weights” (NLW) model
In this model, a property is testable if for all , is -testable with sample complexity depending only on and under the promise that inputs always satisfy .
Theorem 6 (resp. 7, 8, 9) then states that every hereditary property is testable in the “large inputs” (resp. “size-aware”, NHW, NLW) model (Footnote 22).
Footnote 22: Note that if is testable in the “large inputs” model then it is also testable in the NHW model, because by setting we can make sure that the input graph has at least vertices. Still, we decided to include a separate proof for Theorem 8 (instead of deducing it from Theorem 6) for two reasons: one is that in the course of the proof we resolve another open question raised in [19]; and the other is that our proof of Theorem 8 shows that is testable (in the NHW model) by a tester that accepts if and only if the subgraph induced by the sample satisfies , whereas the tester given by the proof of Theorem 6 is not always of this form.
5.1 Proof of Theorems 6, 7 and 9
In this subsection we prove Theorems 6, 7 and 9, i.e. we show that every hereditary property is testable (with one-sided error) in the “large inputs”, “size-aware” and NLW models. Let us introduce some definitions that we will use throughout this subsection. Let be a hereditary graph property. A graph is called -good if for every there is an -vertex graph which satisfies and contains as an induced subgraph; this in particular implies that itself satisfies . If is not -good then it is called -bad, and we denote by the minimal such that there is no -vertex graph which satisfies and contains as an induced subgraph. In particular, if does not satisfy then it is -bad and . Note that since is hereditary, if is -bad then there is no graph on vertices for any which satisfies and contains as an induced subgraph. Now let be the property of being -good. Then and is hereditary, which follows from the definition of -goodness and the fact that is hereditary. Observe moreover that is extendable. Indeed, let , and suppose, for the sake of contradiction, that for every on vertices which contains as an induced subgraph, it holds that . Then for every such , there is no graph on vertices that satisfies and contains as an induced subgraph. But this means that there is no graph on vertices which satisfies and contains as an induced subgraph, in contradiction to . We note also that if itself is extendable then .
For an integer , let be the maximum of over all -bad graphs with at most vertices; if no such graphs exist, we set (this will not matter later on). We are now ready to prove Theorem 6, which we rephrase as follows.
Proposition 5.1.
For every hereditary property there are functions such that for every , the property is -testable with one-sided error and sample complexity under the promise that inputs always satisfy .
Proof. Consider the (extendable and hereditary) property defined above. By Theorem 4, there is a function such that for every and for every vertex-weighted graph which is -far from , a sample of vertices from (taken from ) induces a subgraph which does not satisfy with probability at least .
Our (“large inputs”-model) tester for samples vertices, and accepts if and only if the subgraph induced by the sample satisfies . We prove the proposition with
Let be a vertex-weighted graph with . Suppose first that satisfies . Our goal is to show that the subgraph induced by a sample of vertices, taken from and independently, satisfies with probability . So suppose by contradiction that contains an induced subgraph on at most vertices which does not satisfy . In other words, is -bad. By the definition of , there is no graph on vertices which satisfies and contains as an induced subgraph. As , and as is hereditary, we get that does not satisfy , a contradiction.
Suppose now that is -far from . Then is also -far from , as . By our choice of , a sample of vertices of , taken from and independently, induces a subgraph which does not satisfy with probability at least . So our tester rejects with probability at least , as required.
It is natural to ask whether we can replace the function in Proposition 5.1 by a constant depending only on (and not on ). As is shown in the following proposition, we cannot.
Proposition 5.2.
There is a hereditary property such that for every , there is no tester for in the VDF model even if we are guaranteed that the input graph has at least vertices.
Proof. For each , let be the graph obtained from the -cycle by adding an isolated vertex. Consider the property -freeness. Let . Set and . Let be the uniform distribution on , and let be the distribution on which assigns weight 0 to the isolated vertex in , and is uniform on the rest of the vertices of . Then and is -far from , but a sample (of any number of vertices) from is indistinguishable from a sample of the same size from . This shows that is not testable even if we require input graphs to have at least vertices.
We now move on to prove Theorem 7.
Proof of Theorem 7. Let be a hereditary graph property. Our goal is to design (and prove the correctness of) a one-sided-error tester for in the VDF model, provided that the tester receives as part of the input. Let be as in Proposition 5.1. On input , and (where is a graph and is a distribution on ), our tester works as follows:
1. If , then invoke the tester whose existence is guaranteed by Proposition 5.1, and accept if and only if this tester accepts.
2. Otherwise, i.e. if , then do the following: setting and , sample vertices according to and independently, and put . Accept if and only if there exists a graph on vertices which satisfies and contains as an induced subgraph (in the notation introduced at the beginning of this subsection, this is the same as saying that ).
Let us prove the correctness of our tester. First, Proposition 5.1 guarantees that if then the tester works correctly; namely, it accepts with probability if , and rejects with probability at least if is -far from .
So from now on we may assume that . Suppose first that . Evidently, for every there is a graph on vertices which satisfies and contains as an induced subgraph (indeed, is such a graph). Hence, the tester accepts with probability (see Item 2).
Now suppose that is -far from . Observe that for each , the probability that is
[TABLE]
By taking the union bound over all (at most ) vertices which satisfy , we see that the probability that there is with , is at most . Suppose that every satisfies (this happens with probability at least ). Then (where in the last inequality we used our assumption that ). Now, if (by contradiction) there is a graph on vertices which satisfies and contains as an induced subgraph, then one can turn into by only adding/deleting edges which are incident to vertices in . Since , this stands in contradiction to the assumption that is -far from . We conclude that there is no such graph . This implies that is rejected with probability at least , as required.
Finally, we prove Theorem 9, i.e. that every hereditary property is testable in the NLW model. We restate this theorem as follows.
Proposition 5.3.
For every hereditary property there is a function such that for all , the property is -testable with one-sided error and sample complexity under the promise that inputs always satisfy .
Proof. We start by specifying the function . Consider the (extendable and hereditary) property defined above. By Theorem 4, there is a function such that for every and for every vertex-weighted graph which is -far from , a sample of vertices of (taken from ) induces a subgraph which does not satisfy with probability at least (Footnote 23: the statement of Theorem 4 only guarantees a success probability of , but this can clearly be amplified to by repeating the experiment times). Now set and
[TABLE]
Our tester for in the NLW model simply samples a sequence of vertices of the input and accepts if and only if the subgraph induced by the sample satisfies . Evidently, this tester accepts with probability if the input satisfies . So to establish the correctness of our tester, it suffices to show that it rejects with probability at least if the input is -far from .
Let , and let be a vertex-weighted graph on vertices which is -far from , and in which all vertices have weight at least . Let be a sequence of random vertices of , sampled according to and independently, and set . We need to show that with probability at least , does not satisfy . Suppose first that . We claim that in this case we have with probability at least (this is clearly sufficient because itself does not satisfy ). For a vertex , the probability that for every is
[TABLE]
So by the union bound over all vertices of , we see that with probability at least , , as required.
Suppose now that . Our choice of guarantees that with probability at least , the graph does not satisfy , meaning that it is -bad. We will now show that with probability at least , we have . This will imply that with probability at least , contains as an induced subgraph a -bad graph on at most vertices, and also . By the definition of , this would imply that does not satisfy , as required.
So from now on, our goal is to show that with probability at least . Fix a partition of into sets , each of size at least . For each , let be the event that . Note that if occurs for every , then . Since , the probability that does not occur is at most
[TABLE]
By the union bound, the probability that there is for which does not occur, is at most , as required. This completes the proof.
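The tester analysed in this proof has a very simple shape: draw a sequence of vertices from the input distribution and accept if and only if the induced subgraph satisfies the property. The following is a minimal sketch of that shape only. The names `sample_vertices`, `induced_subgraph`, `sampling_tester`, and the choice of triangle-freeness as the example hereditary property are illustrative assumptions, and the sample size `q` is passed in by hand rather than derived from the function specified in the proof.

```python
import random
from itertools import combinations

def sample_vertices(weights, q, rng):
    """Draw q vertices independently from the distribution `weights` (vertex -> probability)."""
    vertices = list(weights)
    return rng.choices(vertices, weights=[weights[v] for v in vertices], k=q)

def induced_subgraph(edges, sample):
    """Return the distinct sampled vertices and the edges of the graph they induce."""
    kept = set(sample)
    return kept, {e for e in edges if e <= kept}

def is_triangle_free(vertices, edges):
    """Example hereditary property, chosen only for illustration."""
    return not any(
        frozenset((a, b)) in edges
        and frozenset((b, c)) in edges
        and frozenset((a, c)) in edges
        for a, b, c in combinations(sorted(vertices), 3)
    )

def sampling_tester(weights, edges, q, satisfies, rng):
    """Accept iff the subgraph induced by q sampled vertices satisfies the property."""
    sample = sample_vertices(weights, q, rng)
    vertices, induced = induced_subgraph(edges, sample)
    return satisfies(vertices, induced)

# Hypothetical inputs: edges are frozensets of endpoints.
weights = {0: 0.5, 1: 0.25, 2: 0.25}
path_edges = {frozenset((0, 1)), frozenset((1, 2))}
print(sampling_tester(weights, path_edges, 10, is_triangle_free, random.Random(1)))
```

Because the example property is hereditary, an input that satisfies it is accepted on every run, which is the one-sided error behaviour used in the proof.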
5.2 Proof of Theorem 8
In this subsection we prove Theorem 8, i.e. we show that every hereditary property is testable in the NHW model. Again, we rephrase as follows.
Proposition 5.4.
For every hereditary property there are functions and such that for every , the property is -testable with one-sided error and sample complexity under the promise that inputs always satisfy .
The key idea in the proof of Proposition 5.4, which appeared in [19], is to “blow up” the vertex-weighted graph by replacing each vertex with a vertex-set whose size is proportional to , and thus obtain an (unweighted) graph , to which one can apply known testability results in the standard model.
Let us introduce some definitions. For a graph , say on , and for integers , a -blowup of is any graph admitting a vertex-partition such that for every , and such that the bipartite graph between and is complete if and empty if . The sets are called the blowup-sets. Note that we do not pose any restrictions on the graphs induced by the sets ; these graphs may be arbitrary. For simplicity of presentation, we assume henceforth that all vertex-weights are rational. (Footnote 24: If one allows general (i.e. possibly irrational) weights, then it is necessary to change the definition of a -blowup by rounding to the closest integer. This results in an additive error of in the conclusion of Lemma 5.5, due to rounding. Consequently, in (the proofs of) Propositions 5.4 and 5.7 we need to consider -blowups with in order to have this error term go to zero. We also need to replace in several places with (say) (or any other number smaller than ). For example, the conclusion of Proposition 5.7 should be that is testable in the VDF model by a tester having one-sided error and sample complexity .) Now let be a distribution on , and let be such that is an integer for every ; such an is called admissible. A -blowup of is a -blowup of with for every . Note that a blowup is always treated as “unweighted” (in other words, the distribution on its vertices is uniform). Goldreich [19] proved that for every graph and , if a vertex-weighted graph is -far from being -free, then for every admissible , any -blowup of is -far from being -free. Goldreich further asked whether the -factor can be avoided. In the following lemma we show that this is indeed the case, and moreover that an analogous statement holds for every hereditary property. This lemma is also the key ingredient in the proof of Proposition 5.4.
Lemma 5.5.
Let be a hereditary graph property and let be a vertex-weighted graph which is -far from . Then for every admissible , any -blowup of is -far from .
Proof. Fix any admissible and let be a -blowup of . As above, we use to denote the vertices of , and to denote the corresponding blowup-sets. Suppose by contradiction that there is a graph on that satisfies and is -close to . Let be the random graph defined as follows: the vertex-set of is . To define the edge-set of , sample for each a vertex uniformly at random, and make an edge in if and only if is an edge in (for ). Then satisfies (with probability ) because is isomorphic to an induced subgraph of and is hereditary. Let us compute the expected distance between and (here the distance is with respect to the distribution ). For each , the probability that is precisely
[TABLE]
Hence, the expected distance between and is
[TABLE]
where the last inequality uses the assumption that is -close to . So is -close to a graph which satisfies , a contradiction.
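To make the two constructions used in this proof concrete, the sketch below builds a blowup of a small graph and then performs the random projection step: pick one representative from each blowup-set and read off the graph induced on the original vertex set. The function names `blowup` and `project`, the vertex encoding as pairs `(i, k)`, and the `inner` switch (which fixes the otherwise arbitrary graphs inside the blowup-sets) are assumptions made for illustration.

```python
import random
from itertools import combinations

def blowup(n, edges, sizes, inner="independent"):
    """Build a (sizes[0], ..., sizes[n-1])-blowup of a graph on vertices 0..n-1.

    Blowup vertices are pairs (i, k) with 0 <= k < sizes[i]; all sizes must be positive.
    `inner` decides whether each blowup-set is an independent set or a clique.
    """
    blocks = [[(i, k) for k in range(sizes[i])] for i in range(n)]
    b_edges = set()
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) in edges:
            # Complete bipartite graph between the blowup-sets of adjacent vertices.
            b_edges |= {frozenset((u, v)) for u in blocks[i] for v in blocks[j]}
    if inner == "clique":
        for block in blocks:
            b_edges |= {frozenset(pair) for pair in combinations(block, 2)}
    return blocks, b_edges

def project(blocks, graph_edges, rng):
    """Pick one uniform representative per blowup-set and return the edges induced on 0..n-1."""
    reps = [rng.choice(block) for block in blocks]
    return {
        frozenset((i, j))
        for i, j in combinations(range(len(blocks)), 2)
        if frozenset((reps[i], reps[j])) in graph_edges
    }

# Hypothetical example: the path 0-1-2 with blowup-set sizes 2, 1, 3.
edges = {frozenset((0, 1)), frozenset((1, 2))}
blocks, b_edges = blowup(3, edges, [2, 1, 3])
print(len(b_edges), project(blocks, b_edges, random.Random(0)) == edges)
```

Projecting the blowup itself always recovers the original graph, since adjacency between different blowup-sets does not depend on the chosen representatives. In the proof, the projection is instead applied to a nearby graph on the same vertex set that satisfies the property.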
By combining Lemma 5.5 with the result of [5] (that all hereditary properties are testable with one-sided error in the standard model), we obtain the following: for every hereditary property , for every vertex-weighted graph which is -far from , for every admissible and for every -blowup of , it holds that is -far from with respect to the uniform distribution, and hence a sample of some vertices of , taken uniformly and independently, induces a graph which w.h.p. does not satisfy . Observe that this induced subgraph of has (essentially) the same distribution as the graph on obtained by sampling vertices from independently, and letting if and only if (this is precisely the graph defined in Theorem 5). We thus established Theorem 5, as promised in Subsection 1.2.
As noted in Subsection 1.2, the graph defined above is a blowup of an induced subgraph of , but is not necessarily a subgraph of in itself. This is because the sequence might contain repeated vertices. In other words, it may be the case that contains “forbidden subgraphs” which use several vertices from one of the blowup-sets, and thus do not correspond to “forbidden subgraphs” in . This creates an obstacle for proving Proposition 5.4, because in order to prove this proposition we need to know that a (suitably chosen) random induced subgraph of (and not just the blowup thereof) does not satisfy w.h.p. To avoid this obstacle, we use the assumption that all vertices in have relatively small weight, which guarantees that it is unlikely to sample more than once from some blowup-set (or in other words, that is isomorphic to ). We note that a different way of dealing with this obstacle is to restrict ourselves to properties for which we can guarantee, by appropriately choosing the graphs inside the blowup-sets, that there would not be any minimal forbidden subgraph which uses several vertices from one of the blowup-sets; see Subsection 5.3.
Proof of Proposition 5.4. We start by specifying the functions and . By the main result of [5], there is a function such that for every and for every (unweighted) graph which is -far from , a sample of vertices from , taken uniformly at random and independently, induces a graph which does not satisfy with probability at least . Now set and
[TABLE]
Our tester for in the NHW model simply samples a sequence of vertices of the input and accepts if and only if the subgraph induced by the sample satisfies . Evidently, this tester accepts with probability if the input satisfies . So to establish the correctness of our tester, it suffices to show that it rejects with probability at least if the input is -far from .
Let and let be a vertex-weighted graph on vertices which is -far from , and in which all vertices have weight at most , where . Write and fix an admissible , that is, a positive integer such that is an integer for every . Let be an arbitrary -blowup of , and denote the blowup-sets by . By Lemma 5.5, is -far from . This implies that a random sequence of vertices of , sampled uniformly and independently, induces a graph which does not satisfy with probability at least .
Let be the map which maps all elements of to (for every ). Observe that for sampled uniformly, the random vertex has the distribution (because ). Furthermore, if a set satisfies for every , then is isomorphic to . Let be a random sequence of vertices of , sampled uniformly and independently, and set . Recall that does not satisfy with probability at least . Furthermore, the probability that for some is at most
[TABLE]
We conclude that with probability at least , does not satisfy and for every , implying that does not satisfy either. This completes the proof.
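The displayed bound on the probability of sampling twice from the same blowup-set did not survive in this copy of the text. As a hedged illustration of why low vertex-weights help, the sketch below computes two quantities exactly: the distribution obtained by mapping a uniform blowup vertex back to the original graph, and the standard union bound on the probability that some two of the sampled positions land in the same blowup-set. The function names are hypothetical, and the bound shown is the textbook estimate, which may differ from the exact expression used in the proof.

```python
from fractions import Fraction
from math import comb

def pushforward(sizes):
    """Distribution of the original vertex hit by a uniform vertex of the blowup."""
    total = sum(sizes)
    return [Fraction(s, total) for s in sizes]

def pair_collision_probability(dist):
    """Probability that two independent samples from `dist` coincide."""
    return sum(p * p for p in dist)

def repeat_bound(dist, q):
    """Union bound over the comb(q, 2) pairs of positions in a sample of length q."""
    return comb(q, 2) * pair_collision_probability(dist)

# Hypothetical example: weights 1/2, 1/4, 1/4 with admissible b = 8.
dist = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
sizes = [int(8 * p) for p in dist]
print(pushforward(sizes) == dist, pair_collision_probability(dist) <= max(dist))
```

The pairwise collision probability is always at most the largest vertex-weight, so capping the weights makes `repeat_bound` small once the sample length is fixed.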
It is natural to ask whether the function from Proposition 5.4 needs to depend on , namely whether the statement of Proposition 5.4 holds even if is a constant function (depending only on ). The proof of Proposition 5.2 shows, however, that this is not the case. In other words, allowing to depend on is unavoidable.
5.3 Testing in the VDF Model vs. Testing in the Standard Model
It is natural to ask about the relation between the sample complexity for testing a property in the VDF model and the sample complexity for testing it in the standard model. More specifically, it will be interesting to resolve the following:
Problem 5.6.
Is it true that every extendable hereditary property can be tested in the VDF model with the same (or close to the same) sample complexity as in the (standard) dense graph model?
While at present we cannot answer this question, we can show that many natural properties can be tested in the VDF model with (exactly) the same sample complexity as that of the (optimal) tester for in the standard model, which works by sampling a certain number of vertices and accepting if and only if they induce a graph which satisfies . This is explained in the following paragraph.
As mentioned in Subsection 5.2, the assumption made in Proposition 5.4 regarding the non-existence of high-weight vertices is needed in order to handle the possibility of having copies of some (forbidden) graph in which do not correspond to copies of in (where is some blowup of ). For some graph properties, however, such an assumption is not required, as we can make sure that every copy of a minimal forbidden graph in will correspond to such a copy in . To make this precise, we need the following definition. A family of graphs is said to be blowup-avoidable if for every graph , say on , and for every -tuple of integers , there is a -blowup of with blowup-sets , such that there is no induced copy of any in which intersects some in at least vertices; in other words, for every , every induced copy of in corresponds to an induced copy of in . We say that a hereditary property is blowup-avoidable if the family of minimal forbidden induced subgraphs for is blowup-avoidable. We now prove the following proposition, which partially resolves Problem 5.6. The proof is similar to that of Proposition 5.4.
Proposition 5.7.
Let be a hereditary graph property which is blowup-avoidable, and suppose that admits a tester in the standard model, which works by sampling vertices uniformly at random and independently, and accepting if and only if the subgraph induced by the sample satisfies . Then is testable in the VDF model by a tester having one-sided error and sample complexity . (Footnote 25: This holds provided that the input distributions are only allowed to assign rational weights. If irrational weights are allowed, then the sample complexity (of the VDF tester for ) should be slightly increased to (say) ; see Footnote 24.)
Proof. Given an input , the required VDF tester for samples (from ) a sequence of vertices, and accepts if and only if the subgraph induced by the sample satisfies . Since is hereditary, this tester accepts with probability if the input graph satisfies . So it remains to show that if the input is -far from , then with probability at least , a sequence of vertices of , sampled according to and independently, induces a graph which does not satisfy .
Let be the family of minimal forbidden induced subgraphs for . Let be a vertex-weighted graph on vertices, which is -far from . Write and fix an admissible , that is, a positive integer such that is an integer for every . As is blowup-avoidable, there is a -blowup of with blowup-sets , such that there is no induced copy of any in which intersects some in at least vertices. By Lemma 5.5, is -far from . So by our choice of , with probability at least it holds that a sequence of vertices of , sampled uniformly and independently, induces a graph which does not satisfy , and hence contains an induced copy of some .
Let be the map which maps all elements of to (for every ). Observe that for sampled uniformly, the random vertex has the distribution . Note that by our choice of , if span an induced copy of some (in the graph ), then is injective (and hence an isomorphism), which implies that span an induced copy of in . It is now easy to see that a sequence of vertices of , sampled from and independently, does not satisfy with probability at least , as required.
To demonstrate the usefulness of Proposition 5.7, observe that induced -freeness is blowup-avoidable for every (here is the path with edges). Indeed, this is established by taking the blowup-sets (in the definition of blowup-avoidability) to be cliques. By combining Proposition 5.7 with known results for the standard model [5, 3, 16], we immediately get that induced -freeness is testable in the VDF model with sample complexity if , and with sample complexity at most if .
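The claim that clique blowup-sets work for induced paths can be checked by brute force on small examples. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the base graph (a 5-cycle) and the blowup-set sizes are arbitrary choices, and only paths with 2 and 3 edges are checked, since the exact range of path lengths in the statement above was lost in this copy.

```python
from itertools import combinations, permutations

def blowup(n, edges, sizes, inner):
    """Blowup of a graph on 0..n-1; `inner` is "clique" or "independent"."""
    blocks = [[(i, k) for k in range(sizes[i])] for i in range(n)]
    b_edges = set()
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) in edges:
            b_edges |= {frozenset((u, v)) for u in blocks[i] for v in blocks[j]}
    if inner == "clique":
        for block in blocks:
            b_edges |= {frozenset(pair) for pair in combinations(block, 2)}
    return blocks, b_edges

def induces_path(vertices, edges):
    """True if the given vertices induce a path, i.e. some ordering has
    exactly the consecutive pairs adjacent."""
    for order in permutations(vertices):
        if all(
            (frozenset((order[a], order[b])) in edges) == (b - a == 1)
            for a, b in combinations(range(len(order)), 2)
        ):
            return True
    return False

def paths_respect_blocks(n, edges, sizes, path_vertices, inner):
    """True if every induced path on `path_vertices` vertices of the blowup
    uses at most one vertex from each blowup-set."""
    blocks, b_edges = blowup(n, edges, sizes, inner)
    all_vertices = [v for block in blocks for v in block]
    for subset in combinations(all_vertices, path_vertices):
        if induces_path(subset, b_edges) and len({i for i, _ in subset}) < path_vertices:
            return False
    return True

cycle5 = {frozenset((i, (i + 1) % 5)) for i in range(5)}
print(paths_respect_blocks(5, cycle5, [2, 1, 2, 1, 2], 3, "clique"))
```

Running the same check with `inner="independent"` fails, because two vertices of one blowup-set together with a common neighbour induce a path with 2 edges. This is why the choice of graphs inside the blowup-sets matters in the definition of blowup-avoidability.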
We now describe another corollary of Proposition 5.7. We say that a graph property is closed under blowups if for every graph satisfying , every blowup of in which the blowup-sets are independent sets also satisfies . We claim that if a hereditary property is closed under blowups then it is also blowup-avoidable. To see this, let be the set of minimal forbidden induced subgraphs for , let be an -vertex graph, let be integers and let be the -blowup of in which the blowup-sets, , are independent. Let and suppose that contains an induced copy of . If, by contradiction, this copy intersects some in more than one vertex, then is a blowup of some graph with , where the blowup-sets are independent sets. Since is closed under blowups and , we must have ; but this contradicts the fact that is a minimal forbidden induced subgraph for .
So we see that the conclusion of Proposition 5.7 applies to hereditary properties which are closed under blowups. Some examples of such properties include -freeness; the property of having a homomorphism into a fixed graph (and in particular the property of being -colorable); and the property of being the blowup of a fixed graph (cf. [8]).
On the negative side, there are many natural hereditary properties which are extendable but not blowup-avoidable, such as the property of being -free for a graph which is neither a clique nor contains isolated vertices. It would be interesting to resolve Problem 5.6 for these properties.
5.4 Which Properties are Testable in the Variations of the VDF Model?
It may be interesting to characterize the graph properties which are testable in each of the variations of the VDF model (defined at the beginning of Section 5).
Problem 5.8.
Which graph properties are testable in the “large inputs”/“size-aware”/NHW/NLW model?
While at the moment we are unable to resolve Problem 5.8, we can rule out some initial guesses. A first guess might be that only hereditary properties are testable in these models. This, however, turns out to be false; for example, connectivity and hamiltonicity are testable in each of these models, as implied by the following proposition.
Proposition 5.9.
Let be a property such that for every there is so that every vertex-weighted graph on at least vertices is -close to . Then is testable in all four variations of the VDF model.
Proof. The fact that is testable in the “large inputs” (resp. NHW) model is trivial; indeed, by choosing (resp. ) we can make sure that every input graph will be -close to , so a tester that simply accepts without making any queries is a valid tester for .
Let us now consider the NLW model. Given and an input graph with all vertex-weights at least , our tester for works as follows: setting , the tester samples vertices according to and independently (where is some large constant); if the number of distinct vertices in the sample is at least then the tester accepts (without making any queries), and otherwise the tester accepts if and only if the subgraph induced by the sample satisfies . To see that this is a valid tester, observe that if has fewer than vertices then w.h.p. the tester samples all the vertices, and if has at least vertices then w.h.p. there are at least distinct vertices in the sample. This can be argued as in the proof of Proposition 5.3, using that all vertices have weight at least ; we omit the details.
Finally, let us prove that is testable in the “size-aware” model. On input and , our tester for (in the “size-aware” model) does the following: if then the tester accepts without making any queries, and if then the tester samples vertices according to the distribution and independently, where , and accepts if and only if there is a graph on vertices which satisfies and contains as an induced subgraph. The proof of correctness for this tester is similar to the proof of Theorem 7, and we leave the details to the reader.
In order to apply Proposition 5.9 to the properties of connectivity and hamiltonicity, we observe that any vertex-weighted graph with is -close to being hamiltonian (and hence also connected). To see that this holds, take a random (cyclic) ordering of the vertices of , and observe that for every pair of distinct , the probability that there is such that is . This implies that the expected value of is , where the last inequality follows from Cauchy-Schwarz (and the first sum is over unordered pairs ). This means that we can create a hamilton cycle by adding edges of total weight at most . Let us also note that for connectivity there is a simpler argument: if is a vertex-weighted graph with , then there is with , and we can make connected by connecting to all other vertices.
Note that in some of the restricted models (e.g. the NLW model), the tester given by (the proof of) Proposition 5.9 has 2-sided error. It is also not hard to see that the NLW model admits no 1-sided-error tester for, e.g., connectivity. This shows that (some of) the restricted models allow for properties which are testable with 2-sided error but not with 1-sided error (unlike the “ordinary” VDF model, where we know that every testable property can be tested with 1-sided error, as follows from Theorems 1 and 4; see also [19, Theorem 2.3]).
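The random cyclic ordering argument rests on a simple counting fact, whose exact expression was lost in this copy: in a uniformly random cyclic ordering of n vertices (n at least 3), a fixed pair of distinct vertices is consecutive with probability 2/(n-1). The sketch below verifies this by exhaustive enumeration and then evaluates the resulting expected total weight of the cycle pairs. It assumes, for illustration only, that modifying the pair {x, y} costs the product of the two vertex-weights; the normalisation used in the paper's distance may differ. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

def adjacency_probability(n):
    """Exact probability that vertices 0 and 1 are consecutive in a uniformly
    random cyclic ordering of n >= 3 vertices, computed by brute force."""
    hits = total = 0
    for order in permutations(range(n)):
        total += 1
        gap = (order.index(0) - order.index(1)) % n
        if gap in (1, n - 1):
            hits += 1
    return Fraction(hits, total)

def expected_cycle_weight(weights):
    """Expected total weight of the n consecutive pairs of a random cyclic
    ordering, assuming the pair {x, y} has weight weights[x] * weights[y]."""
    n = len(weights)
    pair_mass = sum(weights[x] * weights[y] for x, y in combinations(range(n), 2))
    return Fraction(2, n - 1) * pair_mass

uniform = [Fraction(1, 5)] * 5
print(adjacency_probability(5), expected_cycle_weight(uniform))
```

The second quantity is an upper bound on the expected weight of the edges that need to be added, since it also counts consecutive pairs that are already edges of the input graph.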
Another natural guess regarding the answer to Problem 5.8 would be that every property which is testable in the standard model is also testable in the restricted models (see [2] for a characterization of the properties testable in the standard model). This guess is ruled out by the following proposition, which describes a property which is testable in the standard model but not in the restricted models.
Proposition 5.10.
The property of having edge-density at most is not testable in any of the four variants of the VDF model. (Footnote 26: The edge-density of a (possibly vertex-weighted) graph is defined as ; in other words, the density is defined with respect to the uniform distribution on , and not with respect to the given distribution .)
Proof. Let be the -vertex graph consisting of a clique of size and isolated vertices, and let be the uniform distribution on . Let be the -vertex graph consisting of a clique of size and isolated vertices, and let be the distribution on that assigns weight to every vertex of , and weight to every vertex of . Note that and are valid inputs in each of the variants of the VDF model (provided that is large enough), and that satisfies while is -far from . On the other hand, we now show that for every , a sample of vertices from is indistinguishable from a sample of vertices from (provided that is large enough with respect to ). To this end, let be a set of random vertices of sampled according to and independently (for ). Then for both , the graph consists of a clique and some isolated vertices. Letting be the clique in , we have
[TABLE]
and
[TABLE]
where in both cases, the additive term accounts for the event that some vertex has been sampled more than once. So we see that . This implies that the total variation distance between the distribution of and the distribution of is . It follows that is not testable in any of the four variants of the VDF model (note that knowing does not help to distinguish between and , since these graphs have the same number of vertices).
The proof of Proposition 5.10 can be adapted to show that other properties are also not testable in any of the variants of the VDF model. These properties include the property of having a cut with at least edges (for ) and the property of containing a clique with at least vertices (for ).
Acknowledgements
We are grateful to an anonymous referee for spotting a gap in the proof of Theorem 1 in a preliminary version of the paper.
6 Proof of Lemmas 2.5 and 2.6
Here we prove Lemmas 2.5 and 2.6. We start by extending some basic results about regular partitions to the vertex-weighted setting.
Lemma 6.1.
Let be disjoint vertex-sets in a vertex-weighted graph , and let be partitions of , respectively. Then
[TABLE]
and
[TABLE]
Proof. We start with the first part of the lemma.
[TABLE]
To prove the second part, we set for each , . Now,
[TABLE]
where in the last equality we used the first part of the lemma.
Let be a vertex-weighted graph, and let be a partition of . The index of , denoted , is defined as
[TABLE]
Lemma 6.2.
For every vertex-partition of a vertex-weighted graph , and for every refinement of , we have .
Proof. Write , and for each put . Then
[TABLE]
where in the second inequality we used the second part of Lemma 6.1.
Lemma 6.3.
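The displayed definition of the index did not survive in this copy, so the following sketch relies on the usual vertex-weighted analogue of the index from the regularity lemma: the sum, over pairs of distinct parts, of the product of the two part weights times the squared weighted density between them. This definition, and the names `weight`, `density`, and `index`, are assumptions made for illustration and may differ in detail from the paper's. The example checks the conclusion of Lemma 6.2 on a small graph, namely that refining a partition does not decrease the index.

```python
from fractions import Fraction
from itertools import combinations

def weight(part, w):
    """Total weight of a set of vertices."""
    return sum(w[v] for v in part)

def density(A, B, edges, w):
    """Weighted edge density between disjoint vertex sets A and B (assumed definition)."""
    mass = sum(w[a] * w[b] for a in A for b in B if frozenset((a, b)) in edges)
    return mass / (weight(A, w) * weight(B, w))

def index(partition, edges, w):
    """Index of a partition, summed over unordered pairs of distinct parts (assumed definition)."""
    return sum(
        weight(A, w) * weight(B, w) * density(A, B, edges, w) ** 2
        for A, B in combinations(partition, 2)
    )

# Hypothetical example: four vertices of equal weight and three edges.
w = {v: Fraction(1, 4) for v in range(4)}
edges = {frozenset((0, 2)), frozenset((1, 3)), frozenset((0, 1))}
coarse = [{0, 1}, {2, 3}]
fine = [{0}, {1}, {2}, {3}]
print(index(coarse, edges, w), index(fine, edges, w))
```

Exact rational arithmetic is used so that the comparison between the two indices is not affected by floating-point error.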
Let be a vertex-weighted graph and let be a non--regular partition of . Then there is a refinement of such that and .
Proof. For each for which is not -regular, let , be such that , and . For each , let be the partition of , formed by taking the common refinement of the partitions , where runs over all indices for which is not -regular. Let be the resulting refinement of . Then clearly . We now show that . First, observe that by Lemma 6.1, for every we have Next, fix any pair for which is not -regular. By Lemma 6.1 we have
[TABLE]
where in the penultimate inequality we used the first part of Lemma 6.1 to infer that
[TABLE]
Denoting by the set of pairs for which is not -regular, we see that
[TABLE]
where in the last inequality we used the assumption that is not -regular.
Proof of Lemma 2.5. For , if is not -regular then we apply Lemma 6.3 to obtain a partition which refines and satisfies and . Since the index of any partition is at most , this process must end after at most steps. When the process ends, we have an -regular partition. Since the number of steps depends only on , the size of the resulting final partition can be upper-bounded by a function of and , as required.
Proof of Lemma 2.6. We may assume, without loss of generality, that is monotone decreasing. Let be the partition obtained by applying Lemma 2.5 with parameter and with the partition . Next, for each , apply Lemma 2.5 with parameter and with the partition to obtain a partition which is -regular and refines . In light of Lemma 6.2, and as the index of any partition is at most , there must be some for which . For such an , set and . Since and the number of steps in the process is at most , and since the size of the partition guaranteed by Lemma 2.5 can be bounded from above by a function of the parameters of this lemma (which in our case depend only on and ), we see that too can be bounded from above by a function of and . This proves Item 1.
Item 2 is immediate from our choice of . It remains to prove Item 3. By the definition of the index and by our choice of and , we have
[TABLE]
where in the first equality we used the second part of Lemma 6.1. The above implies that
[TABLE]
and hence
[TABLE]
where the first inequality follows from Cauchy-Schwarz. This completes the proof.
References
- [1] N. Alon, E. Fischer, M. Krivelevich and M. Szegedy, Efficient testing of large graphs. Combinatorica 20 (2000), 451–476.
- [2] N. Alon, E. Fischer, I. Newman and A. Shapira, A combinatorial characterization of the testable graph properties: it’s all about regularity. SIAM Journal on Computing 39(1) (2009), 143–167.
- [3] N. Alon and J. Fox, Easily testable graph properties. Combinatorics, Probability and Computing 24 (2015), 646–657.
- [4] N. Alon and A. Shapira, A characterization of easily testable induced subgraphs. Combinatorics, Probability and Computing 15 (2006), 791–805.
- [5] N. Alon and A. Shapira, A characterization of the (natural) graph properties testable with one-sided error. SIAM Journal on Computing 37 (2008), 1703–1727.
- [6] N. Alon and A. Shapira, Every monotone graph property is testable. SIAM Journal on Computing 38(2) (2008), 505–522.
- [7] T. Austin and T. Tao, On the testability and repair of hereditary hypergraph properties. Random Structures and Algorithms 36 (2010), 373–463.
- [8] L. Avigad and O. Goldreich, Testing graph blow-up. In Studies in Complexity and Cryptography, Miscellanea on the Interplay between Randomness and Computation (2011), pp. 156–172. Springer, Berlin, Heidelberg.
