Towards Testing Monotonicity of Distributions Over General Posets
Maryam Aliakbarpour, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, Anak Yodpinyanee

TL;DR
This paper investigates the sample complexity of testing distribution monotonicity over general posets, introducing a new property called bigness, and establishing lower bounds and tools for upper bounds in various poset structures.
Contribution
It introduces the concept of bigness for distributions, derives lower bounds for testing monotonicity, and provides tools for analyzing upper bounds in general posets.
Findings
Lower bound of Ω(n/log n) for testing bigness.
Lower bounds for testing monotonicity over specific posets.
Sublinear sample complexity bounds for certain cases.
Maryam Aliakbarpour
CSAIL, MIT
MA is supported by funds from the MIT-IBM Watson AI Lab (Agreement No. W1771646) and the NSF grants IIS-1741137 and CCF-1733808.
Themis Gouleakis
Max Planck Institute
TG is supported by the NSF grants CCF-1740751, CCF-1650733, CCF-1733808, and IIS-1741137. Part of this work was done while TG was a postdoctoral researcher at USC supported by Ilias Diakonikolas’ USC startup grant.
John Peebles
CSAIL, MIT
JP is supported by the NSF grants CCF-1565235, CCF-1650733, CCF-1733808, and IIS-1741137.
Ronitt Rubinfeld
CSAIL, MIT, TAU
RR is supported by funds from the MIT-IBM Watson AI Lab (Agreement No. W1771646) and the NSF grants CCF-1650733, CCF-1733808, IIS-1741137, and CCF-1740751.
Anak Yodpinyanee
CSAIL, MIT
AY is supported by the NSF grants CCF-1650733, CCF-1733808, IIS-1741137 and the DPST scholarship, Royal Thai Government. This work was completed while AY was at CSAIL, MIT.
Abstract
In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution $p$ over a poset is monotone if, for any pair of domain elements $x$ and $y$ such that $x \preceq y$, $p(x) \le p(y)$.
To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is $T$-big if the minimum probability for any domain element is at least $T$. We establish a lower bound of $\Omega(n/\log n)$ for testing bigness of distributions on domains of size $n$. We then build on these lower bounds to give $\Omega(n/\log n)$ lower bounds for testing monotonicity over a matching poset of size $n$ and significantly improved lower bounds over the hypercube poset.
We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.
Keywords: Property Testing; Monotone Distributions; Partially Ordered Sets;
1 Introduction
We consider the problem of testing whether a distribution is monotone: an essential property that captures many observed phenomena of real-world probability distributions. For instance, monotone distributions over totally ordered sets might be used to describe distributions on diseases for which the probability of being affected by the disease increases with age. More generally, an important class of distributions is characterized by being monotone over a partially ordered set (poset). For these distributions, if a domain element $x$ lower bounds $y$ in the partial ordering (denoted $x \preceq y$), then $p(x) \le p(y)$ (whereas if $x$ and $y$ are unrelated in the poset, then $p$ need not satisfy any particular requirement on the relative probabilities of $x$ and $y$). Such distributions might include distributions on diseases for which the probability of being affected increases with some combination of several risk factors. Many commonly studied distributions, e.g., exponential distributions or multivariate exponential distributions, are or can be approximated by piecewise monotone functions. As monotone distributions are a fundamental class of distributions, testing whether a distribution is monotone is a key building block for distribution testing algorithms.
Given an unknown distribution $p$ over a poset domain, the goal is to distinguish whether the distribution is monotone or far from any monotone distribution, using as few samples as possible. The problem of testing whether a distribution is monotone was first considered in the work of [BKR04], which studied distributions over totally ordered domains and over partially ordered domains corresponding to two-dimensional grids. The work of [BFRV10] introduced the study of testing the monotonicity of distributions over general partially ordered domains, and in particular considered the Boolean hypercube ($\{0,1\}^d$). Several other works [DDS+13, ADK15, CDGR18] considered these questions over various other domains and achieved improved sample complexity bounds.
The sample complexity of the testing problem varies greatly with the structure of the poset. On the one hand, for domains of size $n$ that are total orders, $O(\sqrt{n}/\epsilon^2)$ samples suffice for distinguishing monotone distributions from those that are $\epsilon$-far in total variation distance from any monotone distribution [BKR04, ADK15, CDGR18]. On the other hand, testing distributions defined over the matching poset requires a number of samples nearly linear in $n$ [BFRV10]. Furthermore, for a large class of familiar posets, such as the Boolean hypercubes, little is understood about the sample complexity of the testing problem.
Our results and approaches:
We first define a new property called the bigness property, which we use as our main building block for establishing sample complexity lower bounds for monotonicity testing. A distribution is $T$-big if every domain element is assigned probability mass at least $T$.
Though the bigness property is a symmetric property (i.e., permuting the labels of the elements does not change whether the distribution has the property or not), we use lower bounds for testing the bigness property in order to prove lower bounds on testing monotonicity, which is not a symmetric property. In addition, the bigness property is a natural property, and thus of interest in its own right.
We show that the sample complexity of the bigness testing problem is $\Theta(n/\log n)$ for constant $\epsilon$ and an appropriately chosen threshold $T$. The upper bound follows from applying the algorithm of [VV17] that learns the underlying distribution up to a permutation of the domain elements. Our lower bound approach is inspired by the framework of [WY16a], used to lower bound the number of samples needed to estimate support sizes. Our lower bound is established by showing that two distributions of samples, one generated from $T$-big distributions ($p$'s) and the other generated from distributions that are $\epsilon$-far from $T$-big ($p'$'s), are statistically close. In contrast with the standard lower bound framework, $p$ and $p'$ are not picked from two fixed sets of distributions. Instead, the distribution $p$ (resp. $p'$) is constructed by having each domain element $i$ choose its probability $p_i$ (resp. $p'_i$), in an i.i.d. fashion, from a prior distribution over possible probabilities. To design the two priors, we introduce a new optimization problem that maximizes the distance of $p'$ to $T$-bigness while keeping the distributions of samples statistically close. This closeness constraint is established via the moment matching technique, which allows us to show that the distributions are indistinguishable with too few samples, and which also plays a crucial role in many other settings [RRSS09, Val08, BFRV10, VV16, VV17, WY16a, WY16b].
By reducing from the bigness testing problem, we next give a lower bound of $\Omega(n/\log n)$ on the sample complexity of the monotonicity testing problem over the matching poset, improving on the lower bound in [BFRV10]. Beyond the improved bound, one particularly useful byproduct of our approach is that the maximum probability of an element in the constructed lower bound distribution families can be made small, which helps us prove lower bounds for other posets, as described next.
Finally, we leverage the lower bound for the monotonicity testing problem over the matching poset to prove a substantially stronger lower bound for monotonicity testing over the Boolean hypercube of size $n$, greatly improving upon the standard “Birthday Paradox” lower bound of $\Omega(\sqrt{n})$. Our reduction follows from finding a large embedding of the matching poset in the hypercube, and its efficiency follows from the previously mentioned upper bound on the maximum element probability in the bigness lower bound construction.
We then give a number of new tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem:
1. We prove that the distance of a distribution $p$ to monotonicity can be characterized approximately as the weight of a maximum weighted matching in the transitive closure of the poset, where the weight of the edge $(x, y)$ is the amount of violation from being monotone: $\max(0, p(x) - p(y))$. This characterization gives a structural result about distributions that are $\epsilon$-far from monotone. Moreover, this result extends the work of [FLN+02] to non-Boolean valued functions. The work of [FLN+02] shows that the distance of a Boolean function to monotonicity is related to the number of “violating edges” in the transitive closure of the underlying poset.
2. Via the characterization above, we show that the monotonicity testing problem over bipartite posets (where all edges are directed in the same direction) captures the monotonicity testing problem in its full generality. That is, we give a reduction from monotonicity testing over any poset to monotonicity testing over a bipartite poset. Our reduction preserves the number of vertices and the distance parameter up to a constant multiplicative factor. As with the previous result, this reduction extends the work of [FLN+02] to non-Boolean valued functions.
3. Leveraging the learning algorithms for symmetric distributions in [VV17], we propose algorithms with sublinear sample complexity for testing bigness of a distribution and for testing monotonicity on matching posets. The proof of the latter result requires certain subtle details: (1) an additional reduction that allows us to scale our distribution for “each side” of the matching, in order to generate sufficient samples from each side, as required by the algorithm of [VV17], and (2) technical lemmas relating the total variation distance to the distance notion in [VV17], under the scaling mentioned earlier.
4. We give a reduction from monotonicity testing on a bipartite poset to monotonicity testing on the matching (for which the testing algorithm is constructed above). This reduction gives an algorithm for monotonicity testing on any bipartite poset (which is the most general problem, as argued earlier), in which the overhead in the sample complexity depends only on the maximum degree of the bipartite graph.
5. We give another upper bound for testing monotonicity on bipartite posets, which depends on the number of “endpoint sets” of all possible matchings contained in the given bipartite graph (or equivalently, the number of induced subgraphs that admit a perfect matching over their respective vertex sets). For the matching poset, this bound is weaker than that of our previous algorithm, which is therefore preferable there. However, this bound applies to all posets, and could potentially be much smaller for certain classes of graphs, such as collections of large stars.
6. Finally, we give a sample complexity upper bound for monotonicity testing on bipartite posets, under the guarantee that the distribution being tested is a uniform distribution on some subset of the domain of known size. This special case is of interest because it relates to the well-studied problem of testing monotonicity of Boolean functions $f$, in a somewhat different setting where, instead of getting query access to the function, we are given uniform “positive” samples of domain elements $x$ for which $f(x) = 1$.
Other related work
Batu, Kumar, and Rubinfeld [BKR04] initiated the study of testing monotonicity of distributions. For the case where the domain is totally ordered, the sample complexity is known to be $\Theta(\sqrt{n}/\epsilon^2)$ [BKR04, ADK15, CDGR18]. Several works have considered distributions over higher-dimensional domains. In [BKR04, BFRV10], it is shown that testing monotonicity of a distribution on the two-dimensional grid $[m]^2$ (here $n = m^2$) can be performed using a sublinear number of samples. For higher-dimensional grids $[m]^d$ (where $n = m^d$), Bhattacharyya et al. provided a testing algorithm [BFRV10]. Acharya et al. gave both upper and lower bounds for this setting [ADK15]. While their result is tight when the dimension is relatively small, it does not yield a tester for Boolean hypercubes using a sublinear number of samples.
Bhattacharyya et al. considered the problem of monotonicity testing over general posets [BFRV10]. In particular, they proposed an algorithm for testing the monotonicity of distributions over hypercubes $\{0,1\}^d$ (where $n = 2^d$). They also provide lower bounds for testing monotonicity of distributions over a matching of size $n$, and for posets whose Hasse digraph contains a linear-sized matching in its transitive closure.
In addition to the above, testing monotonicity of distributions has been considered in various settings [ACS10, DDS12, Can15]. There are also several works on testing various properties, e.g., uniformity, closeness, and independence, when the underlying distribution is monotone [BDKR05, BKR04, RS05, DDS+13, AJOS13].
Testing monotonicity of Boolean functions is also well studied (e.g., [GGLR98, DGL+99, LR01, FLN+02, CS13, CS14, BB16, BCS18]). In the general regime, the algorithm can query the value of the function at any element in the poset. This ability is in sharp contrast with our model, in which the algorithm only receives samples according to the distribution, which do not directly reveal the probabilities of the elements. It is known that one can test monotonicity of functions over hypergrids and hypercubes using a number of queries polylogarithmic in the size of the domain. This query complexity is exponentially smaller than the sample complexity of testing monotonicity of distributions, demonstrating that there are inherent differences between the two problems.
2 Preliminaries
We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$. Throughout this paper we use the total variation distance, denoted by $d_{TV}$, unless otherwise stated. We also denote the $\ell_1$-distance by $\|\cdot\|_1$. For a distribution $p$, we denote the probability of the domain element $i$ by $p_i$ (or $p(i)$). Given a multiset of samples from a distribution on $[n]$, the histogram of the samples is an $n$-dimensional vector $h = (h_1, \ldots, h_n)$, where $h_i$ is the frequency of the $i$-th element in the sample set.
A poset $G = ([n], E)$ is called a line if and only if $E$ contains all the edges $(i, i+1)$ for $i \in [n-1]$. We say a poset is a matching if all of the edges in the poset are vertex-disjoint. We say a poset is bipartite if its set of vertices can be decomposed into two sets, the top set and the bottom set, such that no two vertices in the same set are connected; moreover, all edges are directed from the top set to the bottom set. We use similar terminology for the matching poset as well. In addition, we say a poset is a $d$-dimensional hypercube when its vertex set is $\{0,1\}^d$ and $E$ contains all edges $(x, y)$ for which there exists a coordinate $i$ such that $x_i = 0$, $y_i = 1$, and $x_j = y_j$ for all $j \neq i$.
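The poset families above can be written down as plain edge lists, where an edge $(x, y)$ means $x \preceq y$. The following is a minimal Python sketch of that representation. The function names and vertex encodings are illustrative choices, not part of the paper: integers for the line, hypothetical tagged tuples `('u', i)` and `('v', i)` for the matching, and 0/1 tuples for the hypercube.

```python
from itertools import product
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (x, y) means x ⪯ y


def line_poset(n: int) -> List[Edge]:
    """The line on [n]: edges (i, i+1) for i = 1, ..., n-1."""
    return [(i, i + 1) for i in range(1, n)]


def matching_poset(k: int) -> List[Edge]:
    """k vertex-disjoint edges (('u', i), ('v', i)); every vertex has degree exactly one."""
    return [(("u", i), ("v", i)) for i in range(k)]


def hypercube_poset(d: int) -> List[Edge]:
    """Edges (x, y) of {0,1}^d where y equals x except for one coordinate raised from 0 to 1."""
    edges = []
    for x in product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 0:
                edges.append((x, x[:i] + (1,) + x[i + 1:]))
    return edges
```

Each vertex $x$ of the $d$-dimensional hypercube has one outgoing edge per zero coordinate, so the cube has $d \cdot 2^{d-1}$ edges (12 for $d = 3$).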
Monotonicity.
A partially ordered set (poset) is described as a directed graph $G = (V, E)$, where each edge $(x, y) \in E$ indicates the relationship $x \preceq y$ on the poset. A matching poset is a poset whose underlying graph is a matching. A distribution over a poset domain is a distribution over the vertex set $V$. A distribution $p$ is monotone (with respect to a poset $G$) if for every edge $(x, y) \in E$ (i.e., every ordered pair $x \preceq y$), $p(x) \le p(y)$. Let $\mathcal{M}_G$ be the set of all monotone distributions over the poset $G$. We say that $p$ is $\epsilon$-far from monotone if its distance to monotonicity, $d_{TV}(p, \mathcal{M}_G) = \min_{q \in \mathcal{M}_G} d_{TV}(p, q)$, is at least $\epsilon$.
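Checking monotonicity against a given edge list is direct. Because $\le$ is transitive, it suffices to check the edges of $G$ rather than every comparable pair. The following is a minimal sketch; the helper names `is_monotone` and `violated_edges` are hypothetical, and the small tolerance only guards against floating-point noise.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (x, y) means x ⪯ y


def is_monotone(p: Dict[Hashable, float], edges: Iterable[Edge], tol: float = 1e-12) -> bool:
    """True iff p(x) <= p(y) (up to tol) for every edge (x, y) of the poset."""
    return all(p[x] <= p[y] + tol for x, y in edges)


def violated_edges(p: Dict[Hashable, float], edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Map each violated edge (x, y) to the amount p(x) - p(y) > 0 by which it is violated."""
    return {(x, y): p[x] - p[y] for x, y in edges if p[x] > p[y]}


line = [(1, 2), (2, 3)]
print(is_monotone({1: 0.2, 2: 0.3, 3: 0.5}, line))  # True
print(is_monotone({1: 0.5, 2: 0.3, 3: 0.2}, line))  # False: both edges are violated
```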
Definition 2.1**.**
Let $p$ be a distribution on poset $G$ and let $\epsilon$ be the proximity parameter. Suppose an algorithm $\mathcal{A}$ has sample access to $p$ and the full description of poset $G$. $\mathcal{A}$ is called a monotonicity tester for distributions if the following is true with probability at least $2/3$ when the tester has sample access to the distribution:
- If $p$ is monotone, then $\mathcal{A}$ outputs accept.
- If $p$ is $\epsilon$-far from monotone, then $\mathcal{A}$ outputs reject.
Bigness.
A probability distribution $p$ over a domain $[n]$ is $T$-big if, for every domain element $i$, $p_i \ge T$. Notions related to the distance to $T$-bigness are defined analogously to those for monotonicity. The parameter $T$ is called the bigness threshold, and may be omitted if it is clear from the context. Let $\mathcal{B}_T$ denote the set of all distributions over $[n]$ that are $T$-big. We define the distance to $T$-bigness as $d_{TV}(p, \mathcal{B}_T) = \min_{q \in \mathcal{B}_T} d_{TV}(p, q)$. If this distance is at least $\epsilon$, we say the distribution is $\epsilon$-far from being $T$-big.
Definition 2.2**.**
Let $p$ be a distribution on $[n]$. Suppose algorithm $\mathcal{A}$ receives a threshold $T$ and a distance parameter $\epsilon$, and has sample access to $p$. $\mathcal{A}$ is a $(T, \epsilon)$-bigness tester if the following is true with probability at least $2/3$:
- If $p$ is $T$-big, then $\mathcal{A}$ outputs accept.
- If $p$ is $\epsilon$-far from $T$-big, then $\mathcal{A}$ outputs reject.
The $(T, \epsilon)$-bigness testing problem refers to the task of distinguishing the above cases with high probability.
Remark 2.3**.**
Note that the success probability $2/3$ is arbitrary in the above definitions. One can amplify the probability of outputting the correct answer to $1 - \delta$ by increasing the number of samples by an $O(\log(1/\delta))$ factor.
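The standard amplification is to run the tester $k$ times on fresh samples and output the majority answer. This assumes the base tester is an independent black box that is correct with probability at least $2/3$. By Hoeffding's inequality, the majority is wrong with probability at most $\exp(-2k(1/6)^2) = \exp(-k/18)$, so $k = \lceil 18\ln(1/\delta)\rceil$ repetitions suffice. The constant 18 comes from this particular bound and is not optimized. The names `amplify` and `noisy_tester` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable


def amplify(tester: Callable[[], str], delta: float) -> str:
    """Majority vote over k independent runs of a tester that is correct w.p. >= 2/3.

    Hoeffding: P[majority wrong] <= exp(-2k(1/6)^2) = exp(-k/18) <= delta
    once k >= 18 ln(1/delta).
    """
    k = math.ceil(18 * math.log(1 / delta))
    votes = Counter(tester() for _ in range(k))
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for one run of a tester on fresh samples: correct w.p. 2/3.
rng = random.Random(0)


def noisy_tester() -> str:
    return "accept" if rng.random() < 2 / 3 else "reject"


print(amplify(noisy_tester, delta=0.01))  # "accept" with probability at least 0.99
```

Each call to `amplify` consumes $k$ runs' worth of samples, which is the $O(\log(1/\delta))$ overhead stated above.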
3 Overview of Our Techniques
In this section, we give an overview of our results and the high-level idea of our techniques.
3.1 A lower bound for the bigness testing problem
In Section 4, we provide two random processes for generating histograms of samples from two families of distributions, such that one family consists of “big” distributions, and the other family largely of “$\epsilon$-far from big” distributions. Then, we show that unless a large number of samples have been drawn, the distributions over the histograms generated via these two random processes are statistically very close to each other, and hence appear indistinguishable to any algorithm, as specified precisely in Theorem 3.1. The construction yields a lower bound for the general problem of testing the bigness property in Corollary 3.1. Furthermore, the construction provides a useful building block for establishing further lower bounds for monotonicity testing in various scenarios in Section 5.
To generate histograms from the two families of distributions, imagine the following process. We have two prior distributions $U$ and $U'$, and we generate probability vectors (measures) $p$ and $p'$ according to the priors: each domain element randomly picks its probability in an i.i.d. fashion from the prior distribution. More precisely, let $U_1, \ldots, U_n$ be i.i.d. random variables drawn from the prior $U$; then $p$ is defined as
$$p = \left(\frac{U_1}{n}, \frac{U_2}{n}, \ldots, \frac{U_n}{n}\right).$$
We generate $p'$ similarly according to the prior $U'$. While the total probability $\sum_i p_i$ is unlikely to be exactly $1$, we will design the priors $U$ and $U'$ so that we can later modify $p$ or $p'$ into a probability distribution with only small changes. We then generate histograms of samples from (the normalization of) $p$ by drawing independent random variables $h_i \sim \mathrm{Poi}(s \cdot p_i)$ for $i \in [n]$, and output $h = (h_1, \ldots, h_n)$ as the histogram of the samples. By the Poissonization method, one may view the histogram as being generated from a set of $\mathrm{Poi}(s \sum_i p_i)$ samples from the normalization of $p$. Hence, if $\sum_i p_i$ is close to one, the histogram serves as a set of roughly $s$ samples. We set $s$ more specifically in terms of the rest of the parameters later.
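The following minimal sketch simulates this process for a hypothetical two-point prior, uniform on $\{0.5, 1.5\}$. That prior has mean one and is chosen only for illustration; it is not the prior constructed in Section 4. The sketch draws $p_i = U_i/n$ and then independent counts $h_i \sim \mathrm{Poi}(s\,p_i)$, so the total count is $\mathrm{Poi}(s\sum_i p_i)$, i.e., roughly $s$. Poisson variates come from Knuth's multiplication method, which is adequate for the small means $s\,p_i$ used here.

```python
import math
import random
from typing import List, Sequence


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method: count uniform draws until their running product drops below e^{-lam}."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_measure(values: Sequence[float], weights: Sequence[float], n: int,
                   rng: random.Random) -> List[float]:
    """Draw U_1, ..., U_n i.i.d. from a finite prior and return p_i = U_i / n (not normalized)."""
    us = rng.choices(values, weights=weights, k=n)
    return [u / n for u in us]


def poissonized_histogram(p: Sequence[float], s: float, rng: random.Random) -> List[int]:
    """Independent h_i ~ Poi(s * p_i): equivalent to Poi(s * sum(p)) samples from p normalized."""
    return [poisson(rng, s * pi) for pi in p]


rng = random.Random(0)
p = sample_measure([0.5, 1.5], [0.5, 0.5], n=1000, rng=rng)
h = poissonized_histogram(p, s=2000, rng=rng)
print(sum(p), sum(h))  # sum(p) is close to 1 and sum(h) is close to s = 2000
```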
The goal in Section 4 is to find two prior distributions $U$ and $U'$, then generate two probability vectors $p$ and $p'$ and two histograms $h$ and $h'$ according to them respectively, such that the following events hold with high probability.
1. The probability vectors $p$ and $p'$ are approximate probability distributions; that is, their total probability masses are each close to $1$.
2. After scaling the probability vectors $p$ and $p'$ above into respective probability distributions, the normalization of $p$ is $T$-big, and the normalization of $p'$ is $\epsilon$-far from any $T$-big distribution.
3. The total numbers of (Poissonized) samples in $h$ and $h'$, drawn from the normalizations of $p$ and $p'$, are each roughly $s$, where $s$ is the sample complexity lower bound we are aiming to prove.
4. Given $h$ or $h'$, distinguishing whether it was generated from $p$ or from $p'$ with success probability $2/3$ requires more samples than roughly $s$.
5. Additionally, we will bound the largest probability mass that the normalized distributions place on any domain element. This part is not necessary for this section, but will be useful for the reduction between monotonicity testing and bigness testing later on.
Now, if we choose $U$ and $U'$ carefully so that the histograms generated from $p$ and $p'$ by the above process are hard to distinguish, then we can establish a lower bound for the bigness testing problem. We state this result more formally in the following theorem, proved in Section 4.
Theorem 3.1. For integer $n$ and sufficiently small $\epsilon$, there exist a parameter $T$ and two distributions $\mathcal{D}$ and $\mathcal{D}'$ over the set of possible histograms with the following properties:
- The histogram generated from $\mathcal{D}$ is drawn from a $T$-big distribution.
- The histogram generated from $\mathcal{D}'$ is drawn from a distribution which is $\epsilon$-far from any $T$-big distribution.
- $\mathcal{D}$ and $\mathcal{D}'$ are statistically close.
- The largest probability mass that any of the above probability distributions (from which the histograms are drawn) places on a single element is explicitly bounded.
An important special case of this theorem yields a nearly linear sample complexity lower bound of $\Omega(n/\log n)$ for the general problem of bigness testing, as follows.
Corollary 3.1. For a sufficiently small parameter $\epsilon$, there exists a parameter $T$ such that any algorithm that can distinguish, with probability $2/3$, whether a distribution over $[n]$ is $T$-big or $\epsilon$-far from any $T$-big distribution requires a number of samples nearly linear in $n$. In particular, when $\epsilon$ is a constant, any such algorithm requires $\Omega(n/\log n)$ samples.
We propose the following optimization problem, whose optimal solution specifies $U$ and $U'$ satisfying the requirements of the theorem. Intuitively speaking, as $U$ aims to generate $T$-big distributions, we must ensure that the $U_i$'s are bounded away from $0$, so that every entry $p_i = U_i/n$ stays above the threshold. At the same time, we hope to maximize the probability that $U' = 0$, so that $p'$ has many domain elements with probability zero, which makes its normalization far from any $T$-big distribution. In addition, we find $U$ and $U'$ under the constraint that their first $L$ moments are exactly matched, so as to ensure that the resulting distributions over the histograms, $\mathcal{D}$ and $\mathcal{D}'$, are statistically close. The objective value of this optimization problem corresponds to the expected distance of $p'$ to the closest $T$-big distribution in $\ell_1$-distance.
[TABLE]
In the above optimization problem, the unknowns are the priors $U$ and $U'$ together with the threshold variable, while the number $L$ of matched moments and a range parameter are specified later in the proof. That is, we are looking for two distributions $U$ and $U'$ such that random variables drawn from them have expected value one and agree on their first $L$ moments. Together, these quantities control the range of the probabilities $p_i$ and $p'_i$, and the distance to the bigness property.
We relate the optimal solution of this optimization problem to an LP defined by [WY16a], who in turn relate their LP to the error of the best polynomial approximation of a certain function over a bounded interval. In this way, we show the existence of a solution whose objective value, which is proportional to the distance to $T$-bigness in the second family, is sufficiently large.
Our proof relies on and extends the lower bound techniques for estimating support size provided in [WY16a], incorporating specific conditions for the bigness problem. Firstly, unlike in the support size estimation problem, the big distributions must be fully supported on the domain, whereas in [WY16a] both families of distributions are allowed to be partially supported. Secondly, our optimization problem treats the threshold as a variable, whereas the support size problem simply imposes a fixed threshold. Thirdly, based on this construction, we must also give a direct upper bound on the maximum probability, which facilitates our later proofs of lower bounds for the matching and hypercube posets.
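The sketch below checks the constraints of the optimization problem on a pair of hypothetical toy priors. These are not the optimal priors from Section 4. $U$ takes value $1/2$ with probability $4/5$ and value $3$ with probability $1/5$. $U'$ takes values $0$ and $2$ with probability $1/2$ each. Both have mean one and agree on their first $L = 2$ moments. $U$ is bounded away from zero, while half of the mass of $U'$ sits at zero. Take the illustrative threshold $T = \eta/n$ with $\eta = 1/2$. Then every $p_i = U_i/n$ is at least $T$, and the expected total deficit of $p'$ is $\eta \cdot \Pr[U' = 0] = 1/4$. Exact rational arithmetic avoids floating-point noise in the moment comparisons.

```python
from fractions import Fraction as F
from typing import Dict

Prior = Dict[F, F]  # support point -> probability

# Hypothetical toy priors, chosen only to illustrate the constraints.
U: Prior = {F(1, 2): F(4, 5), F(3): F(1, 5)}     # support bounded away from 0
U_prime: Prior = {F(0): F(1, 2), F(2): F(1, 2)}  # half of the mass sits at 0


def moment(prior: Prior, j: int) -> F:
    """E[X^j] for X drawn from a finite prior."""
    return sum((prob * value ** j for value, prob in prior.items()), F(0))


def matched_moments(a: Prior, b: Prior, max_j: int) -> int:
    """Largest L <= max_j such that the first L moments of a and b coincide."""
    L = 0
    for j in range(1, max_j + 1):
        if moment(a, j) != moment(b, j):
            break
        L = j
    return L


eta = min(U)  # smallest support point of U, so every p_i = U_i / n is at least T = eta / n
# Expected total deficit sum_i max(0, T - U'_i / n) = E[max(0, eta - U')], since the n cancels.
expected_deficit = sum((prob * max(F(0), eta - v) for v, prob in U_prime.items()), F(0))
print(matched_moments(U, U_prime, 5), expected_deficit)  # 2 1/4
```

The third moments differ ($11/2$ versus $4$), which is why this toy pair only matches $L = 2$ moments. The construction in Section 4 matches many more moments while keeping the objective large.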
3.2 From bigness lower bounds to monotonicity lower bounds
In Section 5, we show how to turn our lower bound results for the bigness testing problem from Section 4 into lower bounds for monotonicity testing over some fundamental posets, namely the matching poset and the Boolean hypercube poset.
Matching poset.
To establish our lower bound for testing monotonicity over the matching poset, we construct our distribution by assigning probability masses to the endpoints of the edges $(u_i, v_i)$ of the matching, where $u_i \preceq v_i$, as follows. The upper endpoints $v_i$ are assigned probability masses according to the $T$-bigness construction, whereas the lower endpoints $u_i$ are uniformly assigned the threshold $T$ as their probability masses. The assigned probabilities are then normalized into a proper probability distribution. We show that, before normalization, the constructed distribution is monotone if the original distribution is $T$-big. Otherwise, its distance to monotonicity measures exactly the distance of the original distribution to the $T$-bigness property. We then show that the normalization step scales the entire distribution down by only a constant factor. Hence, the lower bound for monotonicity testing over the matching poset with $2n$ vertices asymptotically preserves the parameters of the bigness lower bound construction for $n$ domain elements.
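The reduction can be sketched as follows. The sketch assumes each matching edge is $(u_i, v_i)$ with $u_i \preceq v_i$, so monotonicity requires $p(u_i) \le p(v_i)$. The lower endpoint receives the threshold $T$ and the upper endpoint receives the $i$-th mass $q_i$ of the (possibly unnormalized) bigness instance. Before normalization, the total violation $\sum_i \max(0, T - q_i)$ is exactly the total deficit of $q$ below $T$. Normalization divides all masses by $nT + \sum_i q_i$, which is at most about $2$ when $nT \le 1$ and $\sum_i q_i \approx 1$. The function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (p(u_i), p(v_i)) for the matching edge u_i ⪯ v_i


def matching_instance(q: List[float], T: float) -> Tuple[List[Pair], List[Pair]]:
    """Lower endpoints get T, upper endpoints get q_i; returns (unnormalized, normalized) masses."""
    raw = [(T, qi) for qi in q]
    total = sum(a + b for a, b in raw)
    return raw, [(a / total, b / total) for a, b in raw]


def violation_sum(pairs: List[Pair]) -> float:
    """Total amount by which the matching edges violate p(u_i) <= p(v_i)."""
    return sum(max(0.0, a - b) for a, b in pairs)


def deficit_sum(q: List[float], T: float) -> float:
    """Total deficit of q below the threshold T."""
    return sum(max(0.0, T - qi) for qi in q)


q = [0.4, 0.35, 0.2, 0.05]
raw, normalized = matching_instance(q, T=0.2)
print(violation_sum(raw), deficit_sum(q, 0.2))  # both are 0.15 up to rounding
```

Here the normalizing constant is $4 \cdot 0.2 + 1 = 1.8$, a constant factor, as in the argument above.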
Hypercube poset.
To achieve our results for the Boolean hypercube, we embed our distributions over the matching poset into two consecutive levels of the hypercube, where the level of a vertex is the number of ones in its binary representation. We pair up elements in these levels in such a way that endpoints of distinct edges of the matching are incomparable. As a result, the algorithm must obtain samples of these matched vertices in order to decide whether the given distribution is monotone. We also place probability mass on all other vertices in the upper of the two levels and above, and zero probability mass on all remaining vertices, in order to ensure that the distribution is monotone everywhere else. Lastly, we rescale the entire construction down into a proper probability distribution. Unlike for the matching poset, this scaling factor is sometimes super-constant, shrinking the overall distance to monotonicity, $\epsilon$, to a sub-constant value. Here, we make use of our upper bound on the maximum probability mass in the bigness lower bound construction to determine the scaling factor.
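One simple way to find such a pairing is a greedy search for an induced matching between levels $k$ and $k+1$, sketched below. This is a hypothetical illustration; the paper's embedding may differ and may be larger. Vertices of $\{0,1\}^d$ are encoded as the set of coordinates equal to one, so $x \preceq y$ between consecutive levels means $x \subset y$. After choosing a pair $(x, y)$, the greedy rule forbids any later lower endpoint below $y$ and any later upper endpoint above $x$. This is exactly the incomparability requirement between distinct pairs.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Vertex = FrozenSet[int]  # a vertex of {0,1}^d, encoded as the set of coordinates equal to one


def level(d: int, k: int) -> List[Vertex]:
    """All vertices of {0,1}^d with exactly k ones."""
    return [frozenset(c) for c in combinations(range(d), k)]


def greedy_level_matching(d: int, k: int) -> List[Tuple[Vertex, Vertex]]:
    """Greedily choose pairs x ⊂ y with |x| = k and |y| = k + 1 such that for any two
    chosen pairs (x, y) and (x2, y2), x is not below y2 and x2 is not below y."""
    blocked_lower: Set[Vertex] = set()  # may no longer serve as a lower endpoint x
    blocked_upper: Set[Vertex] = set()  # may no longer serve as an upper endpoint y
    pairs = []
    for x in level(d, k):
        if x in blocked_lower:
            continue
        for i in range(d):
            y = x | {i}
            if i in x or y in blocked_upper:
                continue
            pairs.append((x, y))
            blocked_lower.update(y - {j} for j in y)  # everything directly below y (incl. x)
            blocked_upper.update(x | {j} for j in range(d) if j not in x)  # everything above x
            break
    return pairs


print(len(greedy_level_matching(6, 3)))  # number of pairs found for d = 6, k = 3
```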
3.3 Reduction from general posets to bipartite graphs
In Section 6, we show that the problem of monotonicity testing of distributions over bipartite posets is essentially the “hardest” case of monotonicity testing in general poset domains. That is, consider any distribution $p$ over some poset domain of size $n$, represented as a directed graph $G = (V, E)$. We show that there exists a distribution $q$ over a bipartite poset of size $2n$ with two properties. First, $q$ preserves the total variation distance of $p$ to monotonicity up to a small multiplicative constant factor. Second, each sample from $q$ can be generated using one sample drawn from $p$. These properties together imply the following main theorem of the section.
Theorem (from general posets to bipartite posets). Suppose that there exists an algorithm that tests monotonicity of a distribution over a bipartite poset domain of $N$ elements using $s(N, \epsilon)$ samples for any total variation distance parameter $\epsilon$. Then there exists an algorithm that tests monotonicity of a distribution over any poset domain of $n$ elements using $s(2n, \epsilon/c)$ samples, for some absolute constant $c$.
Our approach may be summarized as follows. We first show, in Theorem 3.3, that we may characterize (up to a constant factor) the distance of $p$ to monotonicity as the weight of the maximum weighted matching on the transitive closure of $G$, denoted by $TC(G)$. Here the weight $\max(0, p(x) - p(y))$ of an edge $(x, y)$ represents the amount by which $p$ violates the monotonicity condition on that pair. In particular, we have the following theorem:
Theorem 3.3. Consider a poset $G = (V, E)$ and a distribution $p$ over its vertices. Suppose every edge $(x, y)$ in $TC(G)$ has a weight of $\max(0, p(x) - p(y))$. Then the total variation distance of $p$ to the closest monotone distribution is within a factor of two of the weight of the maximum weighted matching in $TC(G)$.
This crucial theorem provides a combinatorial way to approximate the distance to monotonicity for general posets. It leads to our upcoming construction of $q$ for the reduction from general to bipartite posets, as well as to some algorithms in Section 7. Theorem 3.3 is shown via LP duality: the dual of the LP for optimally “fixing” $p$ to make it monotone turns out to align with the maximum (fractional) matching problem on the transitive closure of $G$. In particular, the dual constraint matrix is totally unimodular, implying that an integral optimal solution exists, namely a maximum matching.
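For very small posets, the matching quantity in Theorem 3.3 can be computed by brute force. The sketch below (helper names hypothetical) builds the transitive closure with Warshall's algorithm and weights each comparable pair by $\max(0, p(x) - p(y))$. It then finds a maximum weighted matching by exact dynamic programming over vertex subsets, which takes exponential time and is meant only as an illustration. Consider the line $1 \preceq 2 \preceq 3$ with $p = (0.5, 0.3, 0.2)$. The best matching pairs elements $1$ and $3$ with weight $0.3$. The distance to monotonicity is $1/6$: any monotone $q$ has $q_1 \le 1/3$, and the uniform distribution attains this bound. The two values are indeed within a factor of two.

```python
from functools import lru_cache
from typing import List, Tuple


def transitive_closure(n: int, edges: List[Tuple[int, int]]) -> List[List[bool]]:
    """reach[x][y] is True iff G has a directed path of length >= 1 from x to y (Warshall)."""
    reach = [[False] * n for _ in range(n)]
    for x, y in edges:
        reach[x][y] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def max_violation_matching(p: List[float], edges: List[Tuple[int, int]]) -> float:
    """Weight of a maximum weighted matching in TC(G), edge weights max(0, p(x) - p(y))."""
    n = len(p)
    reach = transitive_closure(n, edges)
    w = [[max(0.0, p[x] - p[y]) if reach[x][y] else 0.0 for y in range(n)] for x in range(n)]

    @lru_cache(maxsize=None)
    def best(used: int) -> float:
        free = [v for v in range(n) if not (used >> v) & 1]
        if len(free) < 2:
            return 0.0
        i = free[0]
        result = best(used | (1 << i))  # option 1: leave vertex i unmatched
        for j in free[1:]:
            weight = max(w[i][j], w[j][i])  # distinct poset elements are comparable one way at most
            if weight > 0:
                result = max(result, weight + best(used | (1 << i) | (1 << j)))
        return result

    return best(0)


line = [(0, 1), (1, 2)]
weight = max_violation_matching([0.5, 0.3, 0.2], line)
distance = 1 / 6  # distance to monotonicity of (0.5, 0.3, 0.2) on the line, derived by hand
print(weight)  # approximately 0.3
```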
To prove the reduction from general to bipartite posets, we start from the original poset $G = (V, E)$. We create a bipartite poset with two copies $v^+$ and $v^-$ of each original vertex $v$; the vertices $v^+$ and the vertices $v^-$ form the two sides of the new bipartite poset of size $2n$. We add the edge $(x^+, y^-)$ to the bipartite poset if $(x, y)$ is in the transitive closure of $G$, that is, if there exists a directed path from $x$ to $y$ in $G$. The new probability distribution $q$ on the bipartite poset is created from $p$ by dividing the probability mass $p(v)$ equally between $v^+$ and $v^-$. Note that a sample from $q$ is obtained by drawing $v$ from $p$ and attaching a sign $+$ or $-$ equiprobably. It follows via transitivity that $q$ is monotone over the bipartite poset when $p$ is monotone over $G$. It follows via Theorem 3.3 that if $p$ is $\epsilon$-far from monotone on $G$, then $q$ is $\Omega(\epsilon)$-far from monotone over the bipartite poset. These conditions allow us to test monotonicity of $p$ on any general poset by instead testing monotonicity of $q$ on a bipartite poset, with a proximity parameter that is a constant fraction of $\epsilon$, as desired.
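The construction can be sketched directly. Vertices are represented by hypothetical tagged pairs `(v, '+')` and `(v, '-')`, and each comparable pair $x \prec y$ yields the edge $(x^+, y^-)$; the helper names are illustrative. Because every edge goes from a `'+'` copy to a `'-'` copy, the new poset is bipartite, and one sample of $q$ costs exactly one sample of $p$.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (x, y) means x ⪯ y


def closure_pairs(vertices: Iterable[Hashable], edges: Iterable[Edge]) -> Set[Edge]:
    """All pairs (x, y) with a directed path of length >= 1 from x to y (DFS from each vertex)."""
    vertices = list(vertices)
    succ: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for x, y in edges:
        succ[x].append(y)
    pairs: Set[Edge] = set()
    for source in vertices:
        stack, seen = list(succ[source]), set()
        while stack:
            y = stack.pop()
            if y in seen:
                continue
            seen.add(y)
            pairs.add((source, y))
            stack.extend(succ[y])
    return pairs


def bipartite_doubling(p: Dict[Hashable, float], edges: Iterable[Edge]):
    """Return (q, bipartite edges): copies v+ and v- each get p(v)/2; (x+, y-) for x ⪯ y in TC(G)."""
    bip_edges = [((x, "+"), (y, "-")) for x, y in closure_pairs(p, edges)]
    q = {(v, sign): mass / 2 for v, mass in p.items() for sign in "+-"}
    return q, bip_edges


def sample_q(sample_p: Callable[[], Hashable], rng: random.Random):
    """One sample of q from one sample of p: attach a uniformly random sign."""
    return (sample_p(), rng.choice("+-"))
```

For the line $1 \preceq 2 \preceq 3$, the bipartite poset has the three edges $(1^+, 2^-)$, $(2^+, 3^-)$, and $(1^+, 3^-)$, one per comparable pair.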
3.4 Upper bounds results
In Section 7, we provide sublinear algorithms for testing bigness, and testing monotonicity of distributions over different poset domains.
Bigness testing.
In Section 7.1, we provide an algorithm for bigness testing. Observe that the $T$-bigness property is a symmetric property: it is closed under permutations of the labels of the domain elements $[n]$. Hence, we leverage the result of [VV17], which learns, for each probability mass value, the number of elements having that mass, i.e., the distribution up to relabeling. Observe that the distance to $T$-bigness is proportional to the total “deficit” $\sum_i \max(0, T - p_i)$ of elements with probability mass below $T$. Hence, this learned information suffices for constructing an algorithm for testing bigness that uses a sublinear number of samples.
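Assume $nT \le 1$; otherwise no $T$-big distribution on $[n]$ exists. Under this assumption, the distance to $T$-bigness in total variation equals the total deficit exactly. The lower bound holds because any $T$-big $q$ must add at least $\max(0, T - p_i)$ to each element. Conversely, we can raise each deficient element to $T$ and remove the same amount of mass from elements above $T$. This is possible because their surplus is $1 - nT$ plus the deficit, and it attains the bound. The minimal sketch below, with hypothetical function names, computes the deficit and builds such a witness. The deficit depends only on the multiset of masses, which is why learning the distribution up to relabeling suffices.

```python
from typing import List


def distance_to_bigness(p: List[float], T: float) -> float:
    """TV distance from p to the T-big distributions on [n]: the total deficit below T."""
    if len(p) * T > 1:
        raise ValueError("no T-big distribution exists when n * T > 1")
    return sum(max(0.0, T - pi) for pi in p)


def nearest_big(p: List[float], T: float) -> List[float]:
    """A closest T-big distribution: lift deficient elements to T, then take the same
    total mass from the elements with the largest masses (never going below T)."""
    to_remove = distance_to_bigness(p, T)
    q = [max(pi, T) for pi in p]
    for i in sorted(range(len(q)), key=lambda i: q[i], reverse=True):
        if to_remove <= 0:
            break
        take = min(to_remove, q[i] - T)
        q[i] -= take
        to_remove -= take
    return q


def tv(p: List[float], q: List[float]) -> float:
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


p = [0.5, 0.3, 0.15, 0.05]
print(distance_to_bigness(p, 0.2), nearest_big(p, 0.2))  # about 0.2, and about [0.3, 0.3, 0.2, 0.2]
```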
Monotonicity testing for matchings.
Next, in Section 7.2, we provide an algorithm for testing monotonicity over matching posets. We again resort to the work of [VV17], this time to learn, for a pair of distributions $p_1$ and $p_2$ over the domain $[n]$, the number of elements having each pair of probability masses $(p_1(i), p_2(i))$, given samples from each of $p_1$ and $p_2$. We hope to view our distribution $p$ over a matching with edges $(u_i, v_i)$ as a pair of distributions, namely $p_U$ and $p_V$, representing the probability masses that $p$ places on the $u_i$'s and the $v_i$'s, respectively. Learning these pairs of masses would intuitively allow us to approximate $p$'s distance to monotonicity by summing up the “violations” $\max(0, p(u_i) - p(v_i))$ over the pairs $(u_i, v_i)$. However, this approach faces subtle challenges that are not present in the earlier case of bigness testing.
First, we must somehow rescale and up into distributions according to their total masses , placed by . However, it is possible that, say, , making samples from costly to generate by drawing i.i.d. samples from . We resolve this issue via a reduction to a different distribution that approximately preserves the distance to bigness, while placing comparable total probability masses to and . Second, the algorithm of [VV17] learns according to a certain distance function, that we must lower-bound by the total variation distance. In particular, this bound must be established under the presence of errors in the scaling factor, as and are not known to the algorithm. We overcome these technical issues, which yields an algorithm for testing monotonicity over matchings. We maintain the same asymptotic complexity as that of [VV17].
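The “sum of violations” intuition can be made exact when the probabilities are known. Suppose every domain element lies on a matching edge a_i → b_i. On a violated edge, any monotone q satisfies |p(a_i) − q(a_i)| + |p(b_i) − q(b_i)| ≥ p(a_i) − p(b_i), and averaging the two endpoints attains this bound while preserving total mass. So the total variation distance to monotonicity is half the total violation. The Python sketch below assumes exact probabilities; the function names are our own, and the real tester must estimate these quantities from samples.

```python
def matching_distance_to_monotone(p_low, p_high):
    """Edge i goes from a_i (mass p_low[i]) up to b_i (mass p_high[i]).

    Assumes every domain element lies on some matching edge. Returns the total
    variation distance to the closest monotone distribution on the matching.
    """
    return 0.5 * sum(max(0.0, a - b) for a, b in zip(p_low, p_high))


def fix_matching(p_low, p_high):
    """Closest monotone distribution: average both endpoints of each violated edge.

    Averaging keeps the total mass unchanged and leaves satisfied edges untouched.
    """
    q_low, q_high = list(p_low), list(p_high)
    for i, (a, b) in enumerate(zip(p_low, p_high)):
        if a > b:
            q_low[i] = q_high[i] = (a + b) / 2
    return q_low, q_high
```

With low endpoints (0.3, 0.1) and high endpoints (0.1, 0.5), only the first edge is violated, by 0.2, so the distance is 0.1.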
Monotonicity testing for bounded-degree bipartite graphs.
Moving on, in Section 7.3, we tackle monotonicity testing over bipartite posets; as shown in Section 6, this bipartite problem captures monotonicity testing over any poset. We make progress towards resolving it by solving the bounded-degree case. We turn the distribution on a bipartite poset of maximum degree into a distribution on a matching poset that approximately preserves the distance to monotonicity; applying the algorithm of Section 7.2 above to the new distribution thus yields a monotonicity test for with sample complexity .
Our reduction simply places copies of each vertex into , then, for each edge , connects a pair of unused copies of its endpoints , so as to create a matching subgraph of size on . The probability distribution on distributes the probability mass of each vertex equally among all of its copies ’s. (Each remaining isolated vertex is matched with a dummy zero-mass vertex, turning into a matching poset.) This new graph contains vertices, and we show that it approximately preserves the distance to monotonicity by explicitly constructing a “low-cost” scheme for “fixing” into a monotone distribution on , based on the optimal scheme that makes monotone on , at the cost of at most an extra -multiplicative factor.
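A Python sketch of the copy-and-match step follows. It rests on several assumptions of ours. Each vertex receives exactly `delta` copies, where `delta` bounds the maximum degree. Copies are handed out in edge order. Each leftover copy is paired with a zero-mass dummy placed as the *lower* endpoint, so that the dummy edge can never be violated. The name `bipartite_to_matching` and the `("dummy", copy)` labels are hypothetical. The sketch does not reproduce the paper's distance-preservation argument.

```python
from collections import defaultdict


def bipartite_to_matching(edges, p, delta):
    """edges: (bottom, top) pairs of a bipartite poset with max degree <= delta.
    p: dict vertex -> probability.

    Returns (matching, q): a list of (lower, upper) pairs and a probability
    dict over copies (vertex, j) plus zero-mass dummies.
    """
    next_copy = defaultdict(int)  # index of the next unused copy of each vertex
    matching = []
    for u, v in edges:
        cu, cv = next_copy[u], next_copy[v]
        if cu >= delta or cv >= delta:
            raise ValueError("a vertex has degree larger than delta")
        next_copy[u] += 1
        next_copy[v] += 1
        matching.append(((u, cu), (v, cv)))
    # Every vertex gets delta copies, each with an equal share of its mass.
    q = {(x, j): p[x] / delta for x in p for j in range(delta)}
    matched = {c for pair in matching for c in pair}
    for copy in list(q):
        if copy not in matched:
            dummy = ("dummy", copy)
            q[dummy] = 0.0
            # Assumption: the dummy is the lower endpoint, so 0 <= q[copy] always holds.
            matching.append((dummy, copy))
    return matching, q
```

Placing each dummy below its partner guarantees that the dummy edges add no new violations. Only the copies of original edges can be violated, and such a copy is violated exactly when the original edge is.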
Testing monotonicity of distributions that are uniform on a subset of the domain.
In Section 7.4, we show that for a specific broad family of distributions on directed bipartite graphs of arbitrary degree, we can test monotonicity of such distributions using samples. Namely, our result applies to distributions that are uniform on an arbitrary subset of the domain, provided that every poset edge is directed from some vertex in the “bottom” part to some vertex in the “top” part of the graph. Our tester works roughly as follows. First, we sample a number of vertices from the graph and throw away the ones that lie in the top part. For the remaining ones in the bottom part, denoted , we identify their neighbors in the top part, and determine whether or not they all belong to the support of the distribution. Since the distribution is uniform on its support, this condition suffices for the distribution to be monotone on the induced subgraph . The tester accepts when it cannot rule out the possibility that has the maximum possible probability mass. Recall that if the distribution is -far from monotone, there must exist a large matching of “violated” edges. Building on this, we show that the induced subgraph contains many disjoint violated edges, implying that many vertices in lie outside the support: the probability mass on will then be noticeably small, and the tester will reject.
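The support condition used by this tester can be checked directly in Python when the support is known. A distribution uniform on a set S gives every vertex of S the same mass and every other vertex mass zero. An edge (u, v) from the bottom part to the top part is therefore violated exactly when u ∈ S and v ∉ S. The function names below are our own. This sketch only illustrates the condition; it is not the sampling-based tester.

```python
def uniform_on_subset_is_monotone(edges, support):
    """edges: (bottom, top) pairs; the distribution is uniform on `support`.

    Inside the support every vertex has mass 1/|support|; outside it has mass 0,
    so monotonicity holds iff no support vertex points to a non-support vertex.
    """
    return all(v in support for u, v in edges if u in support)


def violated_edges(edges, support):
    """Edges whose bottom endpoint is in the support but top endpoint is not."""
    return [(u, v) for u, v in edges if u in support and v not in support]
```

For edges 1 → a and 2 → b, the support {1, a} is monotone, while adding 2 to the support violates the edge 2 → b.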
Upper bound via trying all matchings.
In Section 7.5, we give another upper bound for testing monotonicity of a distribution with respect to a bipartite graph which, in this case, has a small number of induced subgraphs that contain a perfect matching of their vertices. In particular, we show that samples are sufficient for this task, where is the number of such induced subgraphs. We note that this bound matches the general learning upper bound of when attains its maximum value of , but can potentially be better when is asymptotically smaller. The main idea of our tester is as follows: if the distribution is -far from monotone, there exists a matching of violated edges that is -far from monotone. Hence, for each subgraph of that admits a perfect matching, we may approximate the weight (violation amount) of this matching by simply comparing the total probability masses of the top part and the bottom part of the subgraph. We approximate these masses with error probability for each subgraph, which allows us to apply a union bound over all subgraphs at the end. Our tester rejects if the weight of one such subgraph exceeds , and accepts otherwise.
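The weight computation rests on a simple identity. If a subgraph has a perfect matching of edges u → v, then the summed violation of the matched edges, Σ (p(u) − p(v)), equals the mass of its bottom part minus the mass of its top part. The Python sketch below uses empirical frequencies in place of the paper's estimates. The rejection threshold, whose value is elided in this overview, becomes a free parameter `threshold`. The names and the plain empirical estimator are our own assumptions.

```python
from collections import Counter


def subgraph_weight(prob, bottom, top):
    """Summed violation of a perfect matching between `bottom` and `top`:
    sum over matched edges (u -> v) of p(u) - p(v) = p(bottom) - p(top)."""
    return (sum(prob.get(u, 0.0) for u in bottom)
            - sum(prob.get(v, 0.0) for v in top))


def empirical(samples):
    """Empirical frequency of each observed domain element."""
    counts = Counter(samples)
    m = len(samples)
    return {x: c / m for x, c in counts.items()}


def try_all_subgraphs(samples, subgraphs, threshold):
    """Hypothetical tester: reject if any candidate subgraph (bottom, top)
    has estimated weight above `threshold`."""
    est = empirical(samples)
    if any(subgraph_weight(est, b, t) > threshold for b, t in subgraphs):
        return "reject"
    return "accept"
```

The actual tester estimates each weight accurately enough, and with small enough failure probability, to take a union bound over all candidate subgraphs. This sketch makes no such accuracy guarantee.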
4 A Lower Bound for the Bigness Testing Problem
In this section, we give a lower bound for the bigness testing problem. As described in the overview in Section 3.1, we provide two random processes for generating samples from two families of distributions, such that one family consists of “big” distributions and the other family largely of “-far from big” distributions, and then show that they are hard to distinguish.
First, we define a random process that, given a prior distribution, , over non-negative numbers, generates a random probability distribution over the domain elements , and then draws samples from it. More specifically, let be a random variable drawn from , and we also use to denote the probability density function (PDF) over ; for now we require , and will specify further desired properties momentarily. We generate an approximate probability distribution according to . The distribution is constructed by having each domain element choose its probability , in an i.i.d. fashion, from the prior distribution, , over possible probabilities. Then, we construct a histogram of roughly samples from according to the following steps:
- Step 1: Generate i.i.d. random variables according to , then form the following probability vector over :
[TABLE]
Remark that, while is not necessarily a probability distribution under this notion, the condition suggests that the total probability mass of is likely to be centered around . So, is likely to be approximately a probability distribution, and can be normalized into one while changing individual entries ’s by only a small multiplicative factor.
- Step 2: Draw independent random variables (namely ) for , and output as the histogram of the samples. While we do not explicitly normalize , since is an approximate probability distribution, this histogram still captures (with high probability) Poissonized samples drawn from the normalization of .
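The two steps can be simulated with NumPy. The sketch makes two assumptions of ours. First, Step 1 sets p_i = x_i / n, consistent with the prior having mean one. Second, Step 2 draws h_i ~ Poi(k · p_i), where `k` is a hypothetical name for the sample budget. The prior used in the usage note is a toy choice, not one of the paper's optimized priors.

```python
import numpy as np


def poissonized_histogram(prior_values, prior_probs, n, k, rng):
    """Simulate the two-step process under our assumptions.

    Step 1: draw x_1..x_n i.i.d. from the discrete prior and set p_i = x_i / n
            (assumed normalization; the total mass concentrates near E[X] = 1).
    Step 2: draw h_i ~ Poisson(k * p_i) independently (assumed Poisson means).
    """
    x = rng.choice(prior_values, size=n, p=prior_probs)
    p = x / n
    h = rng.poisson(k * p)
    return p, h
```

For instance, `poissonized_histogram([0.0, 1.0, 2.0], [0.25, 0.5, 0.25], 10000, 5000, np.random.default_rng(0))` yields a vector p whose total mass is close to one, with about a quarter of its entries equal to zero. Zero entries never receive samples, because a Poisson variable with mean zero is always zero.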
The goal in this section is to find two prior distributions, and , to generate two probability vectors and according to the above process such that after normalization, and have the desired properties: is big (every is at least the threshold ), and is -far from any big distribution ( contains a significant number of entries with ). Then, we generate two histograms and according to and respectively. If the histograms and are hard to distinguish, then we can establish a lower bound for the bigness property. This requirement translates into constraints on the design of the two prior distributions, and , so that these families of distributions arise with high probability. Below, we summarize the conditions that the prior distributions need to satisfy (with high probability):
1. The probability vectors and are approximate probability distributions; that is, all of their coordinates are non-negative and their total probability masses are each close to one.
2. After scaling the probability vectors and above into respective probability distributions, the normalization of is -big, and the normalization of is -far from any -big distribution.
3. The total numbers of (Poissonized) samples in and drawn from the normalizations of and are each .
4. Given or , distinguishing whether it is generated from or with success probability requires or to contain a large number of samples.
5. Additionally, we will bound the largest probability mass that the normalized distributions place on any domain element. This part is not necessary for this section, but will be useful for the reduction between monotonicity testing and bigness testing later on.
We state this result as the following theorem.
See Theorem 3.1.
Proof.
Let positive values , , , and a positive integer be parameters, to be determined more precisely later, satisfying the following property:
[TABLE]
Throughout this section, we consider the bigness threshold , and note that the value itself may depend on the error parameter and the number of matched moments . Note also that is a constant.
We propose the following optimization problem, , whose optimal solution, specifying and , satisfies the requirements of the theorem. Recall that and are generated by drawing i.i.d. samples, ’s and ’s, from and respectively:
[TABLE]
Intuitively speaking, as aims to generate -big distributions, we must ensure that the ’s are bounded away from , so that has expected value higher than . At the same time, we want to maximize the probability that so that is far from any -big distribution, under the constraint that the first moments of and are exactly matched, so as to ensure that the resulting distributions of histograms and are statistically close. The objective value of this optimization problem corresponds to the expected distance of to the closest -big distribution in total variation distance. To clarify the notation, and are given to us. The unknown variables in are the PDFs and of two random variables and , respectively, as well as the scaling variable . The parameter roughly specifies the ratio between the largest and the smallest non-zero probabilities that and can take. (Footnote 1: Note that and are on a continuous domain. However, will additionally have a non-negligible probability mass placed at value 0. In fact, it turns out that in the optimal solution, and are only supported on a few distinct values ( of them), so the optimal and assume the role of probability mass functions rather than PDFs.)
[TABLE]
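The tension in this optimization problem can be seen in a toy example with exact rational arithmetic. One prior stays bounded away from zero, while the other places mass 1/4 on zero, yet their first two moments agree. These particular priors, and the names `X`, `Y`, and `moment`, are our own illustration. They are not the optimal solution of the problem above, and they match only L = 2 moments.

```python
from fractions import Fraction as F

# Toy priors (illustration only, not the paper's optimal solution).
# X places mass 1/4 on zero; Y is bounded away from zero (Y >= 1/2).
X = {F(0): F(1, 4), F(1): F(1, 2), F(2): F(1, 4)}
Y = {F(1, 2): F(2, 3), F(2): F(1, 3)}


def moment(dist, j):
    """j-th moment E[value**j] of a finitely supported distribution."""
    return sum(prob * val ** j for val, prob in dist.items())


for j in range(1, 4):
    print(j, moment(X, j), moment(Y, j))
# 1 1 1
# 2 3/2 3/2
# 3 5/2 11/4
```

Both priors have mean one, so both yield approximate probability vectors. Their agreement up to the second moment is what makes histograms generated from them hard to tell apart, while the mass that `X` places at zero makes its vectors far from big.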
In the following lemma, we find the optimal value of . We use to refer to the optimal value of optimization problem .
Lemma 4.1**.**
For any and such that , there exists a scaling parameter, , in such that
[TABLE]
The proof of Lemma 4.1 is postponed to Section 4.1.
Let the value of be determined by the above lemma, and set to be .
Recall our wish list of five properties for the priors, and , that we listed at the beginning of Section 4. We define the following “good” events , which hold with high probability, to formalize the properties of the generated vectors and .
[TABLE]
and
[TABLE]
where is the number of elements such that is zero. Roughly speaking, these events state that and , generated in Step 1, are approximate probability distributions (having total masses in ), and that Step 2 generates sufficiently many samples in the histogram (at least each). Further, contains as many as elements with probability mass 0, and thus is at distance at least from any -big distribution; we will set to reach the desired result.
In the following lemma, we show that, conditioned on and , after running the process using the priors and , the generated histogram is a sufficiently large set of samples from a -big distribution, and histogram is a sufficiently large set of samples from a distribution which is -far from any -big distribution. In addition, the total variation distance between the distributions over ’s and ’s is bounded when , form a solution of . More precisely, let denote the distribution over histograms generated by the process when the prior is , and let be the distribution over histograms conditioned on . We define and similarly. The lemma also bounds the total variation distance between and .
Lemma 4.2**.**
Let , and form a solution of with objective value . Suppose and are the prior distributions to generate histograms and according to the process. Then, given event is a histogram of a set of at least samples from a -big distribution, whereas given is a histogram of a set of at least samples that are drawn from a distribution which is -far from any -big distribution. Moreover,
[TABLE]
Lastly, the largest probability mass on any element, in any of the probability distributions from which the samples are drawn, is .
The proof of Lemma 4.2 is given in Section 4.2.
Now, we assign the parameters, , and , as follows:
[TABLE]
Recall that we set to be the optimal value of , and Lemma 4.1 tells us its value. We show that in this setting is at least . Let be . Then, we have:
[TABLE]
as long as . It is not hard to see that, for sufficiently large and a sufficiently large constant , holds, yielding , for every , where is a constant.
Let and be and respectively. By Lemma 4.2, the total variation distance between and is at most , while and behave according to the claimed respective asymptotic bounds. Hence, the proof is complete. ∎
An important case of Theorem 3.1 is when , where we establish a near-linear sample complexity lower bound of for the general problem of bigness testing as follows.
\bignessTest
Proof.
By Theorem 3.1, there exist and with the aforementioned properties. Any -bigness tester has to distinguish between and with probability at least 2/3. On the other hand, the total variation distance between and is at most 0.01. Therefore, no algorithm receiving samples can distinguish between them with probability more than . Hence, testing -bigness requires samples.
Note that in the proof of Theorem 3.1, is determined by Lemma 4.1, and it is bounded by . Hence, if is a constant, then so is , and the required sample complexity becomes . ∎
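The step from a small total variation distance to the impossibility of testing is the standard two-point (Le Cam) argument. We write it out with our own names: $D_1$ and $D_2$ for the two histogram distributions and $\mathcal{A}$ for the tester. A tester that succeeds with probability at least 2/3 on every input also succeeds with probability at least 2/3 when the input is drawn from $D_1$ or $D_2$ with equal probability. However:

```latex
\Pr[\mathcal{A}\ \text{correct}]
  = \tfrac12 \Pr_{h \sim D_1}[\mathcal{A}(h) = \text{big}]
  + \tfrac12 \Pr_{h \sim D_2}[\mathcal{A}(h) = \text{far}]
  \le \tfrac12 + \tfrac12\, d_{\mathrm{TV}}(D_1, D_2)
  \le \tfrac12 + \tfrac12 \cdot 0.01 = 0.505 < \tfrac23 .
```

The middle inequality holds because $\Pr_{D_1}[\mathcal{A} = \text{big}] - \Pr_{D_2}[\mathcal{A} = \text{big}] \le d_{\mathrm{TV}}(D_1, D_2)$ for any event.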
4.1 Proof of Lemma 4.1
See Lemma 4.1.
Proof.
To prove the lemma, we introduce an auxiliary linear program () whose optimal value is known to equal the right-hand side of the above equation. We show that has the same optimal objective value as , which proves the lemma. For two given parameters and , we define the following LP over two random variables .
[TABLE]
To interpret this LP, note that the unknown variables are the PDFs of the random variables and . Thus, for any number in , we want to find and . Note that this optimization problem is linear, since all the expectations above are linear functions of and . Moreover, there is an implicit constraint that the integrals of and are one, since they are probability distributions.
Observe that there exists a trivial solution where and are two identically distributed random variables, so is feasible and its optimal objective value is at least zero. Let and be a pair of random variables forming an optimal solution for , and let . Since all and are in , is also in . On the other hand, since is positive and bounded, and thus ; hence is at most .
Now, we argue that and have the same optimal value. We introduce two new random variables and with the following PDFs, and later we show they form an optimal solution for .
[TABLE]
In the above equations, with a slight abuse of notation, we say that is zero for ; that is, the probability mass for is given by the respective second terms. Since is defined to be , the second term in is zero for all , and in particular for . We define our notation in this fashion in order to make the calculations for and analogous, so that we may write our proof compactly.
Now, we show that the proposed variables , and form a feasible solution for . First, we show that the domains of and are as stated in the definition of in Equation 1. Then, we show that and are probability distributions, and we prove that the constraints of hold as well.
First, consider the domains of the random variables. Clearly, the domain does not include the numbers where the PDF is zero, so we prove that and are (potentially) non-zero only when and are in the range specified by the domain constraints of . Recall that the second term in is always zero. Thus, could be non-zero only if equal to has a non-zero probability according to . Therefore, is always in . For , in addition to the value , could be zero as well, since the second term in the definition of may be non-zero at . Thus, is always in .
In addition, (and similarly ) is a probability distribution since the integral of the PDF is one:
[TABLE]
where the second equality is derived by substituting with .
Now, we focus on the constraints of . The first constraint is . Below we show that the expected value of is .
[TABLE]
One can similarly show that , and the constraint holds.
The second constraint is that the first moments of and are matched: for in . The previous constraint implies that the first moments, and , are equal, so here we focus on the second and higher moments. Fix in . For the -th moment of , we have:
[TABLE]
We can similarly show the same condition for . Since and satisfy the moment matching constraints of , we derive the moment matching constraints of as follows:
[TABLE]
Therefore, , and form a feasible solution for . Thus, the objective value according to is at most the optimal value of :
[TABLE]
On the other hand, the objective values of and are the same on the two solutions we discussed:
[TABLE]
where the last equality is true, since we chose and to be the optimal solution of at the beginning.
[TABLE]
We continue the proof by showing that the above inequality also holds in the other direction, i.e., is at most . Let , and form a feasible solution for . We define random variables and with the following PDFs, and show that they form a feasible solution for in Equation 2 with the same objective value as and in the :
[TABLE]
First, we show that the domains of and match the domain constraint in . As in the previous part, we prove that the PDFs are zero outside the interval specified by the domain constraint . Observe that is non-zero if and only if and are both non-zero, so has to be in . Thus, the domain of the random variable (and similarly ) is .
Moreover, note that (and similarly ) is a probability distribution:
[TABLE]
where the equality is derived by replacing with a new variable . Now, we show that the constraints of are satisfied for and . Fix . We show that the -th moments of and are equal:
[TABLE]
Similarly, one can show is equal to . Since the pair and satisfies the moment matching constraints of , then is equal to . Therefore, is equal to .
Now, we focus on the objective functions of the and . We have:
[TABLE]
Since for any feasible solution of there exists a feasible solution for with the same objective value, one can conclude that the optimal value of , , is at most . Thus, by Equation 3, we have:
[TABLE]
which implies that , and also form an optimal solution for , and hence and are equal. This also implies that is at most .
In Appendix E of [WY16b], Wu and Yang proved that an optimal solution of can be obtained through the best polynomial approximation of the function . More formally, they showed that there exists a solution for with the following optimal value:
[TABLE]
where is the set of all degree polynomials. The optimal polynomial approximation error has been studied in [KVZ12] and in Sec. 2.11.1 of [Tim63], which determine the maximum error of the best degree polynomial approximation. More precisely, we have:
[TABLE]
Hence, the proof is complete. ∎
4.2 Proof of Lemma 4.2
Before stating the lemma, we review the definitions used so far. Recall that and are generated by drawing i.i.d. samples, ’s and ’s, from and respectively:
[TABLE]
and and are the desired events:
[TABLE]
and
[TABLE]
where was the number of elements for which is zero. We generate histograms and according to and respectively. Let denote the distribution over histograms generated by the process when the prior is , and let be the distribution over histograms conditioned on . We define and similarly. In the following lemma, we prove “good properties” for and after normalization, and also bound the total variation distance between and .
See Lemma 4.2.
Proof.
First, we show that, given event , the normalization of is a -big distribution. From , we know that the ’s are in , and the ’s are in . Observe that after normalization is at least the following:
[TABLE]
where the last inequality is due to the fact that is at most . Thus, the normalization of is -big. On the other hand, we can achieve the same lower bound for the normalized value of when is not zero, so the normalization of places either probability mass zero, or at least , on each element. Similarly, the maximum probability mass among the normalization of ’s and ’s is at most
[TABLE]
because , yielding the desired bound on the maximum probability mass.
Next, we show that, given , the normalization is -far from any big distribution. Note that if is zero, then its probability remains zero even after normalization. So, there are exactly elements that have probability mass zero, and the rest (by the above argument) each have probability mass at least . Thus, the total variation distance to -bigness is at least , and given it is at least .
Now, we show the distance between and is bounded. By the triangle inequality we have:
[TABLE]
where the superscript for the events and indicates the complementary event. We start by bounding from above the probabilities of the complements of and , to show that they happen with small probability. Since the ’s (and similarly the ’s) are independently drawn from with expected value , and they are in the range , by the Chebyshev inequality, we have:
[TABLE]
Recall that was the optimal value of . Thus, is . Moreover, , the number of ’s that are zero, is a Binomial random variable with which is . Thus, by the Chernoff bound, we have:
[TABLE]
Finally, we show that the total number of samples is large with high probability. Assume we already have that is at least . Then the total number of samples is a Poisson random variable with mean . By the tail bound for Poisson distributions proved in [Can17] (Footnote 2: If is a Poisson random variable with mean , then for any , we have .), we have
[TABLE]
One can achieve a similar result for .
Now, we continue bounding the distance between and . (and similarly ) denotes the distribution over the -th coordinate of the histogram, . By the previous inequality, we have:
[TABLE]
where the last inequality follows from the fact that the first moments of and are matched: by Lemma 6 in [WY16a], we have:
[TABLE]
Hence, the proof is complete. ∎
5 From Bigness to Monotonicity
In this section, we show how to turn our lower bound results for bigness testing problem in the previous section, into lower bounds for monotonicity testing in some fundamental posets, namely the matching poset and the Boolean hypercube poset. See Section 3.2 for the proof overviews.
5.1 Monotonicity testing on a matching poset
Theorem 5.1**.**
Consider the pair of distributions , for the bigness problem as specified in Theorem 3.1 with bigness threshold , number of samples , and maximum probability . There exists a distribution on a matching of size with maximum probability such that testing, with success probability , whether a matching randomly drawn from such a distribution is monotone or -far from any monotone distribution, requires samples.
Proof.
Let and form the vertex set of a directed matching of size where the edges are ’s for . Consider a distribution over the matching poset ; such a distribution is monotone if and only if the probabilities for all . We apply the Poissonization technique, then prove our lower bound by contradiction: assume there exists an algorithm which tests monotonicity of distributions over the matching of size using samples where , and successfully distinguishes whether the distribution is monotone or -far from monotone with probability at least 2/3. To reach the desired contradiction, we turn these samples into samples for the -bigness testing problem, and show that one can test -bigness using as a black-box tester. Note that , so the factor is in this proof.
Assume we have a distribution, , over elements for which we wish to test the bigness property. We construct a distribution over a matching over based on as follows:
[TABLE]
Clearly, the maximum probability of is at most . Next, we compare the distance to monotonicity in the case that is -big with the case that is -far from -big. If is a -big distribution, then and thus is monotone.
Next, if is -far from any -big distribution, then we show that is -far from any monotone distribution. Let be the set of elements for which . Clearly, to make a -big distribution, one has to increase all the to for , and there is no need to increase the probability of any other element. Therefore, the total variation distance of to is exactly , assuming . Let be the closest monotone distribution to , and observe that . We compute:
[TABLE]
Finally, we show that the assumed algorithm may be used to test the -bigness property of . Suppose we are given access to independent samples from the distribution for which we want to test the -bigness property. We construct a distribution as described above: to obtain samples from , for each , we create and samples of and respectively. The samples for each of the ’s may be obtained by substituting each element from with in samples from , whereas samples for the ’s may be generated directly by drawing ’s uniformly at random. Thus, using samples from , one can construct samples from and use for testing the monotonicity of the matching poset , which corresponds to testing the -bigness of , contradicting the fact that bigness testing requires samples by Theorem 3.1. ∎
This result, applied with Theorem 3.1 using (where , and ), immediately yields the following lower bound for testing monotonicity in a matching poset.
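The matching construction and its sample simulation can be sketched in Python, although the exact formula for the new distribution sits in a table lost in extraction. We therefore *assume* a natural form consistent with the sampling description. Each edge goes from a_i, with mass τ/Z, to b_i, with mass p_i/Z, where Z = 1 + nτ. Under this assumption, the edge a_i → b_i is satisfied exactly when p_i ≥ τ, and the half-violation distance of a matching (defined inside the block) equals the bigness deficit divided by 2Z. Poissonized samples of b_i reuse the histogram of p, while those of a_i are fresh Poisson draws. All names are hypothetical.

```python
import numpy as np


def matching_from_bigness(p, tau):
    """Assumed construction: a_i gets mass tau / Z and b_i gets mass p_i / Z,
    where Z = 1 + n * tau makes the total mass one."""
    z = 1.0 + len(p) * tau
    q_low = [tau / z] * len(p)       # masses of the lower endpoints a_i
    q_high = [pi / z for pi in p]    # masses of the upper endpoints b_i
    return q_low, q_high


def matching_distance_to_monotone(q_low, q_high):
    """Half the total violation over edges a_i -> b_i."""
    return 0.5 * sum(max(0.0, a - b) for a, b in zip(q_low, q_high))


def lift_histogram(h, tau, k, rng):
    """Given h[i] ~ Poi(k * p_i) (a Poissonized histogram of p), return counts
    for the a_i's (fresh Poi(k * tau) draws) and the b_i's (h reused).
    Together they form a Poissonized histogram of the matching distribution."""
    a_counts = rng.poisson(k * tau, size=len(h))
    b_counts = np.asarray(h)
    return a_counts, b_counts
```

With p = (0.4, 0.4, 0.15, 0.05) and τ = 0.1, only the last element is deficient, by 0.05, and the matching distribution is 0.05 / 2.8 away from monotone. If instead every p_i ≥ τ, the distance is zero.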
Corollary 5.2**.**
For sufficiently small parameter , any algorithm that can distinguish whether a distribution over a matching poset on vertices is monotone, or -far from any monotone distribution, with probability requires samples. Moreover, the maximum probability mass of the distribution in the lower bound construction can be bounded above by .
5.2 Monotonicity testing on a hypercube poset
Consider the Boolean hypercube poset with vertices. For convenience, let and denote the distributions over distributions implicitly constructed in the lower bound of Theorem 5.1, where distributions in are monotone and distributions in are -far from any monotone distribution. Theorem 5.1 shows that distributions randomly drawn from and generate statistically similar histograms over the matching poset. For simplicity, we do not distinguish the parameters , and in Theorem 3.1 and Theorem 5.1, as they are equivalent up to a constant factor.
5.2.1 General lower bound for monotonicity testing on a hypercube poset
We first establish the theorem that describes the result of the outlined embedding approach, then later apply this result to achieve interesting special cases.
Theorem 5.3**.**
Let an integer be a parameter. Suppose that there exists a pair of distributions over distributions on a matching with pairs of vertices, forming an instance of the monotonicity problem with distance , maximum probability , and a lower bound of samples. Then, testing monotonicity on the Boolean hypercube of size with distance parameter requires samples, where and .
Proof.
Consider two consecutive levels and of a hypercube, where the level consists of vertices whose coordinates contain exactly ones. Our approach is to embed our matching onto these levels in the hypercube, so that each edge of the matching has one endpoint in each of the two levels, and each endpoint is mutually incomparable to any endpoint of any other edge.
We choose our coordinates for the embedding as follows. We pick all the vertices such that there are exactly ones among the first coordinates. Let denote the set of these vertices. There are exactly vertices in the set . Clearly, each vertex in is comparable with the vertex whose coordinates differ from its own only in the last bit. Furthermore, it is incomparable with the rest of the vertices in , as every other vertex in also has ones among the first bits.
Next, we describe the probabilities assigned to each vertex of the hypercube, given , the distribution over a matching (drawn from or ). First, we assign the probabilities to according to . Namely, the set of coordinates of with ones corresponds to and that with ones corresponds to , where and are as defined in the previous proof. Then, for the remaining vertices in level and above, we assign the probability of for a sufficiently large such that the quantity becomes at least . Let be the total probability assigned to all these vertices so far. We divide all assigned probabilities by to finally obtain a distribution over the hypercube. We denote the constructed distribution over the hypercube by .
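The embedding can be checked mechanically on a small hypercube. The Python sketch assumes that the embedded pairs are (x, 0) and (x, 1), where x ranges over the strings in {0,1}^(d−1) with exactly k ones. The names `embedded_matching`, `leq`, and `cross_pairs_incomparable` are our own. The check confirms that each pair is comparable, that it spans two consecutive levels, and that endpoints of distinct pairs are mutually incomparable.

```python
from itertools import combinations


def embedded_matching(d, k):
    """Pairs ((x, 0), (x, 1)) for every x in {0,1}^(d-1) with exactly k ones."""
    pairs = []
    for ones in combinations(range(d - 1), k):
        x = [0] * (d - 1)
        for i in ones:
            x[i] = 1
        pairs.append((tuple(x) + (0,), tuple(x) + (1,)))
    return pairs


def leq(x, y):
    """Coordinate-wise order of the Boolean hypercube."""
    return all(a <= b for a, b in zip(x, y))


def cross_pairs_incomparable(pairs):
    """True iff no endpoint of one pair is comparable to an endpoint of another."""
    return all(not leq(x, y)
               for i, pa in enumerate(pairs)
               for j, pb in enumerate(pairs) if i != j
               for x in pa for y in pb)
```

For d = 5 and k = 2 there are C(4, 2) = 6 pairs. Two distinct strings with the same number of ones are never coordinate-wise comparable, and this is what makes the endpoints of different pairs incomparable.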
Clearly, the proposed construction preserves the monotonicity due to the incomparability between distinct embedded matching edges. In particular, if distribution over the matching is drawn from , the distribution over the hypercube will still be monotone; if it is drawn from , then the distance to monotonicity is now since, at the very least, the subposet restricted to the embedded matching must be modified to a monotone distribution over this matching.
Using Corollary 5.2, any algorithm that can test the monotonicity of requires samples from the matching vertices. Note that a sample drawn from lies in the matching with probability . Therefore, samples from the hypercube are required in order to obtain samples from the matching with high probability. This yields the lower bound of samples for testing monotonicity over the hypercube poset. ∎
5.2.2 Applications of Theorem 5.3
We derive the following two corollaries from Theorem 5.3. First, we consider embedding our matching into the largest possible levels of the hypercube, namely the middle ones, showing the lower bound of samples for (Corollary 5.4). To complement this first corollary, which only handles sub-constant , we then apply our construction to higher levels of the hypercube, and readjust the construction from Theorem 3.1 so that moments are matched (as opposed to ). This approach shows the lower bound of for testing monotonicity on the hypercube poset with distance parameter , such that as (Corollary 5.5).
Corollary 5.4**.**
For sufficiently small , any algorithm that can distinguish whether a distribution over a Boolean hypercube poset of size is monotone, or -far from any monotone distribution, with success probability requires samples.
Proof.
Let be . As we stated in the proof of Theorem 5.3, we embed a matching of size onto the middle layer of the hypercube where is at least by Stirling’s approximation. We have
[TABLE]
Applying Theorem 5.3, we achieve our lower bound of for by choosing a sufficiently small constant . ∎
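The size of the middle layer used above can be sanity-checked numerically. A standard Stirling-type estimate is C(2m, m) ≥ 4^m / (2√m) for all m ≥ 1. We do not claim that this is the exact form used in the proof. The Python sketch verifies the estimate in exact integer arithmetic by squaring both sides. The function name is our own.

```python
from math import comb


def central_binomial_bound_holds(m):
    """Check C(2m, m) >= 4**m / (2 * sqrt(m)) exactly, by squaring both sides:
    C(2m, m)**2 * 4m >= 16**m (valid for m >= 1)."""
    return comb(2 * m, m) ** 2 * 4 * m >= 16 ** m
```

At m = 1 the bound holds with equality (C(2, 1) = 2 = 4/2). For larger m the ratio C(2m, m)·2√m / 4^m increases, so the middle layer always holds at least a 1/(2√m) fraction of the 4^m vertices of a 2m-dimensional hypercube.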
Corollary 5.5**.**
Any algorithm that can distinguish whether a distribution over a Boolean hypercube poset of size is monotone, or -far from any monotone distribution, with success probability requires samples, where and are constants. In particular, as .
Proof.
Without loss of generality, assume is even: when is odd, we may embed a hypercube of size in a hypercube of size and achieve the same lower bound up to a constant factor. Consider . Observe that
[TABLE]
This yields the inequality
[TABLE]
We pick for some constant so that . The embedded matching is of size .
Next, we apply Theorem 5.1, leveraging Theorem 3.1 with constant parameters and , which yields the lower bound of samples for . We compute . Applying Theorem 5.3, we achieve the lower bound of for testing monotonicity over the hypercube with .
Recall that . Using a similar argument as above, we can also bound
[TABLE]
establishing the lower bound of for testing monotonicity over the hypercube poset, where . Since , for sufficiently large , we may choose sufficiently small and large , so that , as desired. ∎
6 Reduction from General Posets to Bipartite Graphs
In this section, we show that the problem of monotonicity testing of distributions over bipartite posets is essentially the “hardest” case of monotonicity testing in general poset domains. That is, we show that for any distribution over some poset domain of size , represented as a directed graph , there exists a distribution over a bipartite poset of size such that (1) preserves the total variation distance of to monotonicity up to a small multiplicative constant factor, and (2) each sample for can be generated using one sample drawn from . These properties together imply the following main theorem of this section.
\generaltobipartite
Proof.
Consider an arbitrary poset described as a directed graph , and an associated probability distribution over . We construct a bipartite graph based on the transitive closure of , denoted by , and a distribution over such that testing the monotonicity of over is roughly equivalent to testing the monotonicity of over .
The construction of the bipartite graph is as follows: for each , we add two vertices and to , so that and together form the bipartition . Think of and as the sets of top and bottom vertices, respectively. Next, consider two vertices and such that there is a path from to in (i.e., is an edge in ). For every such pair, we add the directed edge to . Given the distribution over , we set . Observe that we can generate a sample from using a sample from : if is drawn from , a sample for is obtained by picking either or , each with probability .
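The reduction is easy to simulate. Below is a minimal Python sketch of the construction and of the one-sample-to-one-sample simulation. The function names and the encoding of the two copies of a vertex x as `("bot", x)` and `("top", x)` are ours. We assume, as the monotonicity argument that follows suggests, that each closure pair (x, y) becomes an edge from the bottom copy of x to the top copy of y.

```python
import random
from collections import defaultdict

def transitive_closure(vertices, edges):
    """All pairs (x, y) with x != y such that y is reachable from x."""
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
    closure = set()
    for s in vertices:
        stack, seen = [s], {s}
        while stack:  # iterative DFS from s
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        closure.update((s, t) for t in seen if t != s)
    return closure

def to_bipartite(vertices, edges):
    """Each vertex x gets a bottom copy and a top copy (our encoding);
    every closure pair (x, y) becomes the edge bottom(x) -> top(y)."""
    bottom = [("bot", x) for x in vertices]
    top = [("top", x) for x in vertices]
    bip_edges = {(("bot", x), ("top", y))
                 for x, y in transitive_closure(vertices, edges)}
    return bottom, top, bip_edges

def sample_bipartite(sample_p, rng=random):
    """One sample of p' from one sample of p: pick either copy with probability 1/2."""
    x = sample_p()
    return (rng.choice(["bot", "top"]), x)
```

Because both copies of x receive p(x)/2, a closure pair (x, y) with p(x) ≤ p(y) yields a bipartite edge that respects monotonicity.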
Now, we prove that testing monotonicity of is equivalent to testing monotonicity of . If is monotone, then is also monotone: for each , via the transitivity of monotonicity of along the - path on . So, .
Next, suppose is -far from . By Lemma 6.1 (shown below), there exists a (directed) matching in , such that
[TABLE]
Then, the set of edges ’s corresponding to also forms a matching, , on . Let be the monotone distribution on closest to . Since is a monotone distribution, for an edge , is at least . Then, by the triangle inequality, we obtain:
[TABLE]
Note that the second-to-last inequality is true since is monotone, and has to be at least . Therefore, if is -far from monotone, then is -far from monotone.
Thus, to distinguish whether is monotone or -far from any monotone distribution on , it suffices to test if is monotone or -far from any monotone distribution on the bipartite poset . ∎
An interesting byproduct of Equation 4 is the following. Take the amount by which each edge violates monotonicity as the weight of that edge. Then the weight of the maximum-weight matching is the distance of the distribution to monotonicity. We state this formally in the following theorem.
\distToMonMatching
Proof.
Let denote the weight of the maximum-weight matching. Fix a matching of edges . Assume is the closest monotone distribution to , so for every edge . One can show the following:
[TABLE]
where the last inequality holds because the above is true for any matching . On the other hand, by Lemma 6.1, there exists a (directed) matching in , such that
[TABLE]
Thus, the proof is complete. ∎
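For intuition, the matching characterization can be checked by brute force on tiny posets. The sketch below uses hypothetical helper names, runs in exponential time and is for illustration only. It gives each closure edge (x, y) the violation weight max(0, p(x) − p(y)), assuming the convention that monotonicity requires p(x) ≤ p(y) along edges, and then searches all matchings. By the triangle-inequality argument above, the result can never exceed the ℓ1-distance from p to any monotone distribution.

```python
def violation_weights(p, closure_edges):
    """Weight of each closure edge (x, y): the amount p[x] - p[y] by which it
    violates the (assumed) monotonicity requirement p[x] <= p[y]."""
    return {(x, y): p[x] - p[y] for (x, y) in closure_edges if p[x] > p[y]}

def max_violation_matching(p, closure_edges):
    """Exhaustive search over matchings (vertex-disjoint edge sets); tiny inputs only."""
    edges = sorted(violation_weights(p, closure_edges).items())
    best_weight, best_matching = 0.0, []

    def search(start, used, weight, chosen):
        nonlocal best_weight, best_matching
        if weight > best_weight:
            best_weight, best_matching = weight, list(chosen)
        for j in range(start, len(edges)):
            (x, y), w = edges[j]
            if x in used or y in used:
                continue  # edge would share an endpoint with the current matching
            chosen.append((x, y))
            search(j + 1, used | {x, y}, weight + w, chosen)
            chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best_weight, best_matching

def l1(p, q):
    """l1-distance between two functions on the same vertex set."""
    return sum(abs(p[v] - q[v]) for v in p)
```

For the chain 0 → 1 → 2 with p = (0.5, 0.3, 0.2), the best matching is the single closure edge (0, 2), with weight 0.3.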
6.1 Proof of auxiliary lemmas
Lemma 6.1.
Let be a probability distribution over the vertex set of an unweighted directed graph representing a poset. Then, there exists a matching on the transitive closure such that
[TABLE]
Proof.
Define to be the -distance of to monotonicity. We need to show the following:
[TABLE]
Let be the monotone function on closest to (in the -distance). Let denote : the -distance between and . Note that is not necessarily a probability distribution, which implies that can be smaller than . To prove the above inequality, we use as an intermediate quantity that lies between the left-hand side and the right-hand side of the above inequality. Specifically, it suffices to prove the following:
- (i) ;
- (ii) there exists a matching on the transitive closure of such that .
Proof of Item (i): To show that is at least , we prove that the monotone distribution , obtained by normalizing , is at most -far from . Since any monotone distribution is at least -far from in -distance , we will have , establishing the desired claim.
First, note that if is zero for all , then by definition is at least :
[TABLE]
where the inequality holds since the -distance between two distributions is always at most , so is as well. Hence, assume is not a zero function for the rest of the proof.
Also, note that is a non-negative function. We prove the non-negativity of by contradiction: assume is negative for some . Consider a non-negative function . It is not hard to see that is monotone due to monotonicity of . For every for which , we have
[TABLE]
Since everywhere else, when contains some negative entry. This contradicts the fact that was the closest monotone function to ; hence, has to be non-negative for all ’s.
Consider ; it follows that is a well-defined monotone distribution. Then,
[TABLE]
Thus, Item (i) is proved.
Proof of Item (ii): We leverage the duality theorem in linear programming. We write an LP that optimizes over all monotone functions ’s to find the function closest to under the -distance. Let be the variable that indicates the amount of perturbation at vertex that is needed to make monotone. For an edge , the monotonicity constraint requires that is at least , or equivalently,
[TABLE]
Given this inequality, we can find the monotone function closest to by solving the following linear program:
[TABLE]
We denote the optimal solution for by , and the corresponding optimal value of the objective function by .
To obtain the dual of , we write down its standard form by substituting by as follows:
[TABLE]
Then has the following dual:
[TABLE]
By strong duality, the optimal value of is equal to the optimal value of , namely . Moreover, the optimal solution of helps us find a matching that satisfies the property in Item (ii). The constraints of can be written in the form and . Since is a totally unimodular matrix by Lemma 6.2 (proved below), the LP admits an optimal solution that is also integral.
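The original displays are not recoverable here, so the sketch below uses our own notation. It writes out the primal LP over perturbations z = z⁺ − z⁻ and its dual. We assume monotonicity means f(x) ≤ f(y) for every edge (x, y) ∈ E, matching the constraint stated above. The dual's vertex constraints are the two inequality families whose coefficient matrix Lemma 6.2 shows to be totally unimodular.

```latex
\begin{aligned}
\textbf{Primal:}\quad & \min_{z^{+},\, z^{-} \ge 0} \ \sum_{v \in V} \bigl(z^{+}(v) + z^{-}(v)\bigr) \\
& \text{s.t. } \bigl(z^{+}(y) - z^{-}(y)\bigr) - \bigl(z^{+}(x) - z^{-}(x)\bigr) \ \ge\ p(x) - p(y) \quad \forall (x,y) \in E, \\[6pt]
\textbf{Dual:}\quad & \max_{w \ge 0} \ \sum_{(x,y) \in E} w(x,y)\,\bigl(p(x) - p(y)\bigr) \\
& \text{s.t. } \Bigl| \sum_{u:\,(u,v) \in E} w(u,v) \;-\; \sum_{u:\,(v,u) \in E} w(v,u) \Bigr| \ \le\ 1 \quad \forall v \in V.
\end{aligned}
```

Read this way, an integral dual solution w is a multiset of edges with |in − out| ≤ 1 at every vertex. Its objective value is the total violation weight of that multiset.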
Let denote an integral optimal solution of the , and let be a multiset of edges containing copies of edge . Define the weight of each edge as , and let the weight of a set be the sum of the weights of the edges in . Thus:
[TABLE]
We construct a matching where , which completes the proof of Item (ii). By the constraints of the , forms a subgraph of (possibly with multi-edges) in which the absolute difference between the numbers of incoming and outgoing edges at each vertex is at most one. Hence, we can decompose into paths and cycles.
Consider a path . Observe that the weight of a path only depends on its endpoints:
[TABLE]
Note that the edge does not necessarily belong to , but since and are the endpoints of a path , is contained in the transitive closure of .
By the above equation, if we replace the edges of in by a single edge , then remains unchanged. We can also remove all cycles without changing since the weight of a cycle is always zero. Lastly, we may also join paths so that their endpoints are all distinct (since the difference between the in-degree and the out-degree of any vertex is at most one). After this process, we eventually obtain a matching on the transitive closure of such that
[TABLE]
concluding the proof of Item (ii) and this lemma. ∎
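The path-and-cycle shortcutting step can be sketched as follows. This is our own walk-based realization of the decomposition, and the helper names are hypothetical. Starting from each vertex with one more outgoing than incoming edge, it follows unused edges until stuck. Under the constraint |in − out| ≤ 1, such a walk can only get stuck at a vertex with one more incoming edge, and each such vertex ends exactly one walk. The edges left unused form closed walks of weight zero. Because the weight telescopes, each walk can be replaced by the single closure edge between its endpoints.

```python
from collections import Counter, defaultdict

def shortcut_to_matching(multiset_edges):
    """Turn a multiset of directed edges with |in-degree - out-degree| <= 1 at
    every vertex into (source, sink) pairs, one per maximal walk. Each pair is
    joined by a directed path, so it lies in the transitive closure."""
    out_adj = defaultdict(list)
    net = Counter()  # out-degree minus in-degree
    for x, y in multiset_edges:
        out_adj[x].append(y)
        net[x] += 1
        net[y] -= 1
    if any(abs(d) > 1 for d in net.values()):
        raise ValueError("need |in - out| <= 1 at every vertex")
    matching = []
    for s in [v for v, d in net.items() if d == 1]:
        v = s
        while out_adj[v]:  # consume unused edges; we can only get stuck at a sink
            v = out_adj[v].pop()
        matching.append((s, v))
    return matching

def weight(p, edges):
    """Total weight sum of p[x] - p[y]; along a path it telescopes to its endpoints."""
    return sum(p[x] - p[y] for x, y in edges)
```

For example, the edges a→b, b→c, c→d, d→b form one path a→b plus the cycle b→c→d→b. They shortcut to the single pair (a, b), with the same total weight.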
Lemma 6.2.
The matrix , namely the coefficient matrix of when the constraints are written in the form and , is a totally unimodular matrix.
Proof.
We arrange the rows of so that the two constraints of each vertex occupy two consecutive rows and for , and that each column corresponds to the edge for . Then, each entry of can be described as follows:
[TABLE]
To prove that is a totally unimodular matrix, we make use of the following theorem.
Theorem 6.3 (Ghouila-Houri Characterization [GH62]).
An integral matrix is a totally unimodular matrix if and only if, for any non-empty subset of rows, namely , there exists a disjoint partition of into and , such that the following is true.
[TABLE]
Here, for each non-empty subset , we explicitly define and according to the following three conditions. (1) If both and are in , put both of them in . (2) If only is in , then put in . (3) If only is in , then put in .
Consider column corresponding to . This column has four non-zero entries:
[TABLE]
If both and appear in , or both of them are not in , clearly Equation 5 holds (similarly for and ). Thus, assume that exactly one of two rows and , and exactly one of the two rows and , are in . It is not hard to see that if the corresponding entries ’s in these rows have the same sign, then one row ends up in and the other row ends up in . If the entries have different signs, then both rows end up in the same set or . In both of these cases, the sum in Equation 5 becomes zero. Hence, the proof is complete. ∎
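The Ghouila-Houri argument can be checked mechanically on a small example. The sketch below builds the coefficient matrix with the row layout described above: for each vertex, the row of (in − out) followed by the row of (out − in). This sign convention is our assumption. The sketch then applies the partition rule from the proof to every non-empty row subset. As an independent check, it verifies total unimodularity directly by enumerating square subdeterminants. Both checks are exponential and meant only for tiny graphs. When both rows of a vertex are selected we put them in the first part. The choice does not matter, because the two rows are negatives of each other.

```python
from itertools import combinations

def constraint_matrix(vertices, edges):
    """Rows 2i and 2i+1: the (in - out) and (out - in) constraints of vertex i;
    one column per edge."""
    idx = {v: i for i, v in enumerate(vertices)}
    A = [[0] * len(edges) for _ in range(2 * len(vertices))]
    for j, (x, y) in enumerate(edges):
        A[2 * idx[y]][j] += 1      # edge enters y
        A[2 * idx[y] + 1][j] -= 1
        A[2 * idx[x]][j] -= 1      # edge leaves x
        A[2 * idx[x] + 1][j] += 1
    return A

def ghouila_houri_partition_ok(A):
    """Rule from the proof: first row of a pair -> part 1; second row -> part 1
    if its partner is also selected, otherwise part 2."""
    n_rows, n_cols = len(A), len(A[0])
    for r in range(1, n_rows + 1):
        for R in combinations(range(n_rows), r):
            chosen = set(R)
            sign = {row: 1 if row % 2 == 0 or (row ^ 1) in chosen else -1 for row in R}
            for j in range(n_cols):
                if abs(sum(sign[row] * A[row][j] for row in R)) > 1:
                    return False
    return True

def det(M):
    """Integer determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)) if M[0][c])

def is_totally_unimodular(A):
    """Every square submatrix has determinant in {-1, 0, 1}."""
    rows, cols = len(A), len(A[0])
    for k in range(1, min(rows, cols) + 1):
        for R in combinations(range(rows), k):
            for C in combinations(range(cols), k):
                if det([[A[r][c] for c in C] for r in R]) not in (-1, 0, 1):
                    return False
    return True
```

On the three-edge poset u → v → w with u → w, both checks succeed. A matrix such as [[1, 1], [-1, 1]] (determinant 2) fails the direct check.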
7 Algorithms with Sublinear Sample Complexity
In this section, we provide sublinear sample complexity algorithms for testing bigness, and testing monotonicity of distributions over different poset domains. See Section 3.4 for proof overviews.
7.1 An Algorithm for Bigness Testing
We give an algorithm for the bigness testing problem that requires a sublinear number of samples. For testing bigness, every domain element must have probability at least a threshold . The high-level idea is to learn the histogram of the distribution using a result from [VV17]. Then, given the histogram, if the weight of the elements that are below the threshold is less than , we accept the distribution; otherwise, we reject.
First, we define the histogram of a distribution.
Definition 7.1.
For a distribution , we define to be the histogram of if and only if for all , is the number of domain elements such that is equal to .
Let be a permutation of the domain elements. We define to be the permutation of according to such that for every domain element , is equal to . From the definition, it is not hard to see that a permutation does not change the number of domain elements with a given probability, so and are the same. Hence, when we learn the histogram of , we can claim that we learn up to a permutation.
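A minimal Python sketch of the histogram and of its invariance under relabeling follows. `histogram` and `permute` are hypothetical names. We also count zero-probability elements, so the counts sum to the domain size; this convention is our assumption.

```python
from collections import Counter

def histogram(p):
    """h[x] = number of domain elements i with p[i] == x (zeros included)."""
    return Counter(p)

def permute(p, sigma):
    """p_sigma with p_sigma[sigma[i]] = p[i], for a permutation sigma of range(len(p))."""
    q = [None] * len(p)
    for i, prob in enumerate(p):
        q[sigma[i]] = prob
    return q
```

For p = [0.5, 0.25, 0.25, 0.0] and sigma = [2, 0, 3, 1], the permuted vector is [0.25, 0.0, 0.5, 0.25], and both have the histogram {0.25: 2, 0.5: 1, 0.0: 1}.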
For learning, we will use a result from [VV17] for learning discrete distributions up to a permutation of the domain elements. In Theorem 1.11 of [VV17], combined with Fact 1 of [VV16], the authors provide the following theorem:
Theorem 7.2 ([VV17, VV16]).
There exists an algorithm that, given i.i.d. samples from an unknown distribution , outputs an explicit description of a distribution, namely , such that there exists a permutation where with success probability .
This theorem implies the following upper bound for bigness testing.
Corollary 7.3.
For bigness threshold , there exists an algorithm that distinguishes whether a distribution is -big or -far from -big with success probability using i.i.d. samples from .
Proof.
We refer to Algorithm 1 for the outline of our procedure. Let denote the distribution output by the “learner” as promised by Theorem 7.2 with distance parameter . Let be the permutation guaranteed by Theorem 7.2. We define to be the distribution obtained by permuting the elements of according to the associated permutation such that for each domain element , . Hence, with probability at least 2/3, is at most . Note that is not known to the algorithm; it is used only in the analysis.
Now, we have the following two cases: If is -big, then
[TABLE]
On the other hand, if is -far from -big, then
[TABLE]
That is, offers us a condition for -bigness testing by simply measuring its distance to -bigness (the if condition of Algorithm 1). Therefore, Algorithm 1 outputs the correct answer with probability at least 2/3. Note that learning using parameter does not change the asymptotic sample complexity, so the proof is complete. ∎
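Since τ-bigness is invariant under relabeling, the decision can be made from the histogram alone. The sketch below makes an assumption: the domain size n satisfies nτ ≤ 1. Under that assumption, the total variation distance from p to the set of τ-big distributions equals the total deficit Σ max(0, τ − p(x)). Every light element must gain τ − p(x), and that mass can always be taken from elements above τ. The learner of Theorem 7.2 is treated as a black box, and its output histogram is passed in. The accuracy ε/3 and the cutoff ε/2 are our illustrative choices. Under them the two cases above separate: the distance is at most ε/3 in one case and at least 2ε/3 in the other.

```python
from collections import Counter

def distance_to_bigness(hist, tau):
    """TV distance from any distribution with histogram `hist` ({prob: count},
    zeros included) to the tau-big distributions, assuming n * tau <= 1."""
    n = sum(hist.values())
    if n * tau > 1 + 1e-12:
        raise ValueError("no tau-big distribution exists on this domain")
    # every element below tau must receive tau - prob extra mass
    return sum(count * (tau - prob) for prob, count in hist.items() if prob < tau)

def bigness_test(learned_hist, tau, eps):
    """Accept iff the learned histogram is within eps/2 of tau-bigness
    (assumes the learner is eps/3-accurate up to a permutation)."""
    return distance_to_bigness(learned_hist, tau) < eps / 2

example = Counter({0.5: 1, 0.3: 1, 0.15: 1, 0.05: 1})  # a 4-element domain
```

For `example` with τ = 0.2, the deficit is 0.05 + 0.15 = 0.2, so the test rejects for ε = 0.1.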
7.2 An Algorithm for Testing Monotonicity on Matchings
We give an algorithm with sublinear sample complexity for testing monotonicity on matchings. As in the previous section, we use a result from [VV17] for learning the distribution histogram of a pair of distributions. First, we recall the following definitions (see also Definition 5.2 and Definition 5.4 of [VV17]). The distribution histogram of a pair of distributions is a function that counts, for each pair of probability values, the number of elements having the first value under the distribution and the second value under the distribution . More formally, we have the following definition:
Definition 7.4 ([VV17]).
For a pair of distributions and , we say is the distribution histogram of and if and only if for any in the domain: .
We use this two-dimensional histogram to represent the histogram of a distribution over a matching of size . Let and be the two distributions that imposes on the top and the bottom vertices in the matching, respectively. Without loss of generality, assume the edges in the matching connect the -th vertex in the bottom to the -th vertex in the top. Note that counts the number of domain elements such that and . Hence, is the number of matched pairs of vertices with at least one vertex of non-zero probability. Since the probabilities according to sum to one, we have . The same holds for : .
Now, we define the distance between two histograms of two distributions: and . At a high level, the distance between two histograms is the minimum cost one needs to pay to “transform” to . In particular, we transform one histogram to another by moving mass from one point to another: By moving mass from to , we obtain another histogram , such that , and for all other points in , and are equal. The cost of this move is . More formally, we have the following definition.
Definition 7.5 ([VV17]).
For a pair of functions , we define the distance notation as the minimum cost over all mass moving schemes with finitely many steps for turning into , where the cost for moving value from point to is . Note that we assume that , where extra value at point on or may be added to ensure this equality.
Let be the permuted distribution of according to the permutation of such that for each domain element , . Note that as long as we permute and with the same permutation, the distribution histogram and are the same. Moreover, given one can construct and such that there exists a permutation for which and are the permuted versions of and according to .
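The two-dimensional histogram is equally easy to compute. In this sketch (hypothetical names), index i stands for the i-th matched pair, with `p1[i]` and `p2[i]` the normalized masses of its two endpoints. Pairs where both masses are zero are skipped, as in the count of matched pairs above. Relabeling both vectors by the same permutation leaves the histogram unchanged. Conversely, any histogram can be turned back into one pair of vectors that realizes it.

```python
from collections import Counter

def pair_histogram(p1, p2):
    """h[(x, y)] = number of indices i with p1[i] == x and p2[i] == y, skipping (0, 0)."""
    return Counter((a, b) for a, b in zip(p1, p2) if (a, b) != (0, 0))

def vectors_from_histogram(h):
    """One pair of vectors with histogram h; every other such pair equals this
    one up to a common permutation (and zero-zero indices)."""
    p1, p2 = [], []
    for (x, y), count in sorted(h.items()):
        p1.extend([x] * count)
        p2.extend([y] * count)
    return p1, p2
```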
We relate the distance to the total variation distance in the following lemma. In particular, the distance between two distribution histograms , defined according to two pairs of distributions , upper bounds the -distance up to a permutation of the labels of the domain elements.
Lemma 7.6.
Let functions , be defined according to two pairs of probability vectors , . There exists a permutation of such that
[TABLE]
Proof.
According to the definition of the distance, , there exists a moving scheme consisting of a sequence of steps, denoted by (with ), describing the changes that eventually turn into , in which we move the mass of from the source to the sink at step . We claim that if the scheme has minimum cost, , then without loss of generality we may make the following assumptions about the scheme: (1) There are no two steps and such that is the same as . (2) All the ’s are positive integers.
To see why (1) is true, assume otherwise. If , then means that the source and the sink in step are the same, so no mass is actually moved; hence, we can simply remove this step without changing the outcome of the scheme. If , then means that mass of quantity is first moved from to , and then moved from to . Clearly, one can move the same quantity of mass from to directly at no larger cost, making one of the steps or vacuous.
Given (1), we now show that (2) also holds. By (1), each point may appear in the steps as either a source or a sink, but not both. Moreover, the order of the steps does not matter, since the source always has the capacity to provide the mass. If several steps move mass between the same source and the same sink, one can replace all of them with a single step moving the total quantity of mass moved between them. Now, we can assume that between each source and each sink there is a well-defined quantity indicating how much mass we move from the source to the sink. This lets us form a graph whose vertices are the sources and the sinks that appear in the scheme. We put a directed edge from a source to a sink if we move a non-integer mass from the source to the sink. We give that edge a weight equal to the fractional part of that mass. We propose the following process for changing the steps, in which each change removes at least one edge from the graph. We repeat the process until no edge remains, which ensures that all ’s are integers.
First, remove sources and sinks with no incident edge. Clearly, the graph is bipartite, and all the edges go from sources to sinks. Since and are integer-valued, the final mass at each source and sink is an integer. Hence, each source has an out-degree of at least two and each sink has an in-degree of at least two. Therefore, the graph has an undirected cycle of even length. Let and be the sets of the sources and the sinks involved in the cycle, respectively. Let and be a partition of the edges in the cycle such that alternate edges are in the same set. Clearly, each source (and sink) on the cycle has exactly one edge in and one edge in . As defined before, the cost of moving one unit of mass via an edge from to is . We define the cost of (and ) to be the total cost of the edges in (and ). Without loss of generality, assume the cost of is not greater than the cost of . Let be the minimum weight of the edges in . We modify the steps such that each step corresponding to an edge in moves less mass, and each step corresponding to an edge in moves more mass. Clearly, this process does not increase the total cost of the scheme. However, it makes the fractional part of at least one step equal to zero. We repeat this process until no such step exists, which concludes the proof of claim (2).
Let be the sequence of distribution histograms generated by the mass-moving scheme, one after each step. The histogram is the one we start with, , and is the final one, . Now, we create a sequence of pairs of vectors such that (under the same definition of distribution histogram, relaxed to allow non-distributions ). We start with and being and . Given , we obtain as follows.
Consider step described as with an integer . Inductively, assume which implies that and contain at least entries with and . To apply step , we pick an arbitrary set of many such entries, then modify the entries and from and to and respectively for each . That is, and for , and and for . Hence, the -distance incurred by step becomes:
[TABLE]
By summing over all steps, and applying the triangle inequality, we have:
[TABLE]
Now it remains to show that there exists a permutation that maps the labels of the given distribution to our constructed vectors ; namely, and . Indeed, is the distribution histogram that counts the number of indices with and , so implies that for every , there are also equally many indices with and . Hence, there exists a bijection between their indices that maps ’s to ’s and vice versa, concluding the lemma. ∎
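The sketch below applies one integral step of the mass-moving scheme to the vectors rather than to the histograms. The name `apply_step` is hypothetical, and we assume the per-unit cost |x − x′| + |y − y′| from Definition 7.5. Re-labeling m coordinates from (x, y) to (x′, y′) changes the ℓ1-distance by exactly m(|x − x′| + |y − y′|). That is the per-step bound summed in the proof. As in the proof, the intermediate vectors need not be probability distributions.

```python
def apply_step(v1, v2, source, sink, m):
    """Re-label m indices i with (v1[i], v2[i]) == source so that they equal sink,
    in place. Returns the l1 change m * (|x - x'| + |y - y'|)."""
    (x, y), (xs, ys) = source, sink
    picked = [i for i in range(len(v1)) if (v1[i], v2[i]) == (x, y)][:m]
    if len(picked) < m:
        raise ValueError("fewer than m entries at the source point")
    for i in picked:
        v1[i], v2[i] = xs, ys
    return m * (abs(x - xs) + abs(y - ys))
```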
Next, we state the result of [VV17] to learn the distribution histogram of a pair of distributions.
Theorem 7.7 (Theorem 5.6 of [VV17]).
There exists an algorithm that, given i.i.d. samples each from a pair of unknown distributions and , outputs a function such that with success probability .
We now prove the upper bound for the monotonicity testing problem over the matching poset.
Theorem 7.8.
For sufficiently small positive constant , there exists an algorithm that distinguishes whether a distribution over the vertex set of a directed matching on vertices is monotone or -far from monotone with success probability using i.i.d. samples from .
Proof.
For clarity, denote the edge set of the graph by , and its vertex set by , where and . For a distribution over , let and denote the probability mass that places on elements of and ; note that and are functions on domain and , but generally not probability distributions.
The outline of our algorithm is given as Procedure in Algorithm 2. In our algorithm, we hope to invoke Theorem 7.7 by considering the (normalized) and as our and , respectively. However, Theorem 7.7 requires roughly the same number of samples from both and , while and may have vastly different total probability masses; for instance, it may be costly to try to obtain many samples from .
Before we proceed, by Theorem 3.3, it is straightforward to see:
[TABLE]
To make the total probability of the top vertices, and that of the bottom vertices, at least a constant, we define an auxiliary probability distribution obtained by averaging with a monotone distribution: where . Clearly, if is monotone, then is monotone too. Also, if is -far from monotone, then observe that the distance of to monotonicity is
[TABLE]
which preserves the distance to monotonicity up to a factor of . We can generate samples for using asymptotically the same number of samples from : a sample from is obtained by drawing a sample from or drawing a uniformly random vertex, with probability each (Procedure in Algorithm 2). Henceforth, we consider the problem of testing for monotonicity with distance instead.
The main benefit of testing monotonicity of instead of is that the total probability mass placed on and that placed on are each at least . Hence, it takes samples from according to the procedure above to obtain at least samples from each of and with good constant probability; that is, we can create our input for the algorithm in Theorem 7.7 using i.i.d. samples from .
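The sampling side of this preprocessing can be sketched as follows, with hypothetical names. The uniform distribution over all 2n vertices is monotone on a matching, since both endpoints of every edge get mass 1/(2n). Mixing it in with weight 1/2 guarantees that each side of the matching receives at least 1/4 of the mass. The samples are then split by side to feed the two-distribution learner of Theorem 7.7.

```python
import random

def make_mixture_sampler(sample_p, vertices, rng):
    """Sampler for p' = (p + U) / 2, with U uniform over all 2n vertices."""
    def sample():
        if rng.random() < 0.5:
            return sample_p()        # draw from p
        return rng.choice(vertices)  # draw a uniformly random vertex
    return sample

def split_by_side(samples, bottom):
    """Separate samples into bottom-side and top-side samples."""
    bot = [s for s in samples if s in bottom]
    top = [s for s in samples if s not in bottom]
    return bot, top
```

Even if p puts all of its mass on one bottom vertex, roughly a quarter of the mixture samples land on the top side.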
Denote by the total probability masses that places on and , respectively. Let and be the probability functions that assigns to vertices of and , respectively. Let and be the distributions over and that are obtained by normalizing and (separately). More precisely, we have
[TABLE]
Let (to be determined exactly later). Invoking Theorem 7.7 with this parameter, we obtain a function where using samples from .
Next, we rescale each dimension of back by and , thereby obtaining our estimate of . If we knew and exactly, we would define , and we would have . However, we can only estimate and up to an additive error with high constant probability using samples. To this end, let be the estimate of , and let . We define for which . Below, we show that is a good estimate of .
Recall that . By definition, there exists a minimum-cost sequence of steps for turning to :
[TABLE]
Observe that under the cost function in Definition 7.5, we may assume without loss of generality that there are no such that in the moving scheme. Namely, we can instead “shortcut” this scheme by moving the value from to directly without leaving any extra amount at (during step ) to pick up later (during step ). In this moving scheme, the value of on any must be non-increasing or non-decreasing throughout the steps (since values are only being moved in, or only being moved out, but not a mixture of both). In particular, this condition implies that the total value of ’s moving into never exceeds the value of . More formally,
[TABLE]
Now, we are ready to bound . By definition, we have and . Thus, any moving scheme that turns into , will also turn into . Hence, we can use the same sequence (up to scaling) for moving the mass from to to show a bound for : at step , we move the value from to . We establish our bound as follows.
[TABLE]
Going back to our algorithm, we compute : the function minimizing that is also defined according to an actual monotone probability distribution over . Observe that if is monotone, then
[TABLE]
due to the optimality assumption above. On the other hand, if is -far from monotone, then by choosing ,
[TABLE]
for some permutation over , where and , making use of Lemma 7.6 above. Hence, provides us with a condition for testing monotonicity over the matching poset , as desired. ∎
7.3 An Algorithm for Testing Monotonicity on Bounded Degree Bipartite Graphs with Sub-linear Sample Complexity
We give an algorithm that tests monotonicity of a distribution on a bipartite poset with sample complexity where denotes an upper bound for the maximum degree over all vertices in . Given sample access to the distribution , we implement a sampling oracle for a certain distribution on a matching poset with vertices. This distribution is monotone on if is monotone on , and is -far from monotone on if is -far on . Hence, we apply the algorithm for testing monotonicity on the matching poset to test the monotonicity of , immediately obtaining the desired sample complexity. We describe the construction of and the distribution below and show the correctness of our approach in Corollary 7.10.
More formally, let be a distribution over a directed bipartite poset where and are the sets of the bottom and the top vertices, and is the set of edges. Let be an upper bound on the degree of .
The matching poset .
Based on , we create a matching over vertices according to the following procedure. Similar to , is the set of bottom vertices, is the set of top vertices, and is the set of edges.
- Create copy vertices for each vertex .
- For each edge , match an unmatched pair of vertices via the copy edge ; place , and .
- For all remaining unmatched vertices , create a dummy vertex , then match it to via the dummy edge ; place , and . Note that the dummy vertex is always put in the bottom set.
Note that the second step above is always possible since there are at most edges incident to a vertex.
Distribution over .
The distribution over the poset is defined as follows. For each copy vertex , set . For each dummy vertex , set . One can generate a sample from by drawing a sample in according to and drawing uniformly at random from ; the -th copy of , , is then a sample drawn from .
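The reduction to a matching can be sketched as below, with hypothetical names. Copies are encoded as `(v, j)` for j < d and dummies as `("dummy", v, j)`. Each edge of the bipartite poset consumes one unused copy of each endpoint, and every copy left over is paired with a fresh dummy. Since the dummy is always on the bottom side, we read the construction as placing the leftover copy on the top side of its dummy edge. Sampling from q costs one sample from p plus a uniformly random copy index.

```python
import random

def build_matching(bottom, top, edges, d):
    """Return the matching edges (low endpoint, high endpoint) of the new poset."""
    next_free = {v: 0 for v in list(bottom) + list(top)}
    matching = []
    for u, v in edges:  # copy edges: one fresh copy of each endpoint per edge
        if next_free[u] >= d or next_free[v] >= d:
            raise ValueError("a vertex has degree larger than d")
        matching.append(((u, next_free[u]), (v, next_free[v])))
        next_free[u] += 1
        next_free[v] += 1
    for v in next_free:  # dummy edges: zero-probability dummy on the bottom side
        for j in range(next_free[v], d):
            matching.append((("dummy", v, j), (v, j)))
    return matching

def q_prob(p, d, vertex):
    """q(v_j) = p(v) / d for copies, and 0 for dummies."""
    return 0.0 if vertex[0] == "dummy" else p[vertex[0]] / d

def sample_q(sample_p, d, rng):
    """One sample of q from one sample of p: pick a uniformly random copy index."""
    return (sample_p(), rng.randrange(d))
```

Each copy vertex appears in exactly one matching edge. Every dummy edge has a zero-mass bottom endpoint, so it can never violate monotonicity.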
In the following lemma, we show that the distance of to being monotone is closely related to the distance of to monotonicity.
Lemma 7.9.
Let and be two distributions over and as described above. If is monotone, then is monotone. If is -far from being monotone, then is -far from being monotone.
Proof.
Observe that for each copy edge , the probabilities at the endpoints are and , respectively. Thus, if is at most , then will remain at most . Furthermore, for each dummy edge , the probability of the bottom vertex, , is zero, so this edge never violates the monotonicity of . Hence it follows immediately that if is monotone on , then is monotone on as well.
On the other hand, assume is -far from being monotone. We define a weighted graph on the transitive closure of , , where the weight of each edge is . By the proof of Theorem 3.3, has a weighted matching, namely , of weight such that
[TABLE]
Since is a bipartite poset, and the edges are all from to , is the same as . Hence, each edge in is in as well. Also, by the construction of , there exists a copy edge that corresponds to . Let be the set of copy edges where is in . This set is a matching in as well.
Observe that by the above construction, the weight of is . Hence, contains a matching, , of weight which is at most the weight of the maximum matching in . Let be the weight of the maximum matching in . By Theorem 3.3 and Equation 6, we obtain:
[TABLE]
Thus, if is -far from being monotone, then is -far from monotone as well, concluding the lemma. ∎
Given the above lemma, it is sufficient to test monotonicity of with proximity parameter . See Algorithm 3 for the steps. Below, we show the correctness of the algorithm.
Corollary 7.10.
There exists an algorithm that tests whether a distribution over a bipartite poset of vertices and maximum degree , is monotone or -far from monotone with success probability , using i.i.d. samples from .
Proof.
Given Lemma 7.9, it suffices to test the monotonicity of with parameter . Using Theorem 7.8 and since is a matching of size , one can test monotonicity of with high probability using samples as desired. Therefore, the proof is complete. ∎
7.4 Testing monotonicity of distributions that are uniform on a subset of the domain
In this section, we give an algorithm for testing monotonicity on a specific yet broad class of instances. More specifically, suppose that we are given a directed bipartite graph , along with a probability distribution on the set . Note that all the directed edges go from a vertex in the “bottom” set , to a vertex in the “top” set . We additionally assume that all distributions which we sample from are uniform on a subset of whose size is known to the algorithm. That is, for every vertex either or , where is the support of the distribution .
We will show the following result:
Theorem 7.11.
Let be a directed bipartite graph as described above and be a probability distribution on that is uniform on a subset of , namely . Given the size of , there exists an algorithm with sample complexity that can test, with success probability , whether is monotone on or is -far from any monotone function on .
At a high level, our tester works as follows. We draw an initial set of samples from . We define to be the set of vertices from the bottom, , that we see in the sample set. Then, we look at the set containing all out-neighbors of the vertices in . We show the following structural property of distributions that are -far from being monotone: in expectation, the constructed set contains endpoints of violating edges, so cannot be too small. Thus, if is much smaller than , we can immediately conclude that the distribution is close in total variation distance to some monotone distribution. However, if is sufficiently large in cardinality, we draw more samples in order to estimate the amount of probability mass on . Note that if is monotone, then all elements of must lie in the support of the distribution, namely , so every single element of must have probability mass for the distribution to be monotone. The tester rejects if there is sufficient evidence that this is not the case. The proposed tester is given in Algorithm 4.
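A hedged sketch of the two-phase tester follows. The sample sizes `s1` and `s2` and the thresholds `slack` and `min_size` are hypothetical stand-ins, since the exact values used in Algorithm 4 are not recoverable here. `out_nbrs` maps each bottom vertex to its out-neighbors. The acceptance rule compares the empirical mass on N with |N|/k, which is its expected value when p is monotone.

```python
import random

def uniform_subset_tester(sample, bottom, out_nbrs, k, s1, s2, slack=0.25, min_size=1):
    """Sketch for a distribution uniform on an unknown support of known size k."""
    first = [sample() for _ in range(s1)]
    S = {x for x in first if x in bottom}            # bottom vertices seen
    N = set().union(*(out_nbrs.get(x, ()) for x in S))
    if len(N) < min_size:
        return True  # too few candidate endpoints: accept (threshold is assumed)
    second = [sample() for _ in range(s2)]
    frac = sum(1 for x in second if x in N) / s2
    # if p is monotone, every vertex of N has mass 1/k, so E[frac] = |N| / k
    return frac >= (1 - slack) * len(N) / k
```

For instance, on edges b0 → t0, b0 → t1 and b1 → t2, the uniform distribution on {b0, t0, t1} is monotone and passes. The uniform distribution on {b0, t0} puts mass only on t0 among N = {t0, t1} and is rejected.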
Proof of Theorem 7.11: As given in the algorithm, let and denote the sample sizes of the two steps described earlier. We consider the following two cases.
Completeness case: Assume is a monotone distribution. Clearly, each sample we draw has a non-zero probability. Since we pick to be the neighbor set of the samples we draw, we know that every element in has a non-zero probability. By the uniformity assumption, this probability is . Thus, when we draw samples from the distribution we expect fraction of them fall into . So, the expected value of is . We defer the asymptotic complexity analysis of this case to the end of our proof.
Soundness case: Assume is -far from being a monotone distribution. Consider all the violating edges in for which is greater than . By Lemma 6.1, there exists a set of edges, namely , that form a matching, and we have:
[TABLE]
Note that without loss of generality one can assume only has violating edges, since removing non-violating edges only makes the left hand side larger. By the uniformity assumption for , is exactly . Thus, by the above inequality, we have is at least .
Since there are vertices in that belong to the matching, is a random variable distributed according to the binomial distribution . Hence, we have that
[TABLE]
Using Chebyshev’s inequality and the fact that is a binomial distribution, we have
[TABLE]
Thus, with high probability, contains at least endpoints in . Note that the neighbor set of contains the other endpoints of the edges in the matching . Thus, contains at least vertices of zero probability, which implies that the size of has to be at least . Hence, for sufficiency large , the probability that gets rejected due to the condition is negligible.
Consider the second set of samples we draw in the algorithm . Clearly, the size of is a binomial random variable drawn from . However, we have shown that a fraction of the elements in have zero probability. Thus, is at most , while in the completeness case it is . So, we only need to estimate the bias of a Bernoulli random variable up to an additive error of . By the Hoeffding bound, we only need to draw samples to distinguish the two cases with high probability, which implies:
[TABLE]
Thus, with high probability, we distinguish them correctly.
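The Hoeffding step can be made concrete with a short Python sketch under our own naming. To tell a Bernoulli bias of at most `low` apart from one of at least `high`, it suffices to estimate the bias within half the gap, and Hoeffding's inequality, P[|estimate - bias| >= t] <= 2 exp(-2 m t^2), shows that m = ceil(ln(2/delta) / (2 t^2)) samples achieve this with failure probability at most delta. The midpoint decision rule is an illustrative choice, not necessarily the exact rule in Algorithm 4.

```python
import math
import random
from typing import Callable


def hoeffding_sample_size(additive_error: float, failure_prob: float) -> int:
    """Smallest m with 2 * exp(-2 * m * additive_error**2) <= failure_prob."""
    return math.ceil(math.log(2 / failure_prob) / (2 * additive_error**2))


def bias_is_high(draw: Callable[[], int], low: float, high: float,
                 failure_prob: float) -> bool:
    """Decide whether a 0/1 source has bias >= high (True) or <= low (False)."""
    half_gap = (high - low) / 2
    m = hoeffding_sample_size(half_gap, failure_prob)
    estimate = sum(draw() for _ in range(m)) / m
    # With probability >= 1 - failure_prob the estimate is within half_gap
    # of the true bias, so it lands on the correct side of the midpoint.
    return estimate > low + half_gap


if __name__ == "__main__":
    rng = random.Random(1)
    print(hoeffding_sample_size(0.1, 0.05))
    print(bias_is_high(lambda: int(rng.random() < 0.6), 0.3, 0.6, 1e-6))
```

Since the required sample size scales with the inverse square of the gap, a constant-fraction gap between the completeness and soundness biases translates into a modest number of second-stage samples.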
7.5 Upper bound via trying all matchings
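In this section we present a simple upper bound for the problem of monotonicity testing on bipartite graphs. Let be the number of pairs of subsets of top and bottom elements, respectively, for which there exists a perfect matching between them. The algorithm tries every such pair: for each pair, it estimates the difference between the probability mass of the bottom subset and that of the top subset, and it rejects if this difference is large. If no pair leads to a rejection, the algorithm accepts.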
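The exhaustive search over matchable pairs can be sketched in Python as below. This is an illustrative sketch under our own assumptions rather than the paper's pseudocode: the graph is a hypothetical dictionary from bottom vertices to their top neighbors, perfect matchings are checked with Kuhn's augmenting-path algorithm, probability masses are estimated by plain empirical frequencies from one shared sample (in place of the median amplification used in the proof), and `threshold` is a caller-chosen cutoff standing in for the one in the theorem.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple

Graph = Dict[Hashable, List[Hashable]]  # bottom vertex -> adjacent top vertices


def has_perfect_matching(A: Sequence[Hashable], B: Sequence[Hashable], adj: Graph) -> bool:
    """Kuhn's algorithm: can every vertex of A be matched to a distinct vertex of B?"""
    if len(A) != len(B):
        return False
    allowed = set(B)
    owner: Dict[Hashable, Hashable] = {}  # top vertex -> bottom vertex matched to it

    def augment(x: Hashable, visited: set) -> bool:
        # Search for an augmenting path starting at bottom vertex x.
        for y in adj.get(x, []):
            if y in allowed and y not in visited:
                visited.add(y)
                if y not in owner or augment(owner[y], visited):
                    owner[y] = x
                    return True
        return False

    # A bottom vertex that fails to augment can never be matched later.
    return all(augment(x, set()) for x in A)


def matchable_pairs(bottom: Sequence[Hashable], top: Sequence[Hashable],
                    adj: Graph) -> Iterator[Tuple[tuple, tuple]]:
    """Yield every non-empty pair (A, B) joined by a perfect matching."""
    for k in range(1, min(len(bottom), len(top)) + 1):
        for A in combinations(bottom, k):
            for B in combinations(top, k):
                if has_perfect_matching(A, B, adj):
                    yield A, B


def exhaustive_matching_tester(samples: Sequence[Hashable], bottom: Sequence[Hashable],
                               top: Sequence[Hashable], adj: Graph,
                               threshold: float) -> bool:
    """Accept (True) unless some matched pair has estimated p(A) - p(B) > threshold."""
    counts = Counter(samples)

    def mass(S: tuple) -> float:
        return sum(counts[v] for v in S) / len(samples)

    for A, B in matchable_pairs(bottom, top, adj):
        if mass(A) - mass(B) > threshold:
            return False  # reject: a matched bottom set outweighs its top set
    return True
```

For a perfect matching with edges b_i to t_i, the only matchable pairs are a set of bottom vertices together with their partners, so the loop visits one pair per non-empty subset of the bottom; on dense graphs the enumeration grows exponentially, which is why this approach serves only as a simple upper bound.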
Theorem 7.12.
We can test whether a distribution over a bipartite graph with vertices is monotone or -far from any monotone distribution with success probability , using samples, where is the number of pairs of subsets of top and bottom elements, respectively, for which there exists a perfect matching between them. That is, samples for a worst-case graph .
Proof.
Let and denote the probability masses of and , respectively. Note that if we use samples, we can estimate and within an additive error of . Thus, we can estimate their difference within an error of with constant probability. We can amplify the success probability by repeating the estimation and taking the median of the estimates. Therefore, for each pair of subsets, the probability that the algorithm fails to estimate the difference of and within an error of is at most . By the union bound, we distinguish whether is at least or at most zero by comparing the with , with constant success probability.
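The amplification and union-bound steps can be illustrated with a generic Python sketch (our own helper names; the per-run success probability of 2/3 is an illustrative assumption). If a single run is accurate with probability at least 2/3, the median of r independent runs is inaccurate only if at least half of the runs are, which by Hoeffding's inequality happens with probability at most exp(-r/18). Choosing r so that this is at most delta divided by the number of pairs makes the union bound over all pairs give a total failure probability of at most delta.

```python
import math
import random
import statistics
from typing import Callable


def median_of_runs(estimate: Callable[[], float], runs: int) -> float:
    """Median of independent runs of an estimator that succeeds w.p. >= 2/3."""
    return statistics.median([estimate() for _ in range(runs)])


def runs_for_union_bound(num_pairs: int, total_failure: float) -> int:
    """Runs r with exp(-r/18) <= total_failure / num_pairs (Hoeffding on the failures)."""
    per_pair_failure = total_failure / num_pairs
    return math.ceil(18 * math.log(1 / per_pair_failure))


if __name__ == "__main__":
    rng = random.Random(2)

    def noisy_estimate() -> float:
        # Correct value 0.5 w.p. 2/3, otherwise a wild outlier on either side.
        return 0.5 if rng.random() < 2 / 3 else rng.choice([-100.0, 100.0])

    print(runs_for_union_bound(100, 0.01), median_of_runs(noisy_estimate, 301))
```

The number of runs grows only logarithmically in the number of pairs, which is why the union bound over all matchable pairs costs a logarithmic factor in the sample complexity.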
Now, if is -far from being monotone with respect to the graph , there exists a matching such that the total difference between the probabilities of the bottom and the top elements is at least by Lemma 6.1. Thus, in one of the iterations, we will consider this matching and output reject. Also, if is monotone with respect to the graph , there is no violating edge. Therefore, for each pair and , we have . Thus, we output reject in no iteration, and the distribution is accepted at the end.
Lastly, since there are at most pairs of subsets where is the total number of top and bottom elements respectively, we conclude that the sample complexity is . ∎
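As a worked check of the counting step (our own illustration, not a bound quoted from the paper): in the complete bipartite graph with a bottom and b top vertices, every pair of equal-size subsets is joined by a perfect matching, so by Vandermonde's identity the number of pairs is the sum over k of C(a, k) * C(b, k) = C(a + b, a), which never exceeds 2^(a + b). Its logarithm is therefore at most the number of vertices, consistent with the proof's use of the union bound, where the sample size grows with the logarithm of the pair count.

```python
import math


def complete_bipartite_pairs(a: int, b: int) -> int:
    """Count pairs (A, B), A among a bottom and B among b top vertices, |A| == |B|.

    In a complete bipartite graph every such pair has a perfect matching.
    The count includes the empty pair (k = 0).
    """
    return sum(math.comb(a, k) * math.comb(b, k) for k in range(min(a, b) + 1))


if __name__ == "__main__":
    for a, b in [(1, 1), (3, 3), (10, 10), (20, 30)]:
        count = complete_bipartite_pairs(a, b)
        # Vandermonde: the sum collapses to C(a + b, a) <= 2**(a + b).
        assert count == math.comb(a + b, a) <= 2 ** (a + b)
        print(a, b, count, math.log2(count))
```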
Remark:
Note that in order to execute the above algorithm, it is not required to know the quantity in advance. Instead, we can keep drawing more samples and updating all our estimates simultaneously, until the error probability of each estimate is small enough for the union bound to go through.
