Testing Mixtures of Discrete Distributions
Maryam Aliakbarpour, Ravi Kumar, Ronitt Rubinfeld

TL;DR
This paper introduces a new noise model for distribution testing, where the noisy distribution is a known mixture of the original and noise, and demonstrates that testing in this setting can be as sample-efficient as in the noise-free case.
Contribution
The authors propose a tractable mixture noise model for distribution testing and show that testing complexity remains unchanged compared to classical methods.
Findings
Sample complexity matches classical non-mixture testing
Mixture testing is more tractable under the proposed noise model
Results apply to identity and closeness testing problems
Testing Mixtures of Discrete Distributions
Maryam Aliakbarpour
CSAIL, MIT
MA is supported by funds from the MIT-IBM Watson AI Lab (Agreement No. W1771646) and NSF grants IIS-1741137 and CCF-1733808.
Ravi Kumar
Ronitt Rubinfeld
CSAIL, MIT, TAU
RR is supported by funds from the MIT-IBM Watson AI Lab (Agreement No. W1771646) and NSF grants CCF-1650733, CCF-1733808, IIS-1741137, and CCF-1740751.
Abstract
There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size $n$, distinguishing the uniform distribution from distributions that are far from uniform in $\ell_1$-distance uses only $O(\sqrt{n})$ samples.
However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is $\epsilon$-close to uniform from the case where the distribution is $(1-\epsilon)$-far from uniform. The latter task requires nearly linear in $n$ samples [Val08, VV17b].
In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known a priori. Focusing on the identity and closeness testing problems leads to the following mixture testing question: Given samples of distributions $p, q_1, q_2$, can we test if $p$ is a mixture of $q_1$ and $q_2$? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers is exactly the same as for the classical non-mixture case.
1 Introduction
Distribution testing [BFR+13] has been studied extensively over many years (see [Can15] for a survey). In the vanilla version, the problem is to quickly test if a discrete distribution has a certain property or is statistically far from any distribution with that property. The tester has access to samples from the distribution and strives to be as frugal as possible in the number of samples it uses. Many statistical properties, including various distances between distributions, are well understood in this model. There have been several relaxations to the basic testing model, including tolerant testing (where the tester should also accept if the distribution is close to having a property), the conditional samples model (where the tester can access the distribution conditioned on a specified subset), making stylized assumptions about the distribution (monotone, sparse support, high-dimensional, etc.), and so on. In each of these works, the aim has been to push the boundaries of our understanding: when do sample-efficient testers exist? Here, by sample-efficient, we mean the number of samples should be sub-linear in the domain size.
There are many scenarios in which a distribution is observed along with noise; in some cases, even the form of the noise is known a priori. One such scenario is the so-called identity testing problem in which the tester has a known (explicitly specified) distribution and its goal is to check if a given distribution, available as samples, is close to the known distribution. For example, assume that the distribution of the top million queries to a web search engine is known in advance. Then, identity testing would be a quick way to check how close the daily query distribution is to this known distribution. However, in reality, there are natural minor variations to the daily query distribution, which may cause the identity tester to fail. This is clearly undesirable.
An option to tackle the noise would be to use testers that are tolerant to noise. Unfortunately, even simple versions of tolerant testers are faced with near-linear lower bounds on the sample complexity, making this option uninteresting. For example, one can distinguish if a distribution on a domain of size $n$ is uniform or far from uniform in $\ell_1$-distance using $O(\sqrt{n}/\epsilon^2)$ samples [Pan08]. However, an algorithm that distinguishes between near-uniform distributions and distributions that are far from uniform requires $\Omega(n/\log n)$ samples [Val08, VV17a]. Hence, to achieve sub-linear sample complexity, we need more judicious, stylized assumptions about the noise: how it is available to the tester and whether it is adversarial.
A different yet natural way to model the above scenario is to view it as a mixture of distributions. In the above example, one of the components of the mixture can be interpreted as the signal and the other component can be thought of as the noise. More generally, the tester is given the components of a mixture of two distributions. However, it does not know the mixing parameter, i.e., the magnitude of the contribution of each component to the mixture. The mixture testing problem is then to test if a distribution is close to a mixture of two given distributions or is far from every possible mixture of the two. As we will see, by making reasonable assumptions on the form of the noise and how it is available to the tester, the tolerant testing lower bounds can be circumvented and one can obtain testers with sub-linear sample complexity.
Main contributions.
In this work, we consider distribution testing of mixtures of two distributions $q_1$ and $q_2$. For ease of exposition, let us call the first component the original distribution and the second component the noise, and let $[n]$ be the domain of both $q_1$ and $q_2$. The simplest version of our problem is: given sample access to a distribution $p$, and for known distributions $q_1, q_2$, is $p = \alpha q_1 + (1-\alpha) q_2$ for some $\alpha \in [0,1]$, or is $p$ far from $\alpha q_1 + (1-\alpha) q_2$ for every $\alpha \in [0,1]$? Note that the tester is not given the mixture parameter $\alpha$. We further study the case when $q_1, q_2$ are not given explicitly to the algorithm, as well as other generalizations.
We mainly focus on identity and closeness testing, which are two basic instances of hypothesis testing that have received much attention in the theory, machine learning, and statistics communities; see the works of [GGR98, BFF+01, BFR+13, Bat01, BDKR02, BKR04, Pan08, Val08, GR11, ILR12, LRR13, DDS+13, AJOS14, CDVV14, FJO+15, DKN15b, DKN15a, ADK15, ABR16, CDGR16, CDKS17, DP17, VV17a, DKS18, SDC18, BH18] and the surveys of [Rub12] and [Can15].
The mixture testing problem has a more constrained model than the tolerant testing problem, so one might hope to bypass the existing lower bounds. However, the mixture testing problem can also run into near-linear sample complexity lower bounds if one does not provide the tester with sufficient access to the mixture components. Indeed, if the tester does not have access to the noise, we show the mixture testing problem becomes as hard as tolerant testing, necessitating $\Omega(n/\log n)$ samples (Theorem 3.4). Hence, to show nontrivial positive results, the tester must have access to some kind of information about the noise. We consider the following three cases for the noise, namely, (i) when the noise is given as an explicitly specified distribution, (ii) when the tester does not explicitly know the noise distribution, but does have sample access to it, and (iii) when there is no explicit description or access to samples from the noise distribution, but it is known that the noise distribution comes from a class of distributions, e.g., the set of $k$-histogram distributions. For the first, we obtain a tester with sample complexity $O(\sqrt{n}/\epsilon^2)$ and for the second, we obtain a tester with sample complexity $O(\max(n^{2/3}/\epsilon^{4/3}, \sqrt{n}/\epsilon^2))$, where $\epsilon$ is a given proximity parameter; these show that the complexity of our testers is exactly the same as for the classical non-mixture case. For the third, when the noise is assumed to come from the set of $k$-histogram distributions, we obtain an identity tester that uses samples.
2 Preliminaries
For the rest of the paper, we use the following notation. For a distribution $p$ over $[n]$, we use $p(i)$ to denote the probability of element $i$, and for a subset $S \subseteq [n]$, let $p(S) = \sum_{i \in S} p(i)$. We use $\|\cdot\|_k$ to indicate the $\ell_k$-norm of a vector. We typically use the $\ell_1$-distance and say $p$ and $q$ are $\epsilon$-close if $\|p - q\|_1 \le \epsilon$ and $\epsilon$-far otherwise. Let $U_n$ denote the uniform distribution on $[n]$; we drop the subscript when the domain is clear from the context. Distribution $p$ is a mixture of $q_1$ and $q_2$ if there exists $\alpha \in [0,1]$ such that $p = \alpha q_1 + (1-\alpha) q_2$. We call $\alpha$ the mixture parameter. We also use a shorthand notation for this mixture when the components $q_1$ and $q_2$ are clear from the context.
Background.
Throughout this paper we consider several distribution testing problems. For a given property of distributions, we use $\mathcal{P}$ to denote the set of distributions that satisfy the property. The distance of a distribution $p$ to $\mathcal{P}$ is the $\ell_1$-distance between $p$ and the closest distribution in $\mathcal{P}$. In a distribution testing problem, the goal is to distinguish whether $p$ is in $\mathcal{P}$ or is $\epsilon$-far from $\mathcal{P}$. We say an algorithm is a tester for property $\mathcal{P}$ if the following is true with probability at least 2/3. (The success probability of 2/3 is arbitrary here: given such a tester, we can achieve a success probability of $1-\delta$ via standard amplification methods, at the cost of a multiplicative $O(\log(1/\delta))$ increase in the sample complexity.)
- Completeness: If $p$ is in $\mathcal{P}$, then the algorithm outputs accept.
- Soundness: If $p$ is $\epsilon$-far from $\mathcal{P}$, the algorithm outputs reject.
The algorithm is an $\epsilon'$-tolerant tester if it also satisfies the stronger completeness property that when $p$ is $\epsilon'$-close to some distribution in $\mathcal{P}$, the algorithm outputs accept (with probability at least 2/3). These definitions can be extended to properties of collections of more than one distribution. Although in the standard setting we receive samples from at least one distribution in the collection, the testing problems may be defined with respect to other methods of access.
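The standard amplification step can be sketched as a majority vote. This sketch assumes a tester is modelled as a zero-argument function that draws fresh samples and returns True (accept) or False (reject), and is correct with probability at least 2/3. The helper names `amplify` and `repetitions_for` are hypothetical. The constant 18 comes from a Hoeffding bound on the vote. The paper does not specify a particular amplification procedure.

```python
import math
import random
from typing import Callable


def repetitions_for(delta: float) -> int:
    """Number of independent runs so that a majority vote errs w.p. <= delta.

    Each run is correct w.p. >= 2/3, so by Hoeffding's inequality the majority
    is wrong w.p. at most exp(-2 * r * (1/6)**2) = exp(-r / 18).
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    r = math.ceil(18 * math.log(1 / delta))
    return r if r % 2 == 1 else r + 1  # an odd count rules out ties


def amplify(tester: Callable[[], bool], delta: float) -> bool:
    """Run `tester` repeatedly (fresh samples each time) and return the majority answer."""
    r = repetitions_for(delta)
    accepts = sum(1 for _ in range(r) if tester())
    return 2 * accepts > r


if __name__ == "__main__":
    rng = random.Random(0)
    noisy_tester = lambda: rng.random() < 0.7  # stands in for a tester that accepts w.p. 0.7
    print(amplify(noisy_tester, delta=1e-3))
```

Each repetition consumes its own samples, so the sample complexity grows by the factor `repetitions_for(delta)`, which is $O(\log(1/\delta))$.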
We make one of the three following assumptions regarding the algorithm's view of the distributions: (i) The distribution is explicitly given, or known, if the algorithm knows the probability of each domain element under the distribution. (ii) The distribution is given by samples if the algorithm has access to an oracle that provides samples from the distribution. (iii) The distribution is neither known nor given by samples, but is a member of a given class of distributions.
The term identity testing is used to refer to the setting in which we test if a distribution $p$, which we have sample access to, is equal to a known one, $q$. Note that this is equivalent to testing the property $\mathcal{P} = \{q\}$. The term closeness testing refers to the setting in which we test if two distributions, both available via samples, are equal or not; in this case, $\mathcal{P}$ is the set of pairs of equal distributions.
Mixture testing problems.
Suppose $p$, $q_1$, and $q_2$ are distributions over $[n]$. Let $\mathcal{M}_{q_1,q_2} = \{\alpha q_1 + (1-\alpha) q_2 : \alpha \in [0,1]\}$ (we usually drop the subscripts when they are clear from context). In a mixture testing problem, the goal is to distinguish, with probability at least 2/3, whether a distribution $p$ given via samples is in $\mathcal{M}$ or is $\epsilon$-far from every distribution in $\mathcal{M}$. We investigate the following problems, which differ in the way that the mixture testing algorithm can access $q_1$ and $q_2$. Note, however, that the mixture parameter $\alpha$ is not given to the tester. (i) An algorithm is an identity tester in the presence of known noise if it solves the mixture testing problem when $q_1, q_2$ are known to the tester. (ii) An algorithm is a closeness tester in the presence of noise that is accessible via samples if it solves the mixture testing problem when $q_1, q_2$ are not explicitly given, but samples of each are provided to the tester. (iii) An algorithm is an identity tester in the presence of class-$\mathcal{C}$ noise if it can distinguish whether $p$ is a mixture of a known distribution $q_1$ and some $q_2 \in \mathcal{C}$. Note that such an algorithm is a property tester for $\bigcup_{q_2 \in \mathcal{C}} \mathcal{M}_{q_1,q_2}$.
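The following sketch computes the $\ell_1$-distance from an explicitly known $p$ to the set of mixtures $\{\alpha q_1 + (1-\alpha) q_2 : \alpha \in [0,1]\}$. This is the quantity a mixture tester compares against $\epsilon$. It only illustrates the definition, since a tester never sees $p$ explicitly, and the function names are hypothetical. The objective is convex and piecewise linear in $\alpha$, so it suffices to check the endpoints and the breakpoints.

```python
from typing import List, Sequence, Tuple


def mixture(alpha: float, q1: Sequence[float], q2: Sequence[float]) -> List[float]:
    """The distribution alpha * q1 + (1 - alpha) * q2."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(q1, q2)]


def l1(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_to_mixtures(p: Sequence[float], q1: Sequence[float],
                         q2: Sequence[float]) -> Tuple[float, float]:
    """Return (min over alpha of ||p - (alpha q1 + (1-alpha) q2)||_1, a minimizing alpha).

    The objective is convex and piecewise linear in alpha, so its minimum over
    [0, 1] is attained at an endpoint or at a breakpoint where some coordinate
    p(i) - q2(i) - alpha * (q1(i) - q2(i)) changes sign.
    """
    candidates = {0.0, 1.0}
    for pi, a, b in zip(p, q1, q2):
        if a != b:
            t = (pi - b) / (a - b)
            if 0.0 <= t <= 1.0:
                candidates.add(t)
    best = min(candidates, key=lambda t: l1(p, mixture(t, q1, q2)))
    return l1(p, mixture(best, q1, q2)), best
```

A tester for $\mathcal{M}$ must accept when this distance is 0 and reject when it exceeds $\epsilon$, using only samples of $p$.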
Note that one can also define “closeness testing in the presence of known noise”, and “identity testing in the presence of noise that is given via samples”, but our lower bounds will show that the sample complexity of these tasks is the same as the sample complexity of closeness testing in the presence of noise that is given via samples.
3 An overview of our results and techniques
3.1 Testing identity in the presence of known noise
We first consider the problem of testing if a distribution $p$, given via samples, can be expressed as a mixture of known distributions $q_1$ and $q_2$. We show the following.
Theorem 3.1.
Given two known distributions $q_1$, $q_2$, and $\epsilon \in (0,1)$, there is an identity tester in the presence of known noise that uses $O(\sqrt{n}/\epsilon^2)$ samples. Furthermore, $\Omega(\sqrt{n}/\epsilon^2)$ samples are required.
At a high level, we take the following steps to test if $p$ is a mixture or $\epsilon$-far from it. First, we develop an algorithm (learner) to learn mixture distributions. The learner receives samples from $p$ and outputs a mixture distribution $\hat p$. If $p$ is a mixture, then we show that the learner finds a mixture distribution $\hat p$ that is $\epsilon'$-close to $p$ for some proximity parameter $\epsilon'$; if $p$ is not a mixture, the learner outputs $\hat p$ with no specific guarantee. Second, we use the distance between $p$ and $\hat p$ as a measure to decide about $p$: if $p$ is $\epsilon'$-close to $\hat p$, we accept $p$; if $p$ is $\epsilon$-far from $\hat p$, we reject it. This approach results in a tester for mixtures of $q_1$ and $q_2$. Indeed, if $p$ is a mixture, then we show that the learner finds a $\hat p$ that is $\epsilon'$-close to $p$, and if $p$ is $\epsilon$-far from being a mixture, then $p$ has to be $\epsilon$-far from any mixture distribution, including $\hat p$.
The challenge in this approach is to distinguish whether $p$ is close to $\hat p$ or far from it. In general, testing whether two distributions are $\epsilon'$-close or $\epsilon$-far from each other requires $\Omega(n/\log n)$ samples. However, we show that we can exploit the structural properties of mixture distributions to achieve a sample-efficient algorithm. Below we provide a more detailed description of the steps.
The learner.
The algorithm begins by assuming that the given distribution $p$ is indeed a mixture, and attempts to learn the mixture parameter: if $p$ is a mixture, then we show that $\alpha$ can be learned to error using samples given $q_1$ and $q_2$. The algorithm picks a subset $S$ of elements that contains every element $i$ for which $q_1(i) \ge q_2(i)$, and estimates the weight of these elements according to $p$, i.e., $p(S)$. The set $S$ satisfies that $q_1(S) - q_2(S)$ is exactly the total variation distance between $q_1$ and $q_2$. Comparing $p(S)$ with the weights $q_1(S)$ and $q_2(S)$ guides us to choose a mixture parameter $\hat\alpha$, and allows us to bound the distance between $p$ and $\hat p = \hat\alpha q_1 + (1-\hat\alpha) q_2$. (Instead of learning $\alpha$, one might do a grid search on $\alpha$; however, the granularity required could make the resulting algorithm sub-optimal.)
Assessing the distance between $p$ and $\hat p$.
After obtaining , the task of distinguishing whether distribution is a mixture or -far from a mixture boils down to testing if is -close to or is -far from it. We propose a scheme to reshape the distributions and and get two new distributions and such that for that is a mixture, the -distance between and is at most . Furthermore, in the case where is -far from being a mixture, is -far from . It is known that one can efficiently distinguish the case that versus using samples [DK16, CDVV14].
Here, we elaborate further on how we reshape the distributions. Similar techniques have been used previously to reduce the $\ell_2$-norm, e.g., in [DK16]. Here, we use them to bound the $\ell_2$-distance between $p'$ and $\hat p'$. The reshaping process is as follows. Define $p'$, the reshaped distribution of $p$, over a new domain which is larger than the domain of $p$. For each element $i$, we determine an integer $b_i \ge 1$ solely based on $q_1$, $q_2$, and $\hat\alpha$. Then we add $(i, j)$ for all $j \in \{1, \dots, b_i\}$ to the domain of $p'$. We set the probability of element $(i, j)$ to be $p(i)/b_i$. We also reshape $\hat p$ according to the same process and get $\hat p'$.
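But how can reshaping reduce the $\ell_2$-distance? Given that $p$ is a mixture, for each element $i$ in the domain, the discrepancy between the probability of $i$ according to $p$ and $\hat p$, namely $|p(i) - \hat p(i)| = |\alpha - \hat\alpha|\,|q_1(i) - q_2(i)|$, is proportional to $|q_1(i) - q_2(i)|$. With this observation, we set the $b_i$'s so that they make the discrepancy small for each element. This ensures that the $\ell_2$-distance between $p'$ and $\hat p'$ is small.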
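The following sketch illustrates reshaping on explicit vectors, assuming the bucket counts $b_i$ are already fixed. The bucket rule `hypothetical_buckets` is an illustrative choice only. It is not the paper's rule, which is defined in Section 4.2 and not reproduced here. The rule is chosen so that, when $p$ and $\hat p$ are both mixtures of $q_1$ and $q_2$, the per-bucket discrepancy $|\alpha - \hat\alpha|\,|q_1(i) - q_2(i)|/b_i$ stays below $|\alpha - \hat\alpha|/n$, while the new domain has at most $3n$ elements.

```python
import math
from typing import List, Sequence


def reshape(dist: Sequence[float], buckets: Sequence[int]) -> List[float]:
    """Split element i into buckets[i] equal parts, each of mass dist[i] / buckets[i]."""
    out: List[float] = []
    for mass, b in zip(dist, buckets):
        if b < 1:
            raise ValueError("every element needs at least one bucket")
        out.extend([mass / b] * b)
    return out


def hypothetical_buckets(q1: Sequence[float], q2: Sequence[float]) -> List[int]:
    """Illustrative rule (not the paper's): b_i = 1 + floor(n * |q1(i) - q2(i)|).

    Since b_i > n * |q1(i) - q2(i)|, each bucket's discrepancy
    |alpha - alpha_hat| * |q1(i) - q2(i)| / b_i is below |alpha - alpha_hat| / n,
    and sum_i b_i <= n + n * ||q1 - q2||_1 <= 3n.
    """
    n = len(q1)
    return [1 + math.floor(n * abs(a - b)) for a, b in zip(q1, q2)]


def l1(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))
```

Because every bucket of element $i$ carries the same fraction $1/b_i$ of both $p(i)$ and $\hat p(i)$, the $\ell_1$-distance is unchanged, while each individual coordinate difference shrinks.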
The arguments described above are formalized in Theorem 4.3. In addition, in the case where $q_1$ and $q_2$ are uniform, this problem is as hard as testing if a distribution is uniform, which needs at least $\Omega(\sqrt{n}/\epsilon^2)$ samples [Pan08], showing that the sample complexity of our algorithm is tight. Furthermore, we match the sample complexity of the standard identity tester where there is no noise involved.
3.2 Testing closeness in the presence of noise that is accessible via samples
We next investigate the problem of testing closeness of distributions in the presence of noise that is accessible via samples. Suppose we have sample access to three distributions, $p$, $q_1$, and $q_2$, over $[n]$. The goal is to test if there is a mixture parameter $\alpha$ such that $p = \alpha q_1 + (1-\alpha) q_2$, or $p$ is $\epsilon$-far from any distribution of this form.
Similarly to the identity testing algorithm explained earlier, our approach is to first attempt to learn $\alpha$. That is, we design an algorithm that finds a candidate mixture distribution, $\hat p$, such that if $p$ is a mixture of $q_1$ and $q_2$, then $p$ and $\hat p$ will be close in $\ell_2$-distance; if $p$ is not a mixture, the algorithm finds a distribution with no specific guarantees. Then, we test whether $\hat p$ is close to $p$ in $\ell_2$-distance or far from it. The outcome of the test dictates whether we should accept or reject $p$. Indeed, if $p$ is a mixture distribution, $\hat p$ is very close to $p$, and the test will accept $p$. If $p$ is $\epsilon$-far from being a mixture, then $\hat p$ is $\epsilon$-far from $p$, and furthermore, since $\|x\|_2 \ge \|x\|_1/\sqrt{n}$ on $[n]$, $p$ and $\hat p$ are $(\epsilon/\sqrt{n})$-far from each other in $\ell_2$-distance, so that the test will reject $p$.
But how do we learn $\alpha$? Since we are looking for $\hat p$, which is close to $p$ in $\ell_2$-distance, we study the problem of estimating the $\ell_2$-distance between $p$ and a mixture distribution of $q_1$ and $q_2$. Inspired by the $\ell_2$-distance estimator proposed by [CDVV14], we propose a statistic $Z(\alpha)$ that, given $\alpha$, estimates the $\ell_2$-distance between $p$ and $\alpha q_1 + (1-\alpha) q_2$ from the counts $X_i$, $Y_i$, and $W_i$, the numbers of instances of element $i$ among the samples from $p$, $q_1$, and $q_2$, respectively. The statistic is designed so that its expectation is $m^2 \|p - (\alpha q_1 + (1-\alpha) q_2)\|_2^2$, where $m$ is the number of samples from each distribution $p$, $q_1$, and $q_2$.
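One statistic with the stated expectation is sketched below. It is derived here by analogy with the [CDVV14] estimator and may differ in form from the paper's statistic (1). Assume the counts are independent with $X_i \sim \mathrm{Poi}(m\,p(i))$, $Y_i \sim \mathrm{Poi}(m\,q_1(i))$, and $W_i \sim \mathrm{Poi}(m\,q_2(i))$. Then $\mathbb{E}[(X_i - \alpha Y_i - (1-\alpha) W_i)^2] = m^2 (p(i) - \alpha q_1(i) - (1-\alpha) q_2(i))^2 + m\,p(i) + \alpha^2 m\,q_1(i) + (1-\alpha)^2 m\,q_2(i)$. Subtracting $X_i + \alpha^2 Y_i + (1-\alpha)^2 W_i$ therefore leaves an unbiased estimate of $m^2 \|p - (\alpha q_1 + (1-\alpha) q_2)\|_2^2$. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def z_stat(alpha: float, X: Sequence[int], Y: Sequence[int], W: Sequence[int]) -> float:
    """Z(alpha) = sum_i (X_i - a Y_i - (1-a) W_i)^2 - X_i - a^2 Y_i - (1-a)^2 W_i.

    X, Y, W are per-element counts from Poi(m) samples of p, q1, q2.
    """
    b = 1.0 - alpha
    return sum((x - alpha * y - b * w) ** 2 - x - alpha ** 2 * y - b ** 2 * w
               for x, y, w in zip(X, Y, W))


def z_coefficients(X: Sequence[int], Y: Sequence[int],
                   W: Sequence[int]) -> Tuple[float, float, float]:
    """Coefficients (A, B, C) with Z(alpha) = A alpha^2 + B alpha + C for fixed counts.

    Writing X - a Y - (1-a) W = (X - W) - a (Y - W) and expanding gives:
      A = sum (Y - W)^2 - Y - W   (closeness statistic for q1 vs q2)
      B = sum -2 (X - W)(Y - W) + 2 W
      C = sum (X - W)^2 - X - W   (closeness statistic for p vs q2)
    """
    A = sum((y - w) ** 2 - y - w for y, w in zip(Y, W))
    B = sum(-2 * (x - w) * (y - w) + 2 * w for x, y, w in zip(X, Y, W))
    C = sum((x - w) ** 2 - x - w for x, w in zip(X, W))
    return A, B, C
```

For fixed counts, $Z$ is a quadratic in $\alpha$. Its constant term is exactly the [CDVV14] closeness statistic for $p$ versus $q_2$.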
Given the sample sets, the goal is to use the quadratic function $Z(\alpha)$ to find a candidate $\alpha$. For now, assume $p$ is a mixture of $q_1$ and $q_2$ with parameter $\alpha^*$. We make two observations about $Z$: (i) the expectation of $Z(\alpha)$ is minimized, and in fact equals zero, at $\alpha = \alpha^*$, and (ii) we provide a threshold $\tau$ for which $Z(\alpha^*)$ is at most $\tau$ with high probability. Although $\alpha^*$ is not given to the algorithm, we wish to pick a candidate $\alpha$ that is very close to $\alpha^*$. We use the above two observations as a guide to take the following strategy: find that minimizes while is at most $\tau$. This method may yield several candidate $\alpha$'s. We establish that if $p$ is a mixture, then one of the candidate $\alpha$'s will result in a mixture distribution that is close to $p$ in $\ell_2$-distance. (Once again, a grid search on $\alpha$ will not yield an optimal sample complexity.)
From there on, we only need to test whether any of the candidates we found is close to $p$ or not. If $p$ is a mixture, we are promised that one of the candidates will pass the test. Otherwise, all candidates give distributions that are $\epsilon$-far from $p$ by definition (and hence $(\epsilon/\sqrt{n})$-far in $\ell_2$-distance), so all of them will fail. Our approach yields the following result:
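Theorem 3.2.
Assume we have sample access to three distributions $p$, $q_1$, and $q_2$ over $[n]$. There exists a closeness tester in the presence of noise that uses $O(\max(n^{2/3}/\epsilon^{4/3}, \sqrt{n}/\epsilon^2))$ samples. Furthermore, $\Omega(\max(n^{2/3}/\epsilon^{4/3}, \sqrt{n}/\epsilon^2))$ samples are required.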
See Theorem 5.6 for the formal statement of the result. For the lower bound on the sample complexity, we establish that the lower bound for standard closeness testers holds in the mixture setting as well, even in the case where $q_1$ or $q_2$ is known. In particular, we show that, given sample access to $p$ and $q_1$, testing whether $p$ is a mixture of $q_1$ and the uniform distribution requires $\Omega(\max(n^{2/3}/\epsilon^{4/3}, \sqrt{n}/\epsilon^2))$ samples (Proposition 1). Hence, one cannot hope to achieve a better sample complexity.
3.3 Testing identity in the presence of $k$-flat noise
On the one hand, the sample complexity of distribution testing under arbitrary noise is significantly worse than that of noise-free distribution testing. On the other hand, we have seen that the sample complexity of distribution testing with noise (either known or given via sample access) is very similar to the sample complexity of noise-free distribution testing. This raises the question of whether one can relax the tester's access to the noise and still achieve better sample complexity. The next problem we consider is the identity testing problem when there is no direct access to the noise (either via samples or an explicit description) except for the promise that the noise comes from a class, in particular, the class of $k$-flat distributions.
We say a distribution is $k$-flat if its probability mass function is a piecewise constant function with $k$ pieces. We investigate the following problem: given a known distribution $q_1$ and sample access to $p$, can we distinguish if $p$ is a mixture of $q_1$ and some $k$-flat distribution, or is $\epsilon$-far from any such distribution? We provide an algorithm that uses samples.
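The $k$-flat definition can be checked directly on an explicit probability vector. This sketch assumes the domain order is the natural order of the indices. It starts a new piece whenever consecutive probabilities differ by more than a tolerance `tol`, a parameter introduced here for floating-point inputs. The function names are hypothetical.

```python
from typing import Sequence


def num_pieces(pmf: Sequence[float], tol: float = 0.0) -> int:
    """Number of maximal intervals on which the pmf is constant (up to tol)."""
    if len(pmf) == 0:
        return 0
    pieces = 1
    for prev, cur in zip(pmf, pmf[1:]):
        if abs(cur - prev) > tol:  # a jump starts a new constant piece
            pieces += 1
    return pieces


def is_k_flat(pmf: Sequence[float], k: int, tol: float = 0.0) -> bool:
    """True if the pmf is piecewise constant with at most k pieces."""
    return num_pieces(pmf, tol) <= k
```

For instance, mixing the uniform distribution with a 3-flat vector gives another 3-flat vector, because adding a constant to every element does not create new jumps.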
Inspired by the identity tester proposed in [BFF+01], we propose the following approach. First, we guess the $k$ intervals on which the noise is constant. Then, we take the elements of each interval and further partition them into subsets (not necessarily contiguous) such that in each subset the probabilities of the elements according to $q_1$ are very similar to each other (similar enough that we can show that $q_1$ is nearly uniform on each subset). For a mixture distribution $p$, if we have guessed the intervals correctly, $p$ is almost uniform within each subset, since on that subset it is a mixture of a nearly uniform function ($q_1$) and a constant function (the noise). Hence, to see if $p$ is a mixture, we first test each of these subsets to see if $p$ is close to uniform on it. We then estimate the total weight that $p$ assigns to each of these subsets and determine whether the weights are consistent with a mixture of $q_1$ and some $k$-flat distribution. One challenge is to find a sampling method that guarantees good results for all initial guesses of the intervals describing the noise. See Section 6 for details.
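The partitioning step inside one guessed interval can be sketched as geometric bucketing of $q_1$, using a hypothetical ratio parameter `gamma`. Elements whose $q_1$-probabilities fall in the same range $((1+\gamma)^{-(j+1)}, (1+\gamma)^{-j}]$ share a subset, so $q_1$ varies by a factor less than $1+\gamma$ within each subset. The paper's exact partition rule and parameters are given in Section 6 and are not reproduced here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def split_by_probability(interval: Iterable[int], q1: Sequence[float],
                         gamma: float) -> List[List[int]]:
    """Group the elements of one interval so that q1 varies by a factor < 1 + gamma.

    Element i with q1[i] > 0 goes to group floor(log_{1+gamma}(1 / q1[i])) >= 0;
    zero-mass elements form their own group (key -1). Groups need not be contiguous.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for i in interval:
        if q1[i] == 0:
            groups[-1].append(i)
        else:
            groups[math.floor(math.log(1 / q1[i], 1 + gamma))].append(i)
    return [groups[key] for key in sorted(groups)]
```

Restricted to each returned group, $q_1$ is nearly uniform. If the interval guess is correct, the noise is constant there as well, which is what allows a per-group uniformity test.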
It is not hard to see that in the case where $q_1$ is uniform, identity testing in the presence of $k$-flat noise is as hard as testing whether a distribution is $k$-flat or $\epsilon$-far from any such distribution. Therefore, there exists a lower bound of for our problem derived from the lower bound for testing $k$-flat distributions in [Can16].
3.4 Lower bounds
We show that testing identity with respect to the uniform distribution, when the noise component can be an arbitrary distribution, requires near-linear samples, i.e., $\Omega(n/\log n)$. More specifically:
Theorem 3.4.
Assume $p$ is a distribution on $[n]$. There exists a constant parameter such that distinguishing the following cases with probability at least 2/3 requires $\Omega(n/\log n)$ samples.
- There exists a noise distribution on $[n]$, namely , and an such that $p$ is a mixture of uniform and with parameter , i.e., .
- There is no noise distribution such that unless .
The main idea is to reduce this problem to that of testing the bigness property [AGP+19], which holds if all probabilities are above a given threshold.
4 Identity testing of mixtures in the presence of known noise
In this section, we give an algorithm that tests if $p$ is close to a mixture $\alpha q_1 + (1-\alpha) q_2$ where both components are explicitly known. As before, we assume the mixture parameter $\alpha$ is unknown.
The main idea is to attempt to learn a mixture distribution $\hat p$ that is close to $p$. Using $q_1$, $q_2$, and $\hat\alpha$, we then reshape the distribution $\hat p$ into another distribution $\hat p'$ and use the same reshaping to transform $p$ into $p'$. The reshaping has the property that if $p$ is indeed a mixture, then $p'$ and $\hat p'$ will be extremely close to each other in $\ell_2$-distance, and if $p$ is not a mixture, then $p'$ and $\hat p'$ will be quite far from each other. Thus we can use a (non-tolerant) identity tester on $p'$ and $\hat p'$.
In the rest of this section, we present the three main steps of the testing algorithm. The first step is a learner algorithm that finds $\hat p$ (Section 4.1). The second step is a reshaping process that transforms the distributions $p$ and $\hat p$ into $p'$ and $\hat p'$, respectively (Section 4.2). The third step puts these pieces together to get the identity tester (Section 4.3).
4.1 The learner
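At a high level, the learner proceeds as follows. Observe that if $p$ is a mixture of $q_1$ and $q_2$, then there is a parameter $\alpha$ such that $p = \alpha q_1 + (1-\alpha) q_2$, so to learn $p$ it is sufficient to learn $\alpha$. Let $S$ be the set of all domain elements $i \in [n]$ for which $q_1(i)$ is at least $q_2(i)$. By the definition of a mixture, we have $p(S) = \alpha q_1(S) + (1-\alpha) q_2(S)$, which leads to $\alpha = \frac{p(S) - q_2(S)}{q_1(S) - q_2(S)}$ whenever $q_1 \ne q_2$. The idea then is to replace $p(S)$ with its estimate, say $\tilde p(S)$, to get an estimate of $\alpha$. We formally describe the procedure in Algorithm 1 and prove its correctness in Lemma 4.1.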
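The learner can be sketched as a plug-in estimator. The sketch assumes samples are integer labels in $\{0, \dots, n-1\}$ and that $q_1, q_2$ are given as lists. It uses $S = \{i : q_1(i) \ge q_2(i)\}$, for which $p(S) = \alpha q_1(S) + (1-\alpha) q_2(S)$, and hence $\alpha = (p(S) - q_2(S))/(q_1(S) - q_2(S))$. Unlike Algorithm 1, it omits the correction term added to the empirical estimate and simply clips the result to $[0, 1]$. The function name `learn_alpha` is hypothetical.

```python
import random
from typing import Sequence


def learn_alpha(samples: Sequence[int], q1: Sequence[float], q2: Sequence[float]) -> float:
    """Plug-in estimate of alpha in p = alpha * q1 + (1 - alpha) * q2 from samples of p."""
    S = {i for i, (a, b) in enumerate(zip(q1, q2)) if a >= b}
    gap = sum(q1[i] - q2[i] for i in S)  # equals d_TV(q1, q2)
    if gap == 0:
        return 0.5  # q1 == q2: every alpha yields the same mixture
    p_S = sum(1 for x in samples if x in S) / len(samples)  # empirical p(S)
    alpha = (p_S - sum(q2[i] for i in S)) / gap
    return min(1.0, max(0.0, alpha))  # clip into [0, 1]


if __name__ == "__main__":
    rng = random.Random(0)
    q1, q2 = [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]
    p = [0.7 * a + 0.3 * b for a, b in zip(q1, q2)]
    draws = rng.choices(range(len(p)), weights=p, k=10_000)
    print(learn_alpha(draws, q1, q2))  # should be near the true alpha = 0.7
```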
Lemma 4.1.
Suppose . Using samples, Algorithm 1 outputs a mixture parameter such that .
Proof.
First, observe that if and are -close, then is -close to as well, so distribution which is a mixture with parameter is a valid output. For the remainder of the proof, we assume and are -far from each other.
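Let $S = \{i \in [n] : q_1(i) \ge q_2(i)\}$. We have $\alpha = \frac{p(S) - q_2(S)}{q_1(S) - q_2(S)}$. The only unknown value in this expression is $p(S)$, which we estimate using samples from $p$. We show that by replacing $p(S)$ with its estimate, we get a viable estimate for $\alpha$.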
Let be the estimate that is the ratio of the samples that are in . By the Hoeffding bound, we have We define our estimate of as: The reason that we add to is to assure an overestimation of , so becomes smaller than with high probability. I.e., since with high probability , we get:
[TABLE]
Below, we show that $\hat p$ is close to $p$ in $\ell_1$-distance. By the definition of $S$, $q_1(S) - q_2(S)$ is equal to the total variation distance (i.e., half of the $\ell_1$-distance) between $q_1$ and $q_2$. With probability ,
[TABLE]
∎
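A short worked identity shows why estimating $p(S)$ is enough. Assume $p = \alpha q_1 + (1-\alpha) q_2$ with $q_1 \ne q_2$, and let $\tilde p(S)$ be the empirical fraction of samples in $S = \{i : q_1(i) \ge q_2(i)\}$. Take the plain plug-in estimate $\hat\alpha = (\tilde p(S) - q_2(S))/(q_1(S) - q_2(S))$ and $\hat p = \hat\alpha q_1 + (1-\hat\alpha) q_2$. This is a simplification of Algorithm 1, which adds a correction term. Then:

```latex
\begin{align*}
\|p - \hat p\|_1
  &= \sum_{i \in [n]} \bigl|(\alpha - \hat\alpha)\,(q_1(i) - q_2(i))\bigr|
   = |\alpha - \hat\alpha|\,\|q_1 - q_2\|_1 \\
  &= 2\,|\alpha - \hat\alpha|\,\bigl(q_1(S) - q_2(S)\bigr)
   = 2\,\bigl|p(S) - \tilde p(S)\bigr|.
\end{align*}
```

By Hoeffding's inequality with $s$ samples (a symbol introduced only for this illustration), $|p(S) - \tilde p(S)| \le \eta$ with probability at least $1 - 2e^{-2s\eta^2}$, so $\|p - \hat p\|_1 \le 2\eta$ with that probability. Clipping $\hat\alpha$ to $[0, 1]$ can only decrease $|\alpha - \hat\alpha|$, since $\alpha \in [0, 1]$.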
4.2 Reshaping the distributions
Using Algorithm 1, given , we can obtain a mixture parameter and a mixture distribution for which (i) if is the mixture of and with parameter , then is -close to for a proximity parameter , and and (ii) if is -far from being a mixture, then is -far from . Ideally, we wish to use an identity tester to see if and are roughly the same or far from each other. Unfortunately, this is not possible in general, unless and are very close on every domain element. To resolve this issue, the goal in this section is to introduce two distributions and such that (i) when and are close, and are very close to each other on every domain element and (ii) when and are far, and are far. Our reshaping process is inspired by the method of [DK16]: For each element , using , we define:
[TABLE]
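Note that the process in [DK16] uses only the first and third terms of the above sum in defining $b_i$. We start the reshaping process by associating $b_i$ buckets to each domain element $i$ to form a new domain $\{(i, j) : i \in [n],\ 1 \le j \le b_i\}$ of size $\sum_i b_i$. To draw a sample from $p'$, we first draw a sample $i$ from $p$, then we sample $j$ uniformly from $\{1, \dots, b_i\}$, and return the pair $(i, j)$ as the sample from $p'$. We say $p'$ is a reshaping of $p$ with respect to the bucket counts $b_i$. Clearly, $p'((i, j)) = p(i)/b_i$. In a similar manner, we define the reshaping $\hat p'$ of $\hat p$, and once again we have $\hat p'((i, j)) = \hat p(i)/b_i$.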
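Sampling from the reshaped distribution needs no extra samples from $p$. This sketch assumes a zero-argument sampler for $p$ that returns integer labels, and a list of positive bucket counts indexed by those labels. The names are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def reshaped_sampler(sample_p: Callable[[], int], buckets: Sequence[int],
                     rng: random.Random) -> Callable[[], Tuple[int, int]]:
    """Turn a sampler for p into a sampler for its reshaping p'.

    Draw i ~ p, then j uniformly from {1, ..., b_i}; the pair (i, j) then has
    probability p(i) / b_i, and each draw uses exactly one sample of p.
    """
    def draw() -> Tuple[int, int]:
        i = sample_p()
        return i, rng.randint(1, buckets[i])  # randint is inclusive on both ends
    return draw


if __name__ == "__main__":
    rng = random.Random(1)
    sample_p = lambda: rng.choices([0, 1], weights=[0.5, 0.5])[0]
    draw = reshaped_sampler(sample_p, [1, 4], rng)
    print([draw() for _ in range(5)])
```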
We next prove several crucial properties of the reshaped distributions.
Lemma 4.2.
Let $p'$ and $\hat p'$ be the result of the reshaping of $p$ and $\hat p$ with respect to the bucket counts $b_i$, as described above. Then, the following hold:
- (i) The $\ell_1$-distance after reshaping does not change: $\|p' - \hat p'\|_1 = \|p - \hat p\|_1$.
- (ii) The domain size of $p'$ and $\hat p'$, .
- (iii) The $\ell_2$-norm of , .
- (iv) If $p$ is a mixture distribution, $\hat p$ is -close to $p$, and is at most , then for all .
Proof.
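To prove (i), note that $\|p' - \hat p'\|_1 = \sum_{i \in [n]} \sum_{j=1}^{b_i} \left|\frac{p(i)}{b_i} - \frac{\hat p(i)}{b_i}\right| = \sum_{i \in [n]} |p(i) - \hat p(i)| = \|p - \hat p\|_1$.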
For (ii), we have:
[TABLE]
We now prove (iii). If , then . If , then and hence . Therefore, for all . Since the domain size of is at most , is at most .
Finally, we show (iv). Since is a mixture distribution, there is an such that . Also, we have that has a mixture parameter . Furthermore, we also have . Let which is in . Observe that
Thus, is a mixture of and . For an element , we can bound the difference of and as follows, which finishes the proof.
[TABLE]
∎
4.3 The mixture testing algorithm
In this section, we use the learner and the reshaped distributions to obtain an identity tester for mixtures of two known distributions.
Theorem 4.3.
Given a proximity parameter $\epsilon$, Algorithm 2 is an identity tester in the presence of known noise that uses $O(\sqrt{n}/\epsilon^2)$ samples.
Proof.
Let . In the completeness case, is a mixture distribution with parameter . Therefore, with probability at least 5/6, is a mixture distribution with parameter for which is -close to . Let and be the reshaped distributions described in Section 4.2. By Lemma 4.2(ii), . Moreover, for any , , which implies that
[TABLE]
Conversely, if $p$ is $\epsilon$-far from being a mixture distribution, then it has to be $\epsilon$-far from $\hat p$. By Lemma 4.2, $p'$ and $\hat p'$ are $\epsilon$-far from each other. Therefore, . Using the identity tester (Identity-Tester) provided in [DK16] (see Remark 2.7 and Remark 2.8), there exists an algorithm that can distinguish the above cases with probability using samples. Thus, with probability both the invoked learner and the tester return the right answer. Also, the sample complexity is $O(\sqrt{n}/\epsilon^2)$. Hence the proof is complete. ∎
5 Testing mixtures in the presence of noise that is accessible via samples
In this section, we provide an algorithm for testing closeness of distributions in the presence of noise that is accessible via samples. We assume we have sample access to three distributions $p$, $q_1$, and $q_2$ over $[n]$, and the goal is to test if $p$ is a mixture of $q_1$ and $q_2$. Our approach is first to learn $\alpha$ in an indirect manner. Specifically, we design an algorithm that finds a candidate mixture distribution $\hat p$ such that, with high probability, if $p$ is a mixture of $q_1$ and $q_2$, then $\hat p$ will be close to $p$. We claim that the answer to the test "is $\hat p$ close to $p$?" can be used to test if $p$ is close to a mixture of $q_1$ and $q_2$. Indeed, if $p$ is a mixture distribution, by the property of the learning algorithm, $\hat p$ is close to $p$ and hence the test will accept. Conversely, if $p$ is far from being a mixture, then $p$ is far from any mixture distribution, including $\hat p$, and hence the test will reject.
In particular, the candidate $\hat p$ will be such that (i) if $p$ is a mixture, then for a sufficiently small constant and (ii) if $p$ is $\epsilon$-far from being a mixture, then . As we will see, the robust $\ell_2$-distance tester of [CDVV14] can efficiently distinguish these two cases. Since we are looking for $\hat p$ that is close to $p$ in $\ell_2$-distance, we study how we can estimate the $\ell_2$-distance between $p$ and a mixture distribution of $q_1$ and $q_2$. Let $m$ be the expected number of samples we draw; $m$ will be specified later. Assume we draw $\mathrm{Poi}(m)$ samples from each of $p$, $q_1$, and $q_2$, where $\mathrm{Poi}(m)$ denotes a Poisson random variable with parameter $m$. For the (multi)sets of samples from $p$, $q_1$, and $q_2$, let $X_i$, $Y_i$, and $W_i$ be the numbers of instances of element $i$ in each sample set. Consider the following statistic, which is motivated by the $\ell_2$-distance estimator proposed in [CDVV14] (there, one draws $\mathrm{Poi}(m)$ samples from $p$ and $q$ and uses the statistic $\sum_i (X_i - Y_i)^2 - X_i - Y_i$, where $X_i$ (resp., $Y_i$) is the number of times $i$ occurs in the samples from $p$ (resp., $q$)):
[TABLE]
Note that if we fix the sample sets , , and , is a quadratic function of . We show that the above statistic has the expected value of .
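To make the quadratic structure concrete, here is a minimal Python sketch. Since the displayed formula is not reproduced here, the form below is an assumption modeled on the [CDVV14] statistic: with Poissonized counts X_i, Y_i, W_i from the tested distribution p, the reference q, and the noise eta (names chosen for illustration), Z(alpha) = Σ_i [(X_i − alpha·Y_i − (1 − alpha)·W_i)^2 − X_i − alpha^2·Y_i − (1 − alpha)^2·W_i]. For independent Poisson counts this has expectation m^2·||p − (alpha·q + (1 − alpha)·eta)||_2^2, and expanding in alpha yields the three quadratic coefficients directly. All function names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Poisson(lam) via Knuth's method on chunks of rate <= 30.

    A sum of independent Poisson variables is Poisson, so chunking avoids
    underflow of exp(-lam) when lam is large."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        threshold, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
    return total


def statistic(x, y, w, alpha):
    """Direct evaluation of the assumed statistic Z(alpha)."""
    return sum((xi - alpha * yi - (1 - alpha) * wi) ** 2
               - xi - alpha ** 2 * yi - (1 - alpha) ** 2 * wi
               for xi, yi, wi in zip(x, y, w))


def statistic_coefficients(x, y, w):
    """Return (a2, a1, a0) with Z(alpha) = a2*alpha**2 + a1*alpha + a0.

    With U = X - W and V = Y - W, the residual X - alpha*Y - (1-alpha)*W is U - alpha*V."""
    a2 = a1 = a0 = 0
    for xi, yi, wi in zip(x, y, w):
        u, v = xi - wi, yi - wi
        a2 += v * v - yi - wi      # closeness statistic for q versus eta
        a1 += 2 * wi - 2 * u * v
        a0 += u * u - xi - wi
    return a2, a1, a0


# Usage: p is an exact 0.3-mixture of a uniform q and a two-level noise eta.
rng = random.Random(0)
n, m, true_alpha = 20, 20000, 0.3
q = [1 / n] * n
eta = [2 / n if i < n // 2 else 0.0 for i in range(n)]
p = [true_alpha * qi + (1 - true_alpha) * ei for qi, ei in zip(q, eta)]
x = [poisson_sample(m * pi, rng) for pi in p]
y = [poisson_sample(m * qi, rng) for qi in q]
w = [poisson_sample(m * ei, rng) for ei in eta]
coeffs = statistic_coefficients(x, y, w)
print(statistic(x, y, w, true_alpha) / m ** 2)  # close to 0
print(statistic(x, y, w, 1.0) / m ** 2)         # close to ||p - q||_2^2 = 0.0245
```

In this assumed form, the alpha^2 coefficient is Σ_i[(Y_i − W_i)^2 − Y_i − W_i], i.e., the [CDVV14] closeness statistic for q versus eta, which matches the observation about this coefficient made later in this section.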
If is a mixture distribution with parameter , then , where the expectation is taken over the randomness of the samples. Hence, a natural candidate to approximate is some where achieves its (near-)minimum. To do this, we first show that if is a mixture, then we can choose a threshold parameter such that with high probability. Then we pick that minimizes with the constraint that . Since is a quadratic, let and be the solutions. We then show that if is a mixture of and , with high probability, at least one of or is small (Section 5.1).
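As a sketch of the natural candidate only (not the thresholded selection used in the proof), the minimizer over [0, 1] of a quadratic a2·alpha^2 + a1·alpha + a0 has a closed form. The coefficient names and the function below are hypothetical.

```python
def argmin_on_unit_interval(a2, a1, a0):
    """alpha in [0, 1] minimizing a2 * alpha**2 + a1 * alpha + a0."""
    candidates = [0.0, 1.0]
    if a2 > 0:  # convex case: the vertex is the global minimizer
        vertex = -a1 / (2 * a2)
        if 0.0 <= vertex <= 1.0:
            candidates.append(vertex)
    # For a2 <= 0 (concave or linear), the minimum over [0, 1] is at an endpoint.
    return min(candidates, key=lambda a: a2 * a * a + a1 * a + a0)


print(argmin_on_unit_interval(1.0, -0.6, 0.0))  # about 0.3
```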
For the rest of the section, let be a parameter specified later for which , , and are bounded by . Thus for any mixture distribution , we have . Also, let , for a sufficiently large constant , and let . (All missing proofs are in Section 5.3.)
5.1 Finding candidates
In this section, we aim to learn a mixture distribution. More precisely, we are looking for ’s such that if is a mixture distribution, then with high probability, .
Theorem 5.1.
Suppose is a mixture distribution. Given , , and , with probability , one can compute a candidate set , for which there exists such that
Proof.
We consider two cases based on the -distance of and . Suppose . If is a mixture distribution with parameter , then we have:
[TABLE]
Hence, , which is a mixture distribution with parameter , is a candidate.
We now focus on the case . Without loss of generality, (otherwise swap and ). Fixing , we write (1) as , where
[TABLE]
As explained earlier, the idea is to use to find a proper candidate .
We now study the properties of and . It turns out that is the same as the statistic for testing closeness where samples are drawn from and . From [CDVV14],
[TABLE]
In the following lemma, we show how a statistic whose expected value and variance are similar to those of concentrates:
Lemma 5.2.
[Adapted from [CDVV14]] Assume a random variable, namely , has the following properties:
[TABLE]
where , , and are three positive constants, is an integer, and are two distributions over , and is a real number which is greater than and . If is at least for sufficiently large , then with probability 0.99 the following is true:
- If is at most , then is at most .
- If is at least , then is between and .
Thus, using the above lemma, we show that with probability 0.99, there is a constant such that
[TABLE]
might not have a nice closed-form expression when is an arbitrary distribution, but when is a mixture, it has the following property.
Lemma 5.3.
Suppose is a mixture of and with parameter . Let be a function of the sample sets , , and as defined in (2) and let . If the sample sets, , , and , each have samples, then with probability 0.99, there exists such that
[TABLE]
We now analyze for a fixed . (The following in fact holds for any distribution over .)
Lemma 5.4.
For a fixed ,
[TABLE]
By Lemma 5.4 and Lemma 5.2, with probability 0.99, if , then
[TABLE]
With probability 0.97, all of (4), (5), and (6) hold; we condition on this from now on.
Since is a quadratic and since from (4), let where achieves its minimum. We define and as follows:
[TABLE]
Note that (6) guarantees that either or exists, depending on whether or not; they can also be found very efficiently by binary search. It remains to show that one of and is very close to in -distance.
Lemma 5.5.
We have either or
Note that by choice of our parameters, we have . Hence, either or is a candidate. Thus our potential candidates so far are , , and . In addition, given our assumption for , we need to compute the corresponding and when and are swapped. Hence, we have at most five candidates for . ∎
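The two endpoints used above (as in the proof of Lemma 5.5, the smallest point left of the minimizer and the largest point right of it at which the statistic is at most the threshold) can be located by binary search, because a convex function is monotone on each side of its minimizer. A minimal sketch, where z, vertex, and tau are hypothetical stand-ins for the statistic, its minimizer, and the threshold:

```python
def sublevel_endpoints(z, vertex, tau, iters=60):
    """Smallest alpha in [0, vertex] and largest alpha in [vertex, 1] with
    z(alpha) <= tau, for a convex z minimized at vertex; (None, None) if
    even z(vertex) exceeds tau."""
    if z(vertex) > tau:
        return None, None
    # Left of the vertex, z is non-increasing: find where it drops to tau.
    if z(0.0) <= tau:
        left = 0.0
    else:
        lo, hi = 0.0, vertex  # invariant: z(lo) > tau >= z(hi)
        for _ in range(iters):
            mid = (lo + hi) / 2
            lo, hi = (lo, mid) if z(mid) <= tau else (mid, hi)
        left = hi
    # Right of the vertex, z is non-decreasing: find where it rises past tau.
    if z(1.0) <= tau:
        right = 1.0
    else:
        lo, hi = vertex, 1.0  # invariant: z(lo) <= tau < z(hi)
        for _ in range(iters):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if z(mid) <= tau else (lo, mid)
        right = lo
    return left, right


print(sublevel_endpoints(lambda a: (a - 0.5) ** 2, 0.5, 0.04))  # about (0.3, 0.7)
```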
5.2 Mixture closeness tester
In this section, we provide our algorithm and prove its correctness in the following theorem.
Theorem 5.6.
Given a proximity parameter , Algorithm 3 is an closeness tester in the presence of noise that is accessible via samples and it uses samples.
Proof.
We reduce the -norm of the three input distributions via the reshaping technique proposed in [DK16]. Let be a multi-set consisting of samples, where samples are chosen from each distribution , , and . For , we assign buckets to element where is the number of instances of element in set plus one. For a distribution over , we define to be a distribution over all the buckets, . We generate a sample from via the following process: (i) draw a sample , (ii) pick uniformly at random, and (iii) output . The probability of any element according to is . It is known that flattening does not change the -distance between two distributions. Let , , and be the distributions , , and after flattening. We show that a mixture distribution will remain a mixture after flattening. More precisely, if is a mixture of and with parameter , then it is easy to see that is a mixture of distributions and with the same parameter . Thus, it suffices to test if is a mixture of and .
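The two properties of flattening used above are easy to check numerically. In the sketch below, distributions are Python lists over the domain and the sample multiset is given by per-element counts c_i, so element i is split into b_i = c_i + 1 buckets of mass p_i/b_i each. Since flattening is linear in the distribution, it preserves ℓ1 distances and commutes with taking mixtures. All names are hypothetical.

```python
import random


def flatten(dist, counts):
    """Mass of each bucket (i, j), j <= counts[i], listed in a fixed order."""
    return [pi / (ci + 1) for pi, ci in zip(dist, counts) for _ in range(ci + 1)]


def sample_flattened(dist, counts, rng):
    """Draw i ~ dist, then a bucket index j uniformly from {0, ..., counts[i]}."""
    i = rng.choices(range(len(dist)), weights=dist)[0]
    return i, rng.randrange(counts[i] + 1)


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


q = [0.5, 0.25, 0.25]
eta = [0.1, 0.1, 0.8]
alpha = 0.4
p = [alpha * a + (1 - alpha) * b for a, b in zip(q, eta)]
counts = [3, 0, 1]  # e.g. element 0 appeared three times in the multiset
fq, feta, fp = flatten(q, counts), flatten(eta, counts), flatten(p, counts)
print(l1(q, eta), l1(fq, feta))  # the same distance before and after flattening
print(sample_flattened(p, counts, random.Random(1)))
```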
By setting , according to [DK16, Lemma II.6] and Markov’s inequality, we can assume the -norms of all three distributions , , and are at most with probability at least , where we set . Also, note that .
Given Theorem 5.1, one can find a set of at most five candidates. If is a mixture of and , then there is an such that . On the other hand, if is -far from being a mixture, it is also -far from all ; using the Cauchy–Schwarz inequality, we have . Note that [CDVV14] showed one can estimate the -distance accurately using samples and with probability 0.99 (see Lemma 5.7 in Section 5.3).
By a union bound, the probability that does not contain the right , the probability that the estimation fails, and the probability that sum to less than . Hence, with probability at least 2/3, the algorithm outputs the right answer and the total number of samples is . ∎
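The final decision of the tester can be sketched as follows, with exact ℓ2 distances standing in for the [CDVV14] estimate. The candidate list, the threshold value, and all names are hypothetical placeholders, since the paper's constants are not reproduced here. The last assertion in the checks illustrates the Cauchy–Schwarz step ||v||_2 ≥ ||v||_1/√N for a vector of length N, which turns ℓ1-farness into an ℓ2 gap.

```python
import math


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mixture(q, eta, alpha):
    return [alpha * a + (1 - alpha) * b for a, b in zip(q, eta)]


def accept(p, q, eta, candidate_alphas, threshold):
    """Accept iff some candidate mixture is within `threshold` of p in l2."""
    return min(l2(p, mixture(q, eta, a)) for a in candidate_alphas) <= threshold


q = [0.25, 0.25, 0.25, 0.25]
eta = [0.7, 0.1, 0.1, 0.1]
p_mix = mixture(q, eta, 0.6)
p_far = [0.0, 0.0, 0.5, 0.5]
alphas = [0.0, 0.55, 0.6, 1.0]  # hypothetical candidate set
print(accept(p_mix, q, eta, alphas, 0.05), accept(p_far, q, eta, alphas, 0.05))
```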
5.3 Proofs for Section 5.1 and Section 5.2
In this section, we present the proofs of the lemmas presented earlier in Section 5.1 and Section 5.2.
Proof of Lemma 5.2.
We use Chebyshev’s inequality to prove the lemma. For the first case, by the -norms inequality, we have the following:
[TABLE]
where the last inequality is true when .
For the second case, we have the following:
[TABLE]
where the last inequality is true when . This completes the proof. ∎
Proof of Lemma 5.3.
Recall that is defined to be . To analyze the expected value and the variance of , we consider each term in the sum. Let denote a single term in the sum after ignoring the constant 2:
[TABLE]
Note that via the Poissonization method, the ’s, the ’s, the ’s, and consequently the ’s are independent random variables. Note that if is a Poisson random variable with mean , then is . Using this equation, we compute the expected value of :
[TABLE]
Thus, the expected value of is the following:
[TABLE]
where is a constant between . Using the first four moments of the Poisson distribution and the fact that , we have the following:
[TABLE]
Using the bound above and the Cauchy–Schwarz inequality, we bound the variance of as follows:
[TABLE]
Clearly, the variance of is equal to the variance of , and it is bounded the same as above. Note that is at most , and the sample sets, , , and , each have at least samples. By Lemma 5.2, there exists such that
[TABLE]
with probability 0.99 which concludes the proof. ∎
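The computations in this proof, and in the proof of Lemma 5.4 below, rely on the factorial moments of the Poisson distribution: if X ~ Poisson(λ), then E[X(X − 1)···(X − r + 1)] = λ^r, and in particular E[X^2 − X] = λ^2. A quick numerical check by truncated series, in plain Python (the function names are ours):

```python
import math


def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def factorial_moment(lam, r, kmax=None):
    """E[X (X-1) ... (X-r+1)] for X ~ Poisson(lam), by a truncated series."""
    if kmax is None:
        kmax = int(lam + 20 * math.sqrt(lam) + 50)  # tail beyond this is negligible
    total = 0.0
    for k in range(r, kmax + 1):
        falling = 1.0
        for t in range(r):
            falling *= k - t
        total += falling * poisson_pmf(k, lam)
    return total


print(factorial_moment(3.5, 2), 3.5 ** 2)  # agree up to rounding
```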
Proof of Lemma 5.4.
In this proof, we adapt the proof of Proposition 3.1 from [CDVV14]. Recall that
[TABLE]
Via the Poissonization method, we can assume (similarly and ) is a random variable from (similarly and ), which is drawn independently from the rest of the random variables. Note that if is a Poisson random variable with mean , then is . Using this equation and the independence of the random variables, for a fixed , we have:
[TABLE]
Now, we bound the variance of for a fixed . Let denote a single term in the summation:
[TABLE]
Using the moments of the Poisson distribution, we have
[TABLE]
Now, we bound the variance of which is the sum of independent terms, ’s. Using the Cauchy–Schwarz inequality, and the fact that is at most , we have
[TABLE]
where is at least and by the first condition of the theorem. ∎
Proof of Lemma 5.5.
Consider the statistic as a function of : . Since is positive, takes its minimum at . By Equation 4 and Equation 5, is , and for any , we have:
[TABLE]
Depending on whether is larger than or not, we consider the following cases.
Case 1: . Let be the smallest number in for which is at most . Clearly, exists since is a potential solution, so the solution interval is not empty. Note that based on the way we pick , the following are true: (i) is at most , (ii) is at least , and (iii) since is positive, and is increasing over , then is at most . Hence, by Equation 8, we have:
[TABLE]
If we replace by a smaller quantity , where both are positive, then we have:
[TABLE]
Case 2: . We replicate what we did in the previous case. Let be the largest number in for which is at most . Clearly, exists since is a potential solution, so the solution interval is not empty. Note that based on the way we pick , the following are true: (i) is at least , (ii) is at least , and (iii) since is positive, and is decreasing over , then is at most . Hence, by Equation 8, we have:
[TABLE]
If we replace by a smaller quantity, , where both are positive, then we have:
[TABLE]
The left-hand sides of Equation 9 and Equation 10 take the form of the -distance between two mixture distributions and , because of the following:
[TABLE]
Note that we are in either case 1 or case 2, so one of the two equations, Equation 9 or Equation 10, has to be true. By Equation 11, one of the following is true.
[TABLE]
which concludes the proof. ∎
Lemma 5.7.
[Restated from [CDVV14]] The procedure -Estimator described in Algorithm 4, that uses samples, has the following property with probability 0.99:
- If is at most , then is at most .
- If is at least , then is between and .
Proof.
We use the -distance estimator proposed in [CDVV14]. However, for the sake of completeness, we provide the process in Algorithm 4. and are two sample sets each containing samples from and respectively. Let and indicate the numbers of samples in and respectively. The authors showed that the expected value of the statistic is , and the variance is bounded by . By Lemma 5.2, if we draw samples, then the algorithm will have the desired property with probability 0.99. ∎
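A minimal Python version of the [CDVV14] statistic described above follows: with Poissonized counts X_i and Y_i, the quantity Σ_i[(X_i − Y_i)^2 − X_i − Y_i]/m^2 is an unbiased estimate of ||p − q||_2^2. This is a sketch only; the thresholds of Algorithm 4 are not reproduced here, and the helper names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Poisson(lam) via Knuth's method on chunks of rate <= 30 (sums of Poissons are Poisson)."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        threshold, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
    return total


def l2_squared_estimate(p, q, m, rng):
    """Unbiased estimate of ||p - q||_2^2 from Poi(m * p_i) and Poi(m * q_i) counts."""
    x = [poisson_sample(m * pi, rng) for pi in p]
    y = [poisson_sample(m * qi, rng) for qi in q]
    return sum((xi - yi) ** 2 - xi - yi for xi, yi in zip(x, y)) / m ** 2


p_demo = [0.1] * 10
q_demo = [0.2] * 5 + [0.0] * 5  # ||p_demo - q_demo||_2^2 = 0.1
est = l2_squared_estimate(p_demo, q_demo, 20000, random.Random(3))
same = l2_squared_estimate(p_demo, p_demo, 20000, random.Random(4))
print(est, same)
```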
6 Testing under -flat noise
We have so far considered the problems of identity testing and closeness testing in the presence of the noise that is directly accessible and proved these problems have the same sample complexity as their respective noise-free versions. These results raise the question of whether one can replace the requirement of access to the noise by an assumption that restricts the noise to be in a class of distributions and still achieve improved sample complexity compared to the near-linear lower bound we mentioned earlier. In this section we develop a tester for identity testing when the noise distribution belongs to the class of -flat distributions without any further information. This assumption means that the noise can be any -flat distribution, while the parameters of the -flat distribution are not known to the tester, nor given via samples.
6.1 Preliminaries
We begin by formally defining -flat distributions: We say is a -segmentation of if and only if are disjoint intervals that cover . Also, we say a function is a -flat function if and only if there is a -segmentation of , namely , such that for any two elements, and , in the same interval in , is equal to . A distribution is a -flat distribution if and only if its probability mass function is a -flat function.
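One way to read this definition operationally: the fewest intervals on which a function is constant equals its number of maximal runs of equal consecutive values, since every segmentation must break wherever two neighbors differ. A small sketch (names hypothetical; values are compared by exact equality, so floating-point inputs should be exact):

```python
def constant_runs(values):
    """Maximal runs of equal consecutive entries, as half-open intervals [start, end)."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            runs.append((start, i))
            start = i
    return runs


def is_k_flat(values, k):
    """True iff values is constant on each interval of some k-segmentation
    (assumes 1 <= k <= len(values), so runs can be split further if needed)."""
    return len(constant_runs(values)) <= k


f = [0.1, 0.1, 0.3, 0.3, 0.2]
print(constant_runs(f), is_k_flat(f, 3), is_k_flat(f, 2))
```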
We next define concepts that will be necessary for describing our algorithms. For any distribution and a partition of its domain, the coarsening of over , denoted by , is a distribution over the sets in where the probability of each set is . For a subset , we define the restriction of to , denoted by , to be a distribution over for which the probability of is equal to . Although the restriction is well-defined only when is not zero, abusing notation, we define to be zero if or is zero.
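Both operations are one-liners in code. In the sketch below (names hypothetical), a partition is a list of lists of element indices, and the restriction follows the convention above of returning zero when the subset has no mass.

```python
def coarsen(dist, partition):
    """Distribution over the parts: the total mass of each part."""
    return [sum(dist[i] for i in part) for part in partition]


def restrict(dist, subset):
    """Conditional distribution on subset, as a dict; all zeros if the subset has no mass."""
    mass = sum(dist[i] for i in subset)
    if mass == 0:
        return {i: 0.0 for i in subset}
    return {i: dist[i] / mass for i in subset}


p = [0.1, 0.2, 0.3, 0.4]
partition = [[0, 3], [1, 2]]
print(coarsen(p, partition))  # masses of the two parts
print(restrict(p, [1, 2]))
```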
Also, throughout this section, we study different schemes for partitioning the domain. In addition to -segmentation, which is defined above, two other schemes are defined as follows. Given a known distribution , Batu et al. [BFF+01] provide a partitioning scheme, called bucketing, which places elements with similar probability in the same bucket. Note that, in contrast with -segmentation, this scheme does not necessarily place consecutive elements in the same bucket.
Definition 6.1 (Similar to [BFF+01]).
Assume we have a known distribution over . Given a parameter , we define the bucketing of the domain, Bucket, to be a set of subsets of the domain, , where each subset is defined as below:
[TABLE]
[TABLE]
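The thresholds of Definition 6.1 are not reproduced here, so the sketch below assumes geometric buckets in the style of [BFF+01] with a hypothetical floor `base`: bucket 0 holds elements with q_i < base, and bucket j ≥ 1 holds elements with base·(1+eps)^(j−1) ≤ q_i < base·(1+eps)^j. It illustrates the property used in Section 6.2: within any bucket other than bucket 0, the probabilities are within a (1 + eps) factor of each other.

```python
import math


def bucketing(q, eps, base):
    """Map bucket index -> list of elements (assumed geometric scheme)."""
    buckets = {}
    for i, qi in enumerate(q):
        if qi < base:
            j = 0  # the low-probability bucket
        else:
            j = 1 + math.floor(math.log(qi / base) / math.log1p(eps))
        buckets.setdefault(j, []).append(i)
    return buckets


q_demo = [0.001, 0.2, 0.21, 0.05, 0.3, 0.239]
print(bucketing(q_demo, 0.1, 0.01))
```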
We define the last partitioning scheme below. This partition is a refinement of the bucketing with respect to a -segmentation .
Definition 6.2.
Assume is a -segmentation of , and is a bucketing of containing disjoint subsets. We define to be a division of the domain for which the ’s are the intersection of the th interval and the th bucket. Formally, is defined as:
[TABLE]
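Concretely, the division keeps every non-empty intersection of an interval of the segmentation with a bucket. A minimal sketch with hypothetical names, representing intervals as half-open pairs (start, end):

```python
def division(segmentation, buckets):
    """Non-empty intersections of each interval [start, end) with each bucket."""
    parts = []
    for start, end in segmentation:
        for bucket in buckets:
            part = [i for i in bucket if start <= i < end]
            if part:
                parts.append(part)
    return parts


segmentation = [(0, 3), (3, 6)]      # a 2-segmentation of {0, ..., 5}
buckets = [[0, 4], [1, 2, 5], [3]]   # hypothetical bucketing
print(division(segmentation, buckets))
```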
The problem of testing identity in the presence of -flat noise.
Suppose we are given a known distribution , and sample access to a distribution both over the domain . Let denote the class of all -flat distributions over . The problem of testing identity in the presence of -flat noise boils down to distinguishing the following cases with probability at least 2/3:
- There exists a mixture parameter and a -flat distribution over such that is a mixture of and with parameter , i.e., .
- is -far from any distribution of the form where and .
6.2 The algorithm
We start by explaining the properties of the partitioning schemes we defined earlier. Let be the bucketing of the domain elements for a parameter . The algorithm can obtain this bucketing since and is known to the algorithm. The bucketing scheme is designed such that the probabilities of the elements in a bucket are within a -factor of each other (except for ). This property implies that the restriction of to any bucket is extremely close to the uniform distribution.
Now, assume that is in fact a mixture of and a -flat distribution . We denote the -segmentation of by (which is not known to the algorithm). By definition, the restriction of on any is a uniform distribution. Consider the division , described in Definition 6.2. Observe that is a subset of both and . One can show that the restriction of is uniform on , and the restriction of to is very close to the uniform distribution as well. Thus, , which is assumed to be the mixture of and , must be very close to the uniform distribution on . We formally prove this claim in Lemma 6.5.
Based on the above observation, our tester looks for two qualities in to assert that it is a mixture distribution: given a division , (i) are the restrictions of to the ’s almost uniform, and (ii) is the overall shape of over the ’s (i.e., ) consistent with a mixture of and a -flat noise distribution? More specifically, our tester follows these steps. For every -segmentation , the tester checks that the restriction of to each is almost uniform. If it finds that this is not the case, it abandons the current segmentation and starts over with another one. If at some point the tester passes this step, it checks the overall shape of . It draws enough samples from and forms the empirical distribution from the samples. Then it checks whether there exists a -flat function, , such that is consistent with a mixture of and . If the tester finds a -segmentation such that the distribution passes the two steps above, then it asserts that is a mixture and outputs accept. Otherwise, it outputs reject.
Based on our first observation, one can expect the tester to accept a mixture distribution . However, the main challenge is to show that the tester rejects when is -far from being a mixture. To prove this fact, we also use the following observation. Suppose we have two distributions and . Let be a partition of their domain. We prove that if and are -far from each other, there is a noticeable discrepancy between either their coarsenings over or their restrictions to the subsets in (Lemma 6.6). This observation implies that if is -far from being a mixture distribution, then at least one of the steps will fail. Hence, we distinguish both cases with high probability.
We describe our tester in Algorithm 5 and show its correctness in Theorem 6.3. Later, we also discuss how to avoid trying all ’s and achieve a polynomial time algorithm. All missing proofs in the rest of this section are in Section 6.3.
Theorem 6.3.
Algorithm 5 is an identity tester in the presence of -flat noise that uses samples.
Proof.
We set . We denote the number of buckets in by . Let denote . Without loss of generality we assume . Otherwise, one could learn the distribution up to -distance error via samples, and trivially check if it is -close to a mixture of and a -flat distribution.
Consider a segmentation , and a division . To obtain better sample complexity, we need to make sure that the size of each set in is not greater than . In the case that a large set of size exists, we split it into sets of roughly the same size and denote them by for . The new sets form a new partition of the domain. We call it a refined division, denoted . Note that this replacement will not asymptotically increase the total number of sets in the division, since has many sets.
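The refinement step can be sketched as follows. The size cap is a hypothetical parameter `max_size`, since the paper's threshold is not reproduced here. Each oversized set is split into ceil(|S|/max_size) pieces whose sizes differ by at most one, so every piece has at most max_size elements and the number of sets grows by at most n/max_size in total.

```python
import math


def refine(parts, max_size):
    """Split every part larger than max_size into near-equal pieces of size <= max_size."""
    refined = []
    for part in parts:
        if len(part) <= max_size:
            refined.append(list(part))
            continue
        pieces = math.ceil(len(part) / max_size)
        base, extra = divmod(len(part), pieces)  # sizes base or base + 1
        start = 0
        for t in range(pieces):
            size = base + (1 if t < extra else 0)
            refined.append(list(part[start:start + size]))
            start += size
    return refined


print(refine([[1, 2, 3, 4, 5, 6, 7], [8]], 3))
```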
Now, we establish that for a sufficiently large number of samples, the three steps in the algorithm succeed with high probability. First, in the following lemma, we show that samples are enough to obtain an empirical distribution such that for all the divisions and are -close to each other with probability 0.9.
Lemma 6.4.
Assume is a distribution over . Let be an empirical distribution formed by samples from . Fix a bucketing of the domain Bucket. For every -segmentation , and the corresponding refined division of the domain , the coarsenings of and the empirical distribution over are at most -far from each other with probability at least .
Second, we show that if , for a fixed and , is at least , then contains at least a fraction of the samples with high probability. Note that there are at most sets for a fixed . Using the Chernoff bound, the claim is true for all ’s with probability 0.9 if we draw more than samples.
Third, we show that if contains enough samples, then with high probability we can distinguish whether is at most or at least : if we draw samples, we receive samples from any set with . Based on [DGPP16, Theorem 1], with probability , we can distinguish whether is at most or at least using samples. By repeating this times and taking the majority answer, we obtain the correct answer for the test on all the ’s with probability at least 0.9. Thus, we need samples for this step.
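The repetition count for majority voting follows from a standard Hoeffding argument: if a single run is correct with probability p > 1/2, the majority of r independent runs is wrong with probability at most exp(−2r(p − 1/2)^2). A sketch with generic constants (the paper's exact repetition count is not reproduced here); for the union bound over N sets one would take a per-set failure probability of 0.1/N.

```python
import math


def majority_failure_probability(r, p_success):
    """Exact probability that a majority vote over r (odd) independent runs is wrong."""
    return sum(math.comb(r, s) * p_success ** s * (1 - p_success) ** (r - s)
               for s in range(r // 2 + 1))


def repetitions_for(delta, p_success=2 / 3):
    """Odd r with exp(-2 * r * (p_success - 1/2)**2) <= delta (Hoeffding bound)."""
    gap = p_success - 0.5
    r = math.ceil(math.log(1 / delta) / (2 * gap * gap))
    return r if r % 2 == 1 else r + 1


r = repetitions_for(0.01)
print(r, majority_failure_probability(r, 2 / 3))
```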
In the above three steps, we need the following number of samples:
[TABLE]
By a union bound, the probability that any of the above steps goes wrong is at most 0.3. Hence, for the rest of the proof, we assume that the algorithm carries out the steps as expected, which happens with probability at least 2/3. Given this assumption, we show that in both the completeness case and the soundness case, the algorithm outputs the correct answer.
Completeness: In this case, there exist a -flat distribution over , , and a parameter such that . First, note that in each is close to the uniform distribution. In particular, we have the following lemma.
Lemma 6.5.
Suppose is a mixture of and with parameter . Let , , and , be the partitions we defined earlier. For any non-empty set, , if , then the restriction of to the set, , is -close to the uniform distribution in -distance and -close to the uniform distribution in -distance.
The lemma implies that for all , is close to the uniform distribution. Hence, when considering the segmentation , the algorithm will not abandon it on account of some being far from uniform, and it will move on to the next step.
Also, we show that a -flat function, , exists, because is a solution itself. We have samples which is enough to learn the coarsening of over . Thus, the coarsening of the empirical distribution, , is -close to the coarsening of over . There exists an iteration in the algorithm in which we try a parameter such that is at most . Therefore, itself is a solution the algorithm is looking for:
[TABLE]
Hence, the algorithm will not output reject.
Soundness: In this case, is -far from any mixture distribution for any -flat distribution and . We have the following structural lemma (similar to Lemma 6 in [BFF+01]), which bounds the distance between and from above:
Lemma 6.6.
Assume and are two distributions on , and let be a refined division of the domain elements. Then, we have
[TABLE]
Since the distance between and is at least , we can apply this lemma to obtain a lower bound for the two quantities in the right hand side of the equation above.
[TABLE]
At least one of the two terms on the right-hand side above is greater than . Next, we show that if the algorithm reaches the point where it forms the empirical distribution, then the second term is at most . On the other hand, if the algorithm outputs accept, then the first term is at most . Hence, these two events cannot happen at the same time while .
Formally, if there is no such that causes the algorithm to move forward to the next segmentation, then for each either the weight of the set is not larger than , or the -distance between and the uniform distribution is not more than . In the following lemma, we show that this situation implies that the second term in Equation 13 is at most .
Lemma 6.7.
Suppose for every non-empty in the division , either is at most , or is at most . Let be a mixture of and a -flat distribution over with an arbitrary in . Then, the following holds
[TABLE]
On the other hand, if the algorithm outputs accept, it implies that there exists a function and , such that is at most . In the following lemma, we show it implies that there exists a such that is at most .
Lemma 6.8.
Assume , , and are three distributions over , and is a -flat function over -segmentation . For a division , suppose is -close to , and there exists such that is at most . Then, there exists a -flat distribution , such that is -close to the mixture of and with parameter .
Moreover, outputting accept means that the two terms in Equation 13 are at most , which contradicts the fact that one of them has to be . Hence, the proof is complete. ∎
A faster algorithm.
In the interest of a simpler exposition, the algorithm described above tries all possible -segmentations. However, there are at most possible subsets that could appear as ’s. Hence, one can test uniformity of on each of them separately, regardless of . Moreover, finding a -flat function for which the -distance between and the mixture of and is minimized can be done via dynamic programming: we define to be the smallest -distance between and the mixture of and any -flat distribution when we consider only the first elements of the domain. We compute using the previously computed :
[TABLE]
where the cost is defined as follows. We set the cost of an interval to infinity if any subset of that would have appeared in the divisions (i.e., all subsets of the form for ) does not pass the uniformity test. Otherwise, the cost is the minimum -distance between and a mixture of and a constant function for the elements in . Since we are only looking for -flat functions rather than distributions, the updates can be computed locally and independently of the rest of the segments.
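The recurrence is not reproduced above, so the sketch below assumes its natural form best[t][j] = min over i < j of best[t−1][i] + cost(i, j), for a fixed mixture parameter alpha. For simplicity it works on individual domain elements rather than on the coarsened sets, and the hook `interval_ok` stands in for the uniformity tests (a failing interval gets infinite cost). The cost of an interval is the best ℓ1 fit by a single nonnegative constant c: since the objective equals (1 − alpha)·Σ|r_i − c| with r_i = (p̂_i − alpha·q_i)/(1 − alpha), a median of the r_i clipped at 0 minimizes it. All names are hypothetical.

```python
def interval_cost(p_hat, q, alpha, start, end):
    """min over c >= 0 of sum_{i in [start, end)} |p_hat_i - alpha*q_i - (1-alpha)*c|."""
    if alpha >= 1:  # the mixture no longer depends on the noise
        return sum(abs(p_hat[i] - q[i]) for i in range(start, end))
    r = sorted((p_hat[i] - alpha * q[i]) / (1 - alpha) for i in range(start, end))
    c = max(r[(len(r) - 1) // 2], 0.0)  # clipped (lower) median
    return sum(abs(p_hat[i] - alpha * q[i] - (1 - alpha) * c) for i in range(start, end))


def best_k_flat_fit(p_hat, q, alpha, k, interval_ok=lambda start, end: True):
    """Smallest l1 error between p_hat and alpha*q + (1-alpha)*f over k-flat f >= 0."""
    n, inf = len(p_hat), float("inf")
    # best[t][j]: smallest error on the first j elements using at most t intervals.
    best = [[0.0] + [inf] * n]
    for t in range(1, k + 1):
        row = best[t - 1][:]
        for j in range(1, n + 1):
            for i in range(j):
                if best[t - 1][i] < inf and interval_ok(i, j):
                    row[j] = min(row[j], best[t - 1][i] + interval_cost(p_hat, q, alpha, i, j))
        best.append(row)
    return best[k][n]


q_u = [1 / 12] * 12
eta_2flat = [0.05] * 4 + [0.1] * 8       # a 2-flat noise distribution
p_mix = [0.4 * a + 0.6 * b for a, b in zip(q_u, eta_2flat)]
print(best_k_flat_fit(p_mix, q_u, 0.4, 2), best_k_flat_fit(p_mix, q_u, 0.4, 1))
```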
6.3 Proofs for Section 6.2
In this section, we provide the proof for the lemmas stated earlier in Section 6.2.
Proof of Lemma 6.4.
Suppose we draw samples from . Let indicate the number of occurrences of among the samples. Let be the empirical distribution formed by samples, which means that . The goal is to show that for every segmentation , the coarsenings of and over are -close with probability at least . We build on the standard argument used to show that samples are sufficient to learn a distribution over within error in -distance. Consider which contains disjoint subsets of . The -distance between the coarsening of and the empirical distribution is defined as follows:
[TABLE]
We need to show that the above quantity is at most for any segmentation and its corresponding division . However, we prove a stronger claim: Suppose we have a collection of vectors of length with entries in for which the following is true:
- For every refined division , and every set , there exists a vector such that if is in , then .
- For all , is at most with probability at least .
The proof is complete if we establish this claim, so now we focus on proving that the collection exists. We first put a vector corresponding to each refined division. Then we show there is an upper bound for the size of the collection. Next, we show since there are not too many vectors in the collection, with high probability, is at most for any .
Clearly, there are no more than possible vectors. However, we get a better bound for the cases when is not arbitrarily large. We begin by considering a refined division . Fix a set . If two elements in are in the same interval for , then they will have the same as well. Thus, if we sort elements in , and then write the corresponding ’s, then we get a sequence of and where the sign is changed in at most places. To uniquely represent the sequence, one can determine the indices where the sign changed and indicate whether the sequence starts with or . Thus, the total number of such sequences is:
[TABLE]
Note that we have at most such subsets of the domain . Thus, the total number of vectors in is at most .
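The sequence count can be checked by brute force. The displayed expression is not reproduced above, so the closed form below is our own derivation of the same count: choose which of the L − 1 gaps between consecutive positions carry a sign change (at most s of them) and choose the starting sign, giving 2·Σ_{j≤s} C(L − 1, j). Function names are hypothetical.

```python
import itertools
import math


def count_sign_sequences(length, max_changes):
    """+/-1 sequences of a given length (>= 1) with at most max_changes sign changes."""
    return 2 * sum(math.comb(length - 1, j) for j in range(max_changes + 1))


def brute_force_count(length, max_changes):
    total = 0
    for seq in itertools.product((1, -1), repeat=length):
        changes = sum(a != b for a, b in zip(seq, seq[1:]))
        total += changes <= max_changes
    return total


print(count_sign_sequences(6, 2), brute_force_count(6, 2))  # equal counts
```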
Next, we show that if we draw enough samples, the probability of for any is at most . Fix a vector in . Consider the following random process: we draw a sample from , namely ; if is one, output one, and otherwise output zero. In other words, upon receiving sample , we output . Assume are samples from that form the empirical distribution. Suppose that we generate according to the process using these samples from , i.e., . The ’s are independent random variables with the following expected value.
[TABLE]
The average of the ’s is close to its expectation with high probability, and we use this fact to show that are close to zero as well. Recall that is the number of occurrences of element in the sample set. Using the Hoeffding bound, we obtain:
[TABLE]
Therefore, by setting and using Equation 14 and a union bound, for every , is at most with probability . This completes the proof. ∎
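The last step is the usual Hoeffding-plus-union-bound calculation. As a sketch with generic constants (the paper's exact choices are not reproduced here): if each of N bad events is a deviation by more than t of an average of m independent {0, 1} variables from its mean, then the probability of any bad event is at most 2N·exp(−2mt^2), so m ≥ ln(2N/δ)/(2t^2) samples suffice. The function name is hypothetical.

```python
import math


def hoeffding_union_samples(num_events, t, delta):
    """Smallest m with 2 * num_events * exp(-2 * m * t**2) <= delta."""
    return math.ceil(math.log(2 * num_events / delta) / (2 * t * t))


m = hoeffding_union_samples(num_events=10 ** 6, t=0.05, delta=0.1)
print(m, 2 * 10 ** 6 * math.exp(-2 * m * 0.05 ** 2))
```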
Proof of Lemma 6.5.
Fix a non-empty set in for some . To prove the lemma, we show that the ratio of the maximum and the minimum probability according to in is at most . Consider two elements in namely and (if there is only one element in the claim is apparent). Without loss of generality assume . By definition of , and are in the same interval of , so and are equal. Thus, we have:
[TABLE]
The second-to-last inequality holds because we have . Also, the last inequality holds since both and are in . In the proof of Lemma 8 in [BFF+01], it is shown that if the ratio of the probabilities in a set, in our case , is bounded by , then for all , is at most . This completes the proof. ∎
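For intuition, a simple version of the fact cited from [BFF+01] can be checked directly: if all masses in a set S lie within a factor 1 + eps of each other, then each normalized mass lies within eps/|S| of 1/|S|, so the restriction is eps-close to uniform in ℓ1. This derivation and its constant are ours, for illustration only, and are not the statement of Lemma 8 in [BFF+01].

```python
import random


def l1_to_uniform(masses):
    """l1 distance between the normalized masses and the uniform distribution on them."""
    total, size = sum(masses), len(masses)
    return sum(abs(x / total - 1 / size) for x in masses)


rng = random.Random(7)
eps = 0.2
worst = 0.0
for _ in range(1000):
    size = rng.randint(1, 30)
    masses = [rng.uniform(1.0, 1.0 + eps) for _ in range(size)]  # max/min <= 1 + eps
    worst = max(worst, l1_to_uniform(masses))
print(worst, worst <= eps)
```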
Proof of Lemma 6.6.
Fix a set in , namely , for which and are non-zero. We have the following:
[TABLE]
Therefore, we have:
[TABLE]
If we swap and in the above inequality, and replicate the equations, we have:
[TABLE]
Putting Equation 16 and Equation 17 together, we get:
[TABLE]
If at least one of and is zero, it implies:
[TABLE]
Hence, we have:
[TABLE]
∎
Proof of Lemma 6.7.
We first consider a non-empty when . Since , is a subset of . For each , is at most . Also, is a -flat on , and since is a subset of , for all , is the same. We denote this quantity, , by . Here, we prove that either is small, or has to be close to uniform.
We have two cases. First, suppose is at most . In this case, is at most . Thus, the total weight of such sets, the sum of the ’s, is at most . Second, assume is greater than . On the other hand, is at most . These two facts imply that for each in :
[TABLE]
Therefore, the -distance between and the uniform distribution is bounded:
[TABLE]
Note that if is greater than one, the -distance between and the uniform distribution is bounded by as well. Therefore, if is close to the uniform distribution, it has to be close to as well. That is,
[TABLE]
Hence, given the discussion above there are three possibilities for . (i) is at most . Since -distance is at most , the total contribution of these sets in the sum below is at most . (ii) is at most , so the total contribution of these sets is at most . (iii) is at most .
[TABLE]
Hence, the proof is complete. ∎
Proof of Lemma 6.8.
First, consider a degenerate case. If , then the claim is trivially true by the triangle inequality: Thus, assume .
For now, consider the case that there exists such that is not zero, so is greater than zero. First, we show that since is close to the mixture of and , the sum of the ’s has to be close to one. That is,
[TABLE]
We define to be the normalization of for which for all in the domain. If is a -flat function, then will be a -flat distribution. Now, we show that the mixture of and is close to the mixture of and with mixture parameter .
[TABLE]
where the last inequality is due to Equation 18. Moreover, by the triangle inequality, we have:
[TABLE]
Now, assume is zero for all in . We show that if we set to be the uniform distribution over , the same result holds. First, observe that the uniform distribution is a -flat distribution for any . Then we show that is -close to . Since for all in ,
[TABLE]
On the other hand, since is -close to , one can show that is at most .
[TABLE]
Therefore, whether is zero or not, there exists a -flat distribution, for which is at most . Since is -close to , and by triangle inequality, we have:
[TABLE]
which concludes the proof. ∎
7 Lower bounds
In this section, we present lower bounds for testing mixtures in different settings discussed earlier.
Proof of the lower bound via bigness testing.
We prove this via a reduction from mixture testing to testing the bigness property of distributions. A distribution is called -big if the probability of every domain element is at least [AGP+19]. In addition, [AGP+19] showed that there exist two constant parameters and and two families of distributions, namely and , such that the following holds:
- All distributions in are -big.
- All distributions in are -far from being -big. Moreover, the probability of each element under these distributions is either zero or at least .
- Using samples from a distribution in the families, no algorithm can distinguish whether the distribution was from or with probability at least .
Let . We show that any algorithm that can test mixtures as described in the theorem can distinguish and with high probability.
First, we show that for any -big distribution, denoted by , there exists a distribution such that is a mixture of and the uniform distribution, meaning for . Let assign the following probability to the th element of the domain:
[TABLE]
It is not hard to see that as defined above is a probability distribution. Since is a -big distribution, all the are at least , so all the ’s are non-negative. Also, . Clearly, is a mixture of the form for .
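The construction in this step can be sketched directly: given a distribution p on n elements whose every mass is at least beta/n, subtracting the uniform component and renormalizing yields p' with p = beta·U_n + (1 − beta)·p'. The symbols p, beta, and U_n and the function name are ours, for illustration.

```python
def remove_uniform_component(p, beta):
    """Return p' with p = beta * uniform + (1 - beta) * p'; requires p_i >= beta / n."""
    n = len(p)
    if not 0 <= beta < 1:
        raise ValueError("beta must lie in [0, 1)")
    if min(p) < beta / n:
        raise ValueError("some mass is below beta / n, so p' would be negative")
    return [(pi - beta / n) / (1 - beta) for pi in p]


p = [0.3, 0.3, 0.2, 0.2]
p_prime = remove_uniform_component(p, 0.4)
print(p_prime)  # approximately [1/3, 1/3, 1/6, 1/6]
```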
Note that for any distribution in there is at least one element (in fact many elements) that has probability zero. Otherwise, all elements would have probability at least , and the distribution would be big. On the other hand, any distribution that is mixed with uniform with parameter cannot have any zero probability element. Thus, is not a mixture of the form when .
Thus, any algorithm that can test mixture property as defined in the theorem has to accept and reject . However, we know this is not possible unless the algorithms gets samples. This completes the proof. ∎
Proposition 1.
When we have sample access to and , any closeness tester in the presence of uniform noise requires samples.
Proof.
First, note that testing uniformity reduces to this problem by setting equal to the uniform distribution. Therefore, by the lower bound for uniformity testing shown in [Pan08], this problem requires at least samples.
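The reduction above rests on a one-line identity. As an illustration only, we write $q$ for the distribution that is set to uniform, $\eta$ for the noise weight, and $U_n$ for the uniform distribution on $n$ elements (our labels, since the original symbols are not recoverable here), and assume the noisy target has the form $(1-\eta)q + \eta U_n$:

```latex
q = U_n \;\Longrightarrow\; (1-\eta)\,q + \eta\,U_n \;=\; (1-\eta)\,U_n + \eta\,U_n \;=\; U_n .
```

So deciding whether the sampled distribution equals the noisy target is exactly deciding whether it equals $U_n$, and any uniformity lower bound transfers directly.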
Now, we establish that many samples are also required. Without loss of generality, assume ; otherwise, would be the dominating term in the lower bound up to a constant factor. To prove the lower bound, we use two distributions (and any random relabeling of them) from the lower-bound constructions for testing closeness of distributions [BFR+13, VV17b, VV17a, CDVV14]. More precisely, we define two distributions and such that distinguishing and (and any random relabeling of them) requires samples. On the other hand, we show that any -mixture tester has to distinguish and , which proves the proposition.
Let and . Consider three disjoint subsets of domain elements , namely , , and , of sizes , , and , respectively. Let and be the following distributions:
[TABLE]
Note that is -far from any mixture distribution of and with parameter , since
[TABLE]
Clearly, in the case where and , is a mixture of and with mixture parameter ; in the case where and , is -far from any mixture distribution of and . Thus, a -mixture tester has to distinguish between and . By Proposition 4.1 in [CDVV14], this task requires samples. ∎
References
- [ABR16] Maryam Aliakbarpour, Eric Blais, and Ronitt Rubinfeld. Learning and testing junta distributions. In COLT, pages 19–46, 2016.
- [ADK15] Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. In NIPS, pages 3591–3599, 2015.
- [AGP+19] Maryam Aliakbarpour, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Towards testing monotonicity of distributions over general posets. In Proceedings of the Thirty-Second Conference on Learning Theory (COLT), pages 34–82, 2019.
- [AJOS14] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda T. Suresh. Sublinear algorithms for outlier detection and generalized closeness testing. In IEEE ISIT, pages 3200–3204, 2014.
- [Bat01] Tugkan Batu. Testing Properties of Distributions. PhD thesis, Cornell University, 2001.
- [BDKR02] Tugkan Batu, Sanjoy Dasgupta, Ravi Kumar, and Ronitt Rubinfeld. The complexity of approximating entropy. In STOC, pages 678–687, 2002.
- [BFF+01] Tugkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In FOCS, pages 442–451, 2001.
- [BFR+13] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing closeness of discrete distributions. JACM, 60(1):4:1–4:25, 2013.
