Property Testing of Joint Distributions using Conditional Samples

Rishiraj Bhattacharyya; Sourav Chakraborty

arXiv:1702.01454·cs.CC·August 3, 2022

TL;DR

This paper introduces a subcube conditional sampling model for testing properties of joint distributions, achieving polynomial sample complexity in the dimension, unlike traditional models with exponential complexity.

Contribution

It proposes a new subcube conditional sampling framework and develops algorithms with polynomial sample complexity for property testing of joint distributions.

Findings

01. Polynomial sample complexity for identity testing algorithms

02. Efficient algorithms for testing against known and unknown distributions

03. Avoidance of the curse of dimensionality through a chain rule technique

Table 1: Comparison between the sample complexity of testing joint distributions in the traditional sampling model and the subcube conditioning model.

| Problem | Conditional sampling: upper bound [this paper] | Conditional sampling: lower bound | Traditional sampling: upper and lower bound |
| --- | --- | --- | --- |
| Identity to the uniform distribution | $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ | $\Omega(\sqrt{n}/\epsilon^{2})$ [CDKS17] | $\Theta(\lvert\Sigma\rvert^{n/2}/\epsilon^{2})$ [Pan08] |
| Identity to a known distribution | $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ | $\Omega(\sqrt{n}/\epsilon^{2})$ [CDKS17] | $\Theta(\lvert\Sigma\rvert^{n/2}/\epsilon^{2})$ [VV14] |
| Identity between two unknown distributions | $\tilde{\mathcal{O}}(n^{5}\log\log\lvert\Sigma\rvert/\epsilon^{5})$ | $\Omega(\max(\sqrt{n}/\epsilon^{2},\, n^{3/4}/\epsilon))$ [CDKS17] | $\Theta(\max(\lvert\Sigma\rvert^{2n/3}/\epsilon^{4/3},\, \lvert\Sigma\rvert^{n/2}/\epsilon^{2}))$ [oCDVV14] |
| Identity to a product distribution | $\tilde{\mathcal{O}}(n^{5}\log\log\lvert\Sigma\rvert/\epsilon^{5})$ | $\Omega(\max(\sqrt{n}/\epsilon^{2},\, n^{3/4}/\epsilon))$ [CDKS17] | $\Theta(\lvert\Sigma\rvert^{n/2}/\epsilon^{2})$ [ACK15, DK16] |

Equations

$$d(\mu,\mu^{\prime}) := \frac{1}{2}\sum_{x\in\Omega}\left|\Pr_{\mu}(x)-\Pr_{\mu^{\prime}}(x)\right|.$$

$$\Pr_{\mu\mid A}(x) = \frac{\Pr_{\mu}(x)}{\sum_{y\in A}\Pr_{\mu}(y)}.$$

$$H(\mu,\mu^{\prime}) = \sqrt{\frac{1}{2}\sum_{x\in\Omega}\left(\sqrt{\Pr_{\mu}(x)}-\sqrt{\Pr_{\mu^{\prime}}(x)}\right)^{2}} = \sqrt{1-\sum_{x\in\Omega}\sqrt{\Pr_{\mu}(x)\Pr_{\mu^{\prime}}(x)}}$$

$$d(\mu,\mu^{\prime}) \leq \sqrt{2}\,H(\mu,\mu^{\prime}) \leq \sqrt{2\,d(\mu,\mu^{\prime})}$$

$$H(\mu,\mu^{\prime})^{2} \leq \sum_{i=1}^{n} H(\mu_{i},\mu^{\prime}_{i})^{2}.$$

$$d(\mu,\mu^{\prime}\mid A) := \frac{1}{2}\sum_{x\in\Omega}\left|\Pr_{\mu\mid A}(x)-\Pr_{\mu^{\prime}\mid A}(x)\right|.$$

$$\Pr_{\mu^{(i)}}(x) = \Pr_{X\sim\mu}\left[(X_{1},X_{2},\dots,X_{i})=(x_{1},x_{2},\dots,x_{i})\right].$$

$$\Pr_{\mu_{i}\mid w}(x) = \Pr_{X\sim\mu}\left[X_{i}=x \;\Big|\; \bigwedge_{k=1}^{j} X_{k}=w_{k}\right].$$

$$d(\mu_{i},\mu^{\prime}_{i}\mid w) = \frac{1}{2}\sum_{x\in\Sigma}\left|\Pr_{\mu_{i}\mid w}(x)-\Pr_{\mu^{\prime}_{i}\mid w}(x)\right|$$

$$E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] = \sum_{w\in\Sigma^{i-1}}\Pr_{\mu^{(i-1)}}(w)\, d(\mu_{i},\mu^{\prime}_{i}\mid w).$$

$$d(\mu,\mu^{\prime}) \leq d(\mu_{1},\mu^{\prime}_{1}) + \sum_{i=2}^{n} E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right]$$

$$2\,d(\mu^{(i)},\mu^{\prime(i)})$$

$$\sum_{w\in\Sigma^{i}}$$

$$d(\mu^{(i)},\mu^{\prime(i)}) \leq d(\mu^{(i-1)},\mu^{\prime(i-1)}) + \sum_{w\in\Sigma^{i-1}}\Pr_{\mu^{(i-1)}}(w)\, d(\mu_{i},\mu^{\prime}_{i}\mid w)$$

$$2^{c-1} \leq \left|\left\{i\in[n] \;\Big|\; E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] \geq \frac{\epsilon}{2^{c}H(n)}\right\}\right|$$

$$E_{w\sim\mu^{(i_{1}-1)}}\left[d(\mu_{i_{1}},\mu^{\prime}_{i_{1}}\mid w)\right] \geq E_{w\sim\mu^{(i_{2}-1)}}\left[d(\mu_{i_{2}},\mu^{\prime}_{i_{2}}\mid w)\right] \geq \cdots \geq E_{w\sim\mu^{(i_{n}-1)}}\left[d(\mu_{i_{n}},\mu^{\prime}_{i_{n}}\mid w)\right]$$

$$E_{w\sim\mu^{(i_{k}-1)}}\left[d(\mu_{i_{k}},\mu^{\prime}_{i_{k}}\mid w)\right] \geq \epsilon/(k\,H(n))$$

$$\left|\left\{i\in[n] \;\Big|\; E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] \geq \frac{\epsilon}{2^{c}H(n)}\right\}\right| \geq k \geq 2^{c-1}.$$

$$d(\mu,\mu^{\prime}) \leq \sum_{k=1}^{n} E_{w\sim\mu^{(i_{k}-1)}}\left[d(\mu_{i_{k}},\mu^{\prime}_{i_{k}}\mid w)\right] < \sum_{k=1}^{n} \epsilon/(k\,H(n)) \leq \epsilon$$

$$\sum_{k=0}^{\ell_{j}} \tilde{O}\left(\frac{1}{\epsilon_{(j,k)}^{2}}\right) O(2^{k}k^{2}) = \sum_{k=0}^{\ell_{j}} \tilde{O}\left(\frac{k^{2}}{2^{k}\epsilon_{j}^{2}}\right)$$

$$\sum_{k=0}^{\ell_{j}} \tilde{O}\left(\frac{k^{2}}{2^{k}\epsilon_{j}^{2}}\right) = \tilde{O}\left(\frac{1}{\epsilon_{j}^{2}}\right)\sum_{k=0}^{\ell_{j}} O\left(\frac{k^{2}}{2^{k}}\right) = \tilde{O}\left(\frac{1}{\epsilon_{j}^{2}}\right)$$

$$\sum_{j=1}^{\log n+1} \frac{4n}{2^{j}}\, \tilde{O}\left(\frac{1}{\epsilon_{j}^{2}}\right)$$

$$\begin{aligned}
\sum_{j=1}^{\log n+1} \frac{4n}{2^{j}} \sum_{k=0}^{\ell_{j}} \left(2^{k+2}(k+3)^{2}\delta_{k}\right)
&= 16\,\delta^{\prime} \sum_{j=1}^{\log n+1} \frac{n}{2^{j}} \sum_{k=0}^{\ell_{j}} 2^{k} \\
&< 16\,\delta^{\prime} \sum_{j=1}^{\log n+1} \frac{n}{2^{j}}\, 2^{\ell_{j}+1} \\
&= 64\,\delta^{\prime} \sum_{j=1}^{\log n+1} \frac{n}{2^{j}} \cdot \frac{2^{j}H(n)}{\epsilon} \\
&< \frac{64\,\delta^{\prime}\, n(\log n)^{2}}{\epsilon} = \delta \qquad \left[\because\ \delta^{\prime} = \delta\epsilon/(64\,n(\log n)^{2}) \text{ by the step that sets } \delta^{\prime}\right]
\end{aligned}$$

$$\tau_{c} \stackrel{\mathrm{def}}{=} \left\{i\in[n] \;\Big|\; E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] \geq \frac{\epsilon}{2^{c}H(n)}\right\}$$

$$\Gamma_{i,k} \stackrel{\mathrm{def}}{=} \left\{w\in\Sigma^{i-1} \;\Big|\; d\left(\mu_{i},\mu^{\prime}_{i}\mid \wedge_{j=1}^{i-1}X_{j}=w_{j}\right) < \frac{2^{k-1}\epsilon}{2^{c}H(n)}\right\}$$

$$\Pr_{w\sim\mu}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w^{(i-1)}) \geq 2^{k-1}\epsilon_{c}\right] \geq \frac{1}{2^{k}(k+3)^{2}}$$

$$E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] = \sum_{w\in\Sigma^{i-1}}\Pr_{\mu^{(i-1)}}(w)\, d(\mu_{i},\mu^{\prime}_{i}\mid w) \geq \frac{\epsilon}{2^{c}H(n)}$$

$$B_{k}$$

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Privacy-Preserving Technologies in Data

Full text

Property Testing of Joint Distributions using Conditional Samples

Rishiraj Bhattacharyya, NISER Bhubaneswar, HBNI, India

Sourav Chakraborty, Chennai Mathematical Institute, Chennai, India, and CWI Amsterdam, The Netherlands

Abstract

In this paper, we consider the problem of testing properties of joint distributions under the Conditional Sampling framework. In the standard sampling model, the sample complexity of testing properties of joint distributions is exponential in the dimension, resulting in inefficient algorithms for practical use. While recent results achieve efficient algorithms for product distributions with significantly smaller sample complexity, no efficient algorithm is expected when the marginals are not independent.

We initiate the study of conditional sampling in the multidimensional setting. We propose a subcube conditional sampling model where the tester can condition on an (adaptively) chosen subcube of the domain. Due to its simplicity, this model is potentially implementable in many practical applications, particularly when the distribution is a joint distribution over $\Sigma^{n}$ for some set $\Sigma$.

We present algorithms for various fundamental properties of distributions in the subcube-conditioning model and prove that the sample complexity is polynomial in the dimension $n$ (and not exponential as in the traditional model). We present an algorithm for testing identity to a known distribution using $\tilde{\mathcal{O}}(n^{2})$ subcube-conditional samples, an algorithm for testing identity between two unknown distributions using $\tilde{\mathcal{O}}(n^{5})$ subcube-conditional samples, and an algorithm for testing identity to a product distribution using $\tilde{\mathcal{O}}(n^{5})$ subcube-conditional samples.

The central concept of our technique involves an elegant chain rule that can be proved using basic techniques of probability theory, yet is powerful enough to avoid the curse of dimensionality.

1 Introduction

Property Testing of Distributions. The boom of Big Data Analytics has rejuvenated the well-studied area of hypothesis testing over unknown distributions. In Computer Science, the study of this type of problem was initiated by Batu, Fortnow, Rubinfeld, Smith, and White [BFR+13] under the framework of “Property Testing” [GGR98, RS96]. In this framework, the “tester” draws independent samples from the distribution and decides whether the distribution satisfies a specific property $\mathcal{P}$ (null hypothesis) or is far from any distribution that satisfies $\mathcal{P}$ (alternate hypothesis).

Several properties of probability distributions have been studied in this framework. Testing whether the distribution is uniform [BFF+01a, GR11, oCDVV14], testing identity between two unknown distributions (taking samples from both distributions) [BFR+13, LRR13], testing independence of marginals of product distributions [BFF+01a], and estimating entropy [BDKR05] are a few of the numerous problems that have been studied in the literature. See [Can15b] for a survey of results related to distribution testing.

Unfortunately, from the modern data analytics point of view, the traditional framework of sampling yields impractical sample complexity. For example, testing if a distribution over a set of $n$ elements is uniform requires $\Omega(\sqrt{n})$ samples from the distribution. The other problems mentioned above have sample complexity at least this high and in some cases, almost linear in $n$ [RRSS09, VV11, Val11].

Conditional Sampling

To remedy this situation, Chakraborty et al. [CFGM16] and Canonne, Ron, and Servedio [CRS15] proposed a different model called conditional sampling, which has emerged as a powerful tool for testing properties of probability distributions. In this model, the testers are allowed to sample according to the distribution conditioned on any specific subset of the domain. If the distribution, $\mu$ , is over the domain $\Sigma$ , the tester can submit any subset $S\subseteq\Sigma$ and receive a sample $i\in S$ with probability $\mu(i)/\sum_{j\in S}\mu(j)$ , where $\mu(i)$ is the probability of $i$ occurring when a sample is drawn from the distribution $\mu$ .

[CFGM16, CRS15] proved that in the conditional sampling model, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the unconditional sampling model. Specifically, one can get an algorithm for testing uniformity using $\tilde{O}(1/\epsilon^{2})$ conditional samples (conditioning on arbitrary subsets of size $2$) [CRS15]. Falahatgar et al. [FJO+15], improving an upper bound of $\tilde{\mathcal{O}}(1/\epsilon^{4})$ in [CRS15], showed that testing identity to a known distribution could also be done using $\tilde{\mathcal{O}}(1/\epsilon^{2})$ conditional samples. They also showed that there exists an algorithm to test identity between two unknown distributions on $\Sigma$ using $\tilde{\mathcal{O}}(\log\log|\Sigma|/\epsilon^{5})$ conditional samples. In [ACK15], Acharya, Canonne, and Kamath showed a lower bound of $\Omega(\sqrt{\log\log n})$ for testing the equivalence of two unknown distributions.

In the conditional sampling model, the sample complexity depends on the structure of the condition, i.e., the structure of the subsets (of the domain) on which the distribution is conditioned for drawing samples. Naturally, if there is no restriction on the condition, the tester can sample conditioned on arbitrary subsets, and the sample complexity improves. In [CRS15], the authors presented an algorithm for testing whether a distribution over $\{1,\dots,n\}$ is uniform, with sample complexity $\tilde{\Theta}(1/\epsilon^{2})$ when conditioning on arbitrary subsets of size $2$. However, when the condition set was structured and restricted to intervals, they proved a sample complexity lower bound of $\Omega\left(\frac{\log n}{\log\log n}\right)$. In [Can15a], Canonne showed that conditioning on intervals improves the query complexity of monotonicity testing. Hence it is important to consider the plausible restrictions on the conditions arising from the structure of the domain.

While [CRS15] studied some restrictions of the conditions, many more restrictions, arising from the structure of the domain or from other applications, are yet to be studied. One such important case is when the domain is a Cartesian product of sets and one is allowed to condition on a Cartesian product of subsets, but not on arbitrary subsets of the domain.

Testing Joint Distributions: Subcube Conditioning

In practice, data are often multi-dimensional. In Cryptography, the keys are often defined over $\{0,1\}^{n}$. Solutions to SAT formulae are over $\{0,1\}^{n}$ as well. Lottery tickets, on the other hand, are defined over $[m]^{n}$ for some $m\in\mathbb{N}$ (each ticket contains $n$ numbers, each from the set $[m]$). Data analysts often get data with millions of dimensions (features). With higher dimension comes the “curse of dimensionality.” The sample complexity of the testers is exponential in the dimension [ADK15, BFF+01b, DK16], prohibiting practical applications. Very recently, [DDK18] considered testing higher dimensional structured distributions modelled using Markov Random Fields and achieved polynomial (in the dimension) sample complexity under the Ising model. [CDKS17, DP17] considered testing properties of structured distributions using the probabilistic graphical model and achieved sublinear complexity for certain properties of Bayesian networks. However, all these results assume the distribution is structured and has certain properties. For arbitrary distributions, testing with practical complexity remains a big concern.

One can be hopeful that using conditional sampling, testing properties of arbitrary joint distributions with practical complexity can be achieved. In that case, the assumptions are imposed on the sampling model. Finding a correct and natural sampling model is a challenge in itself. While joint distributions can also be viewed as a distribution over a larger domain, the marginals’ domains may differ. Hence sampling conditioned on arbitrary subsets (as used in [CRS15, CFGM16]) may not be feasible in real life.

In [CRS15], the authors also considered structured conditioning, namely Icond (conditioning over an interval) and PCond (conditioning over a pair of points). Icond requires the domain to be well ordered. Moreover, for both cases, one should be able to sample from arbitrary intervals. For a joint distribution, the natural ordering of the domain is a pair; it involves ordering in the dimensions coupled with ordering in the individual domains. For such an ordering, an arbitrary interval, as required by the Icond tester, need not be succinctly encodable, and so the model remains impractical.

1.1 Our Results

In this paper, we propose the subcube conditioning model and analyze property testing of joint distributions in that model.

Informally, the subcube conditioning model can be described in the following way. Let $\Sigma^{n}$ be the domain of the distribution $\mu$. The Subcube Conditioning Oracle accepts $A_{1},A_{2},\cdots,A_{n}\subseteq\Sigma$ and constructs $S=A_{1}\times A_{2}\times\cdots\times A_{n}$ as the condition set. The oracle returns a vector $x=(x_{1},x_{2},\cdots,x_{n})$, where each $x_{i}\in A_{i}$, with probability $\mu(x)/(\sum_{w\in S}\mu(w))$. If $\mu(S)=0$, we assume the oracle returns an element from $S$ uniformly at random. We will call these kinds of samples subcube-conditional-samples and the corresponding sample complexity subcube-conditional-sample complexity. There is no restriction on the individual $A_{i}$'s. They may be unstructured or structured as pairs or intervals as used in [CRS15, CFGM16].
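The informal description above can be sketched in code. The following is a minimal brute-force reference, assuming the distribution is given explicitly as a Python dict from tuples in $\Sigma^{n}$ to probabilities; the function name `subcube_sample` and this representation are illustrative assumptions, not part of the paper. It enumerates the whole subcube, so it is only meant for small $n$.

```python
import random
from itertools import product


def subcube_sample(mu, subsets, rng=random):
    """Draw one sample from mu conditioned on S = A_1 x ... x A_n.

    mu      : dict mapping tuples in Sigma^n to probabilities (assumed representation).
    subsets : list of n non-empty collections A_i, each a subset of Sigma.
    """
    # Enumerate the condition set S explicitly (exponential in n; illustration only).
    cube = list(product(*[sorted(a) for a in subsets]))
    weights = [mu.get(x, 0.0) for x in cube]
    if sum(weights) == 0:
        # mu(S) = 0: return a uniform element of S, as the text assumes.
        return rng.choice(cube)
    # Otherwise return x in S with probability mu(x) / mu(S).
    return rng.choices(cube, weights=weights, k=1)[0]


# Example: two perfectly correlated bits.
mu = {(0, 0): 0.5, (1, 1): 0.5}
print(subcube_sample(mu, [{0}, {0, 1}]))  # fixing the first bit to 0 forces (0, 0)
```

Fixing a coordinate to a single value corresponds to passing a singleton set for that coordinate, and leaving it free corresponds to passing all of $\Sigma$.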

Motivation of SubCube conditioning

We believe the subcube conditional sampling model is mathematically interesting in itself. Every Boolean function can be modelled as a subgraph of a hypercube. Testing a property of a Boolean function translates to testing some property of the resulting subgraph. The conditional sampling model is equivalent to sampling over the edges of such a subgraph, i.e., fixing some vertices, sampling over the edges, and checking the properties of the adjacent vertices. We argue that sampling over the hypercube arises naturally in many areas.

Database Query. A typical “SELECT” query to a database often looks like SELECT field1 WHERE field2 = cond1 AND field3 = cond2. The response to such a query is the set of all tuples that satisfy cond1 and cond2. Sampling over such tuples is indeed conditional sampling.

Side Channel Cryptanalysis. In modern Cryptography, schemes are often “proven” secure (no efficient attack algorithm exists) under the assumption that the keys, internal randomness, and internal memory are inaccessible to the adversary. However, in practice, Cryptographic schemes are deployed in a wide variety of devices, specifically hand-held devices and smart cards. This situation leads to the “side channel attacks” where tampering with the keys or internal randomness is feasible. Specifically, the cryptanalytic techniques of fault attacks fix/modify some bits and test the resulting distributions. The subcube conditioning model captures this attack scenario (fixing some bits and testing on the resulting subcube).

Our results in this paper can be viewed as proof that “indistinguishability” from uniform (in fact, from any known distribution) cannot be proven if an adversary can tamper with the internal state. (While this result is folklore in Cryptography, the subcube conditioning may be considered as the benchmark model while analyzing the efficiency of a fault attack.)

Verification of Random SAT solutions. In software verification and related areas, random solutions to SAT problems are often used as a backbone. However, testing whether the solutions that an algorithm generates are indeed uniformly distributed is a very important problem. Unfortunately, the standard algorithms require impractical complexity. Recently, Chakraborty et al. [CM16] used the conditional sampling model to get a practically deployable solution. The model of subcube conditioning would be very effective for this problem, as one natural conditioning technique is to fix some variables of the SAT formula and then test the distribution of the solutions.

Recently, [GTZ17] significantly improved the running times of sublinear algorithms for $k$-means clustering and for estimating the weight of a minimum spanning tree using conditional samples. We believe the subcube conditioning can be used in this setting as well.

We remark that the idea of subcube conditioning has also been mentioned in the literature related to property testing. In fact, analysis of joint distributions using subcube conditioning was posed as a natural open problem in [CRS15].

Our Results

We focus on four fundamental properties of distributions: given two joint distributions $\mu$ and $\mu^{\prime}$ over $\Sigma^{n}$ we would like to test, using subcube-conditional-samples, if (a) $\mu$ is uniform, (b) $\mu$ is identical to $\mu^{\prime}$ (when $\mu^{\prime}$ is known in advance), (c) $\mu$ is identical to $\mu^{\prime}$ (when $\mu^{\prime}$ is not known in advance and has to be accessed using conditional samples), and (d) $\mu$ is a product distribution. We have the following four theorems:

Theorem 1.1.

*(Informal) Let $\mu$ be a probability distribution over $\Sigma^{n}$. There exists an algorithm for testing if $\mu$ is uniform, using $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ subcube-conditional-samples.* (Here and in the theorems below, $\tilde{\mathcal{O}}$ hides a polynomial function of $\log n$ and $\log(1/\epsilon)$.)

Theorem 1.2.

*(Informal) Let $\mu$ be a known probability distribution over the set $\Sigma^{n}$. Let $\mu^{\prime}$ be an unknown distribution over $\Sigma^{n}$. There exists an algorithm to test identity of $\mu^{\prime}$ with $\mu$ using $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ subcube-conditional-samples.*

Theorem 1.3.

*(Informal) Let $\mu,\mu^{\prime}$ be unknown distributions over $\Sigma^{n}$. There exists an algorithm to test if $\mu^{\prime}$ and $\mu$ are identical using $\tilde{\mathcal{O}}(n^{5}\log\log|\Sigma|/\epsilon^{5})$ subcube-conditional-samples from both $\mu$ and $\mu^{\prime}$.*

Theorem 1.4.

*(Informal) Let $\mu$ be a probability distribution over the set $\Sigma^{n}$. There exists an algorithm to test whether $\mu$ is a product distribution using $\tilde{\mathcal{O}}(n^{5}\log\log|\Sigma|/\epsilon^{5})$ subcube-conditional-samples.*

Comparison to Previous Results

While conditional sampling has been studied in a number of articles in the recent past, and although subcube conditioning is a very natural model (that is also discussed in [CRS15]), as far as we understand, this is the first formal study of subcube conditioning. One of the main reasons for the lack of literature in this area is that the classical setting was not well studied either until recently. In [CDKS17], Canonne et al. studied the problem of testing properties of joint distributions over the domain $\Sigma^{n}$. For example, for the fundamental problem of testing if the distribution is uniform, they observed that if the distribution is a product distribution (that is, the $n$ marginals are independent), then one needs $\Theta(\sqrt{n})$ samples. But if the marginals are not independent, then in the worst case, $\Theta(|\Sigma|^{n/2})$ samples are necessary.

In comparison, we show that $\tilde{\mathcal{O}}(n^{2})$ subcube-conditional samples suffice in the worst case, so we have an exponential improvement in the sample complexity. It is also interesting to note that the sample complexity of uniformity testing in the subcube model is independent of $|\Sigma|$. This shows the power of subcube conditional samples and brings the query complexity to a more practical level. Also, from [CDKS17] we know that $\Omega(\sqrt{n})$ conditional samples are necessary since, in the case of product distributions, conditional samples give no additional power over standard samples.

A list of our results and a comparison to previous results on standard sampling algorithms are given in Table 1.

Overview of Our Technique

Let us start with the problem of testing if a given distribution is uniform. Let $\mu$ be a distribution over $\Sigma^{n}$ with marginals $\mu_{1},\dots,\mu_{n}$ .

The simplest case is when $\mu$ is a product of $n$ independent distributions, that is, the $\mu_{i}$’s are independent but not necessarily identical. In this case, if $\mu$ is $\epsilon$-far from uniform, one expects to find at least one $\mu_{i}$ that is $\epsilon/n$-far from uniform. One can then use any tester over $\Sigma$ to check whether $\mu_{i}$ is far from uniform, which should make at most poly($n$) traditional queries. In fact, when $\mu$ is a product distribution over $\{0,1\}^{n}$, [CDKS17] show that uniformity and identity can be tested using $\mathcal{O}(\sqrt{n}/\epsilon^{2})$ unconditional samples. As the marginals of $\mu$ are independent and $\mu$ is over $\{0,1\}^{n}$, subcube-conditional-sampling is equivalent to unconditional sampling followed by projections, and hence subcube-conditional samples do not give any additional power in this setting.

But if the $\mu_{i}$’s are not independent, then it is possible that all the individual marginals are uniform and yet $\mu$ is $\epsilon$-far from uniform. As has been observed in [CDKS17], any algorithm (using unconditional sampling) requires $\exp(n)$ queries. To circumvent this barrier, we need to use conditional samples. We define a notion of “conditional distance”. We show that there exists at least one $i\in[n]$ such that the expected “conditional distance” of the $i$th marginal from uniform is more than $\epsilon/\mathrm{poly}(n)$. Thus it is enough to test, for all $i$, whether the $i$th marginal is $\epsilon/\mathrm{poly}(n)$-far from uniform. We can use the testers from [CRS15, CFGM16] to test exactly that condition using poly($n$) subcube-conditional samples. The central idea behind the correctness of the algorithm is the correct definition of the “conditional distance” and the “chain rule” that proves that such an $i$ exists. Although the proof of the “chain rule” (given in Section 3) is simple in hindsight, it is a powerful tool that acts as the central backbone for all our upper-bound proofs. Moreover, it gives the flexibility of using an adaptive or non-adaptive tester over $\Sigma$.
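To make the obstacle concrete, here is a small illustrative example (our own choice, not taken from the paper): the uniform distribution over even-parity strings in $\{0,1\}^{3}$. The sketch below checks by brute force that every single-coordinate marginal is uniform, that the joint distribution is nevertheless at variation distance $1/2$ from uniform, and that the last coordinate, once conditioned on the first two, is as far from uniform as a single bit can be. All helper names are hypothetical.

```python
from itertools import product

n = 3
points = list(product([0, 1], repeat=n))

# Illustrative distribution: uniform over strings with an even number of ones.
even = [x for x in points if sum(x) % 2 == 0]
mu = {x: (1 / len(even) if x in even else 0.0) for x in points}
uniform = {x: 1 / len(points) for x in points}


def marginal(dist, i):
    """Distribution of coordinate i (0-indexed) of a joint distribution over bits."""
    out = {0: 0.0, 1: 0.0}
    for x, p in dist.items():
        out[x[i]] += p
    return out


def tv(p, q):
    """Total variation distance between two distributions with the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


def conditional_last(dist, w):
    """Distribution of the last bit given that the first n-1 bits equal w."""
    mass = {b: dist[w + (b,)] for b in (0, 1)}
    total = sum(mass.values())
    return {b: mass[b] / total for b in (0, 1)}


marginals = [marginal(mu, i) for i in range(n)]
distance_to_uniform = tv(mu, uniform)
cond_dists = [tv(conditional_last(mu, w), {0: 0.5, 1: 0.5})
              for w in product([0, 1], repeat=n - 1)]

print(marginals)            # every marginal puts mass 0.5 on each bit
print(distance_to_uniform)  # 0.5
print(cond_dists)           # 0.5 for every prefix w
```

Unconditional marginals reveal nothing here, while conditioning on a prefix exposes the dependence immediately, which is the intuition the conditional distance is meant to capture.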

1.2 Organization of the paper

In Section 2, we define the notion of conditional distance and SubCube Conditioning. The chain rule is described in Section 3. In Section 4 we present the identity testers and the derived uniformity tester. In Section 5, the tester for testing identity between two unknown distributions is presented. In Section 6, the tester for the independence of marginals is described. In Appendix A we present a lower bound of $n^{1/4}$ for testing identity to the uniform distribution. This lower bound was proved independently of [CDKS17] and although our lower bound is weaker than their lower bound of $\sqrt{n}$ , we feel that our techniques can be of independent interest.

2 Notations and Preliminaries

If $S$ is a set, $|S|$ denotes the size of the set. If $x$ is a vector of length $n$ , $x_{i}$ denotes the $i^{th}$ element of $x$ . $x^{(i)}$ denotes the substring of first $i$ elements of $x$ ; $x^{(i)}=(x_{1},x_{2},\cdots,x_{i})$ . We denote the $n$ -th harmonic number by $H(n)$ .

For any set $\Omega$ , we denote by $\mathcal{U}_{\Omega}$ the uniform distribution with support $\Omega$ . In most cases, the support of the distribution would be clear from the context and in that case, we would drop the subscript and use $\mathcal{U}$ as the uniform distribution over the support in question.

If $\mu$ is a distribution with support $\Omega$, for any $x\in\Omega$, we will denote by $\Pr_{\mu}(x)$ the probability that $x$ occurs when a random sample is drawn from $\Omega$ according to $\mu$. If $\mu$ is a joint distribution, $\mu_{i}$ denotes the $i^{th}$ marginal distribution of $\mu$.

If $\mu$ is a distribution over $\Sigma^{n}$ with the marginals $\mu_{1},\dots,\mu_{n}$ and if the marginals are independent (that is, $\mu$ is a product distribution) then we would write $\mu=\mu_{1}\otimes\dots\otimes\mu_{n}$ .

Total Variation Distance. Let $\mu,\mu^{\prime}$ be two distributions with support $\Omega$ . The variation distance between $\mu$ and $\mu^{\prime}$ denoted by $d(\mu,\mu^{\prime})$ is defined as

$$d(\mu,\mu^{\prime}) := \frac{1}{2}\sum_{x\in\Omega}\left|\Pr_{\mu}(x)-\Pr_{\mu^{\prime}}(x)\right|.$$

We say $\mu$ and $\mu^{\prime}$ are $\epsilon$ -far (or $\mu$ is $\epsilon$ -far from $\mu^{\prime}$ ), when $d(\mu,\mu^{\prime})\geq\epsilon.$
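As a quick illustration of the definition, the variation distance and the $\epsilon$-far predicate can be computed directly for explicitly given distributions. This is a minimal sketch assuming distributions are stored as dicts from outcomes to probabilities; the function names are ours.

```python
def total_variation(mu, nu):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def is_eps_far(mu, nu, eps):
    """True when d(mu, nu) >= eps, matching the definition of eps-far."""
    return total_variation(mu, nu) >= eps


mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.75, "b": 0.25}
print(total_variation(mu, nu))   # 0.25
print(is_eps_far(mu, nu, 0.25))  # True
print(is_eps_far(mu, nu, 0.3))   # False
```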

If $\mu$ is a distribution with support $\Omega$ and $A\subseteq\Omega$ , then by $(\mu\mid A)$ , we denote the distribution over the support $A$ . For any $x\in A$ , the probability that $x$ occurs when a random sample is drawn from $A$ (according to the distribution $(\mu\mid A)$ ) is given by

$$\Pr_{\mu\mid A}(x) = \frac{\Pr_{\mu}(x)}{\sum_{y\in A}\Pr_{\mu}(y)}.$$

Hellinger Distance. Let $\mu,\mu^{\prime}$ be two distributions with support $\Omega$ . The Hellinger distance between $\mu$ and $\mu^{\prime}$ denoted by $H(\mu,\mu^{\prime})$ is defined as

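$$H(\mu,\mu^{\prime}) = \sqrt{\frac{1}{2}\sum_{x\in\Omega}\left(\sqrt{\Pr_{\mu}(x)}-\sqrt{\Pr_{\mu^{\prime}}(x)}\right)^{2}} = \sqrt{1-\sum_{x\in\Omega}\sqrt{\Pr_{\mu}(x)\Pr_{\mu^{\prime}}(x)}}$$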

The Hellinger distance has some nice properties and is useful for lower and upper bounding the variation distance:

$$d(\mu,\mu^{\prime}) \leq \sqrt{2}\,H(\mu,\mu^{\prime}) \leq \sqrt{2\,d(\mu,\mu^{\prime})}$$

Also for any two product distributions $\mu=\mu_{1}\otimes\dots\otimes\mu_{n}$ and $\mu^{\prime}=\mu^{\prime}_{1}\otimes\dots\otimes\mu^{\prime}_{n}$

$$H(\mu,\mu^{\prime})^{2} \leq \sum_{i=1}^{n} H(\mu_{i},\mu^{\prime}_{i})^{2}.$$
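The relations between the Hellinger and variation distances can be checked numerically on a small example. The sketch below assumes the normalization $H(\mu,\mu^{\prime})^{2} = 1-\sum_{x}\sqrt{\Pr_{\mu}(x)\Pr_{\mu^{\prime}}(x)}$ and verifies, for one hand-picked pair of distributions, that $d \leq \sqrt{2}\,H \leq \sqrt{2d}$ and that the squared Hellinger distance is subadditive over product distributions. The helper names and the example distributions are our own assumptions.

```python
import math
from itertools import product


def total_variation(mu, nu):
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def hellinger(mu, nu):
    """Hellinger distance with H^2 = 1 - sum_x sqrt(mu(x) * nu(x))."""
    support = set(mu) | set(nu)
    bc = sum(math.sqrt(mu.get(x, 0.0) * nu.get(x, 0.0)) for x in support)
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


def product_dist(factors):
    """Joint distribution of independent coordinates with the given marginals."""
    out = {}
    for xs in product(*[f.keys() for f in factors]):
        p = 1.0
        for f, x in zip(factors, xs):
            p *= f[x]
        out[xs] = p
    return out


mu1, nu1 = {0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}
d = total_variation(mu1, nu1)
h = hellinger(mu1, nu1)
print(d <= math.sqrt(2) * h <= math.sqrt(2 * d))  # True

# Subadditivity for a two-coordinate product distribution.
h_joint = hellinger(product_dist([mu1, mu1]), product_dist([nu1, nu1]))
print(h_joint ** 2 <= 2 * h ** 2)  # True
```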

Conditional Distance. Let $\mu,\mu^{\prime}$ be two distributions over $\Omega$. Let $A\subseteq\Omega$. The variation distance between $\mu$ and $\mu^{\prime}$ conditioned on $A$ (denoted by $d(\mu,\mu^{\prime}|A)$) is defined as

$$d(\mu,\mu^{\prime}\mid A) := \frac{1}{2}\sum_{x\in\Omega}\left|\Pr_{\mu\mid A}(x)-\Pr_{\mu^{\prime}\mid A}(x)\right|.$$

We say $\mu$ and $\mu^{\prime}$ are $\epsilon$ -far, conditioned on $A$ , when $d(\mu,\mu^{\prime}|A)\geq\epsilon.$
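The conditional distance can be computed by first restricting both distributions to $A$ and renormalizing. The sketch below assumes dict-based distributions and a conditioning set with positive mass under both; the names `condition` and `conditional_distance` are hypothetical. The example shows that conditioning can increase the distance between two distributions.

```python
def condition(mu, A):
    """Return (mu | A): mu restricted to A and renormalized."""
    mass = sum(mu.get(x, 0.0) for x in A)
    if mass == 0:
        raise ValueError("conditioning set has zero mass")
    return {x: mu.get(x, 0.0) / mass for x in A}


def conditional_distance(mu, nu, A):
    """Variation distance between (mu | A) and (nu | A)."""
    p, q = condition(mu, A), condition(nu, A)
    return 0.5 * sum(abs(p[x] - q[x]) for x in A)


mu = {1: 0.5, 2: 0.25, 3: 0.25}
nu = {1: 0.5, 2: 0.4, 3: 0.1}

# Unconditioned, the distributions are 0.15 apart (up to rounding) ...
print(conditional_distance(mu, nu, {1, 2, 3}))
# ... but conditioned on A = {2, 3} they are 0.3 apart (up to rounding).
print(conditional_distance(mu, nu, {2, 3}))
```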

Subcube Conditioning. In this paper, we work with joint distributions; $\Omega=\Sigma^{n}$ for some set $\Sigma$ . We consider conditional distance under the condition on $A=A_{1}\times A_{2}\times\dots\times A_{n}$ where each $A_{i}\subseteq\Sigma$ .

Let $\mu$ be a distribution over $\Sigma^{n}$ and $X=(X_{1},X_{2},\dots,X_{n})$ be a random variable distributed according to $\mu$ . $\mu^{(i)}$ denotes the distribution over $\Sigma^{i}$ where for every $x\in\Sigma^{i}$ ,

$$\Pr_{\mu^{(i)}}(x) = \Pr_{X\sim\mu}\left[(X_{1},X_{2},\dots,X_{i})=(x_{1},x_{2},\dots,x_{i})\right].$$

Let $w\in\Sigma^{j}$ for some $j<i$ . $\mu_{i}\mid w$ denotes the marginal distribution $\mu_{i}$ when the first $j$ random variables are fixed to $w$ .

$$\Pr_{\mu_{i}\mid w}(x) = \Pr_{X\sim\mu}\left[X_{i}=x \;\Big|\; \bigwedge_{k=1}^{j} X_{k}=w_{k}\right].$$

Definition 2.1.

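Let $\mu,\mu^{\prime}$ be two distributions over $\Sigma^{n}$. The conditional marginal distance of $\mu_{i}$ and $\mu^{\prime}_{i}$ conditioned on $w$ is given by

$$d(\mu_{i},\mu^{\prime}_{i}\mid w) = \frac{1}{2}\sum_{x\in\Sigma}\left|\Pr_{\mu_{i}\mid w}(x)-\Pr_{\mu^{\prime}_{i}\mid w}(x)\right|$$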

The average conditional distance between $\mu_{i}$ and $\mu^{\prime}_{i}$ is defined by

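$$E_{w\sim\mu^{(i-1)}}\left[d(\mu_{i},\mu^{\prime}_{i}\mid w)\right] = \sum_{w\in\Sigma^{i-1}}\Pr_{\mu^{(i-1)}}(w)\, d(\mu_{i},\mu^{\prime}_{i}\mid w).$$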
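The average conditional distance can be evaluated by brute force on small domains. The sketch below assumes explicit dict-based joint distributions over $\Sigma^{n}$ and, as an extra convention of ours, treats a conditional marginal as uniform when its prefix has zero mass (mirroring the oracle's behaviour on zero-mass condition sets). It then checks, on one hand-picked example, that the sum of the average conditional distances over all coordinates upper bounds $d(\mu,\mu^{\prime})$, which is the form of the chain rule used later in the paper. All function names are hypothetical.

```python
from itertools import product


def prefix_mass(mu, w):
    """Probability that the first len(w) coordinates equal w."""
    return sum(p for x, p in mu.items() if x[:len(w)] == w)


def cond_marginal(mu, i, w, sigma):
    """Distribution of coordinate i (1-indexed) given that the first i-1 coordinates equal w."""
    base = prefix_mass(mu, w)
    if base == 0:
        # Assumed convention: uniform when the prefix has zero mass.
        return {a: 1 / len(sigma) for a in sigma}
    return {a: prefix_mass(mu, w + (a,)) / base for a in sigma}


def tv(p, q):
    return 0.5 * sum(abs(p[a] - q[a]) for a in p)


def avg_conditional_distance(mu, nu, i, sigma):
    """E_{w ~ mu^(i-1)} [ d(mu_i, nu_i | w) ], computed by enumeration."""
    total = 0.0
    for w in product(sigma, repeat=i - 1):
        weight = prefix_mass(mu, w)
        if weight > 0:
            total += weight * tv(cond_marginal(mu, i, w, sigma),
                                 cond_marginal(nu, i, w, sigma))
    return total


sigma, n = (0, 1), 2
mu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
nu = {x: 0.25 for x in product(sigma, repeat=n)}

d = tv(mu, nu)
bound = sum(avg_conditional_distance(mu, nu, i, sigma) for i in range(1, n + 1))
print(d, bound)  # both are 0.3 up to rounding: the bound is tight here
```

In this example the first marginal of $\mu$ is uniform, so the whole distance is carried by the second coordinate conditioned on the first.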

The SubCube Condition Model

Let $\mu$ be a distribution over $\Sigma^{n}$. A subcube conditional oracle for $\mu$, denoted $\textsc{SubCond}_{\mu}$, takes as input a sequence of sets $\{A_{i}\}_{i\in[n]}$, $A_{i}\subseteq\Sigma$. Let $A$ be the product set $A_{1}\times\dots\times A_{n}$. The oracle returns an element $x\in A$ with probability $\frac{\Pr_{\mu}[x]}{\sum_{y\in A}\Pr_{\mu}[y]}$, independently of all previous calls to the oracle.

An $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ tester for a property $\mathcal{P}$ with conditional sample complexity $t$ is a randomized algorithm, that receives $0<\epsilon,\delta<1$ , $n\in\mathbb{N}$ and oracle access to $\textsc{SubCond}_{\mu}$ , and operates as follows.

1. In every iteration, the algorithm (possibly adaptively) generates a set $A=A_{1}\times A_{2}\times\cdots\times A_{n}\subseteq\Sigma^{n}$, based on the transcript and its internal coin tosses, and calls the conditional oracle with $A$ to receive an element $x$, drawn according to the distribution $\mu$ conditioned on $A$.

2. Based on the received elements and its internal coin tosses, the algorithm accepts or rejects the distribution $\mu$.

3. The algorithm makes at most $t$ queries to $\textsc{SubCond}_{\mu}$, where $t$ can depend on $\epsilon,\delta,\Sigma$ and $n$.

If $\mu$ satisfies $\mathcal{P}$ , then the algorithm must accept with probability at least $1-\delta$ , and if $\mu$ is $\epsilon$ -far from all distributions satisfying $\mathcal{P}$ , then the algorithm must reject with probability at least $1-\delta$ .

We will call such a tester an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ $\mathcal{P}$-tester. For example, an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ Uniformity-tester is an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ tester that tests if the given distribution is uniform; an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ Identity-tester is an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ tester that tests if the given distribution is identical to a known distribution; and an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ Product-tester is an $(\epsilon,\delta)\mbox{-}\textsc{SubCond}$ tester that tests if the given distribution is a product distribution or far from all product distributions.

3 Chain Rule of Conditional Distances

Let $\mu$ and $\mu^{\prime}$ be two distributions over $\Sigma^{n}$ , and let $X=(X_{1},X_{2},\dots,X_{n})$ and $X^{\prime}=(X^{\prime}_{1},X^{\prime}_{2},\dots,X^{\prime}_{n})$ be the corresponding random variables. For any $1\leq i\leq n$ , we denote by $\mu_{i}$ and $\mu^{\prime}_{i}$ the distributions of the $i$ th marginals of $\mu$ and $\mu^{\prime}$ respectively.

Lemma 3.1 (Chain Rule of Conditional Distances).

Let $\mu$ and $\mu^{\prime}$ be two distributions over $\Sigma^{n}$ , and let $X=(X_{1},X_{2},\dots,X_{n})$ and $X^{\prime}=(X^{\prime}_{1},X^{\prime}_{2},\dots,X^{\prime}_{n})$ be two random variables with distribution $\mu$ and $\mu^{\prime}$ respectively. Then the following holds.

[TABLE]

Proof of Lemma 3.1.

Let $w=(w_{1},w_{2},\dots,w_{n})\in\Sigma^{n}$ .

Let $2\leq i\leq n$ . Recall that $w^{(i)}$ denotes the substring of first $i$ elements of $w$ .

[TABLE]

Now, the second term reduces to,

[TABLE]

The second equality follows from the fact that for each $w^{\prime}\in\Sigma^{i-1}$, $\sum_{w_{i}\in\Sigma}\Pr[X_{i}^{\prime}=w_{i}|\wedge_{j=1}^{i-1}X_{j}^{\prime}=w^{\prime}_{j}]=1$. (If $w^{\prime}$ is outside the support of $\mu^{\prime}$, then, as in [CFGM16], we can define the conditional probability to be uniform over $\Sigma$.) Hence,

[TABLE]

Solving the recursion, we get the lemma. ∎
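The chain rule can be checked numerically on small examples. The sketch below assumes the lemma has the form $d(\mu,\mu^{\prime})\leq\sum_{i=1}^{n}\mathbb{E}_{w\sim\mu^{(i-1)}}\left[d(\mu_{i}\mid w,\mu^{\prime}_{i}\mid w)\right]$, with the expectation taken over $\mu$ only; this form is an assumption inferred from the surrounding proof. All function names are hypothetical, and the distributions are random full-support tables so that every conditional probability is well defined.

```python
import itertools
import random


def random_joint(sigma, n, rng):
    """A random full-support distribution over sigma^n (dict: tuple -> prob)."""
    points = list(itertools.product(sigma, repeat=n))
    weights = [rng.random() + 0.01 for _ in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


def prefix_prob(mu, w):
    """Pr[(X_1, ..., X_j) = w] for a prefix w of length j."""
    return sum(p for x, p in mu.items() if x[:len(w)] == w)


def cond_marginal(mu, sigma, w):
    """Distribution of X_i given (X_1, ..., X_{i-1}) = w, with i = len(w) + 1."""
    base = prefix_prob(mu, w)
    return {a: prefix_prob(mu, w + (a,)) / base for a in sigma}


def tv(p, q):
    """Variation distance between two distributions on the same key set."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


def chain_rule_sides(mu, mu2, sigma, n):
    """Return (d(mu, mu2), sum of average conditional distances)."""
    lhs = tv(mu, mu2)
    rhs = 0.0
    for i in range(1, n + 1):
        for w in itertools.product(sigma, repeat=i - 1):
            weight = prefix_prob(mu, w)  # w is distributed as mu^{(i-1)}
            rhs += weight * tv(cond_marginal(mu, sigma, w),
                               cond_marginal(mu2, sigma, w))
    return lhs, rhs
```

For example, `chain_rule_sides(random_joint((0, 1), 3, rng), random_joint((0, 1), 3, rng), (0, 1), 3)` returns a pair whose first entry never exceeds the second.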

Arranging the marginals by the increasing order of the average conditional distance, we get the immediate corollary.

Lemma 3.2.

If $d(\mu,\mu^{\prime})\geq\epsilon$ , then there exists a $c\leq\lceil\log n\rceil$ such that

[TABLE]

Proof of Lemma 3.2.

Without loss of generality let $i_{1},i_{2},\dots,i_{n}$ be indices such that

[TABLE]

We will need the following claim.

Claim 3.3.

There exists $k\in[n]$ such that

[TABLE]

Let $k$ be the index from Claim 3.3. We put $c=\lceil\log k\rceil$, so that $2^{c}\geq k$ and hence $\frac{\epsilon}{2^{c}H(n)}\leq\frac{\epsilon}{kH(n)}$. Clearly

[TABLE]

∎

Proof of Claim 3.3.

If no such $k$ exists, then

[TABLE]

which contradicts the distance assumption in Lemma 3.2. ∎

4 Testing Identity with a known distribution

In this section, we present an identity tester of sample complexity $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$. We recall the following result proved in [FJO+15].

Lemma 4.1.

[FJO+15] Let $\mu$ be a known distribution over $\Sigma$. Given $0<\epsilon<1$ and $0<\delta<1$ and a distribution $\mu^{\prime}$ over $\Sigma$, there is an adaptive $(\epsilon,\delta)$-SubCond Identity Tester with conditional sample complexity $\tilde{\mathcal{O}}(\frac{1}{\epsilon^{2}}\log(\frac{1}{\delta}))$. In other words, there is a tester that draws $\tilde{\mathcal{O}}(\frac{1}{\epsilon^{2}}\log(\frac{1}{\delta}))$ conditional samples and

• if $\mu=\mu^{\prime}$, then the tester will accept with probability at least $1-\delta$, and

• if $d(\mu,\mu^{\prime})\geq\epsilon$, then the tester will reject with probability at least $1-\delta$.

Let $\mu$ be a known distribution over $\Sigma^{n}$ , $\mu^{\prime}$ be an unknown distribution over $\Sigma^{n}$ that can be accessed via $\textsc{SubCond}_{\mu^{\prime}}$ oracle, and $\epsilon$ be the target distance. The following algorithm tests the identity of $\mu^{\prime}$ with $\mu$ . We use the identity tester BasicIDTester over $\Sigma$ guaranteed by Lemma 4.1 as a subroutine.

Theorem 4.2.

Given any $0<\epsilon<1$ , Algorithm 1 is an $(\epsilon,\frac{1}{3})$ -SubCond Identity Tester for joint distributions with conditional sample complexity of $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ where $\tilde{\mathcal{O}}$ hides a polynomial function of $\log n,\log\frac{1}{\epsilon}$ .

Note 4.3.

For any $0<\epsilon,\delta<1$, one can obtain an $(\epsilon,\delta)$-SubCond Identity Tester by standard techniques of error reduction. The query complexity would increase by a factor of $\log(1/\delta)$.

4.1 Proof of Theorem 4.2

Fix $\delta=\frac{1}{3}$ . In Algorithm 1, Step 14 queries BasicIDTester. BasicIDTester needs conditional samples for testing whether $d(\mu_{i},\mu^{\prime}_{i}\mid w^{(i-1)})\geq\epsilon_{(j,k)}$ . To answer a conditional query with condition $B\subseteq\Sigma$ for the distribution $\mu^{\prime}_{i}|w^{(i-1)}$ , we set $A_{j}=\{w_{j}\}$ for $j=1,2,\dots,i-1$ , $A_{i}=B$ , and $A_{j}=\Sigma$ for $j=i+1,\dots,n$ , and query the SubCond oracle with the condition $A$ . This correctly simulates the conditional oracle required by the underlying identity tester. Thus Algorithm 1 is a SubCond Tester.
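The simulation described above amounts to building one subcube per query. The following sketch constructs the sets $A_{1},\dots,A_{n}$ for a given prefix and condition $B$; the function names and the list-of-sets encoding are hypothetical choices made for illustration.

```python
def subcube_for_conditional_marginal(w_prefix, B, sigma, n):
    """Build A_1, ..., A_n for a query to mu'_i | w^{(i-1)} with condition B.

    w_prefix: tuple (w_1, ..., w_{i-1}); the queried index is i = len(w_prefix) + 1.
    B:        subset of sigma on which the i-th symbol is conditioned.
    """
    sigma = set(sigma)
    B = set(B)
    i = len(w_prefix) + 1
    if i > n:
        raise ValueError("prefix must have length at most n - 1")
    if not B or not B <= sigma:
        raise ValueError("B must be a non-empty subset of sigma")
    subsets = [{w_j} for w_j in w_prefix]             # A_j = {w_j} for j < i
    subsets.append(B)                                 # A_i = B
    subsets.extend(set(sigma) for _ in range(n - i))  # A_j = Sigma for j > i
    return subsets


def read_ith_symbol(x, w_prefix):
    """Extract the i-th symbol of the oracle's answer x (i = len(w_prefix) + 1)."""
    return x[len(w_prefix)]
```

The $i$th symbol of the oracle's answer is then distributed as $\mu^{\prime}_{i}\mid w^{(i-1)}$ conditioned on $B$, because the first $i-1$ coordinates are pinned to $w^{(i-1)}$ and the remaining coordinates are unconstrained.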

4.1.1 Sample Complexity of Algorithm 1

By Lemma 4.1, a query to $\textsf{BasicIDTester}(\mu_{i}|w^{(i-1)},\mu^{\prime}_{i}|w^{(i-1)},\epsilon_{(j,k)},\delta_{k})$ requires $\tilde{\mathcal{O}}({1}/{\epsilon_{(j,k)}^{2}})$ samples. Here $\tilde{\mathcal{O}}$ hides polylogarithmic factors of $|\Sigma|,\epsilon_{(j,k)}$ including the factors due to $\log(1/\delta_{k})$ .

For each index in $S_{j}$ , the sample complexity is

[TABLE]

Here $\tilde{\mathcal{O}}$ hides some polylogarithmic function of $k$ and $1/\epsilon_{j}$ . As $k\leq\ell_{j}=\log\left(\frac{2}{\epsilon_{j}}\right)$ , the expression can be bounded as

[TABLE]

The last equality holds true as $\sum_{k\geq 0}\frac{k^{2}}{2^{k}}=6$ .
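The value of this series can be confirmed with exact rational arithmetic. The sketch below uses the identity $\sum_{k=0}^{K}k^{2}/2^{k}=6-(K^{2}+4K+6)/2^{K}$, which is not stated in the text and is included here only as a convenient check; the function names are hypothetical.

```python
from fractions import Fraction


def partial_sum(K):
    """Exact value of sum_{k=0}^{K} k^2 / 2^k."""
    return sum(Fraction(k * k, 2 ** k) for k in range(K + 1))


def tail(K):
    """Exact value of 6 - partial_sum(K), via the closed form."""
    return Fraction(K * K + 4 * K + 6, 2 ** K)
```

Since `tail(K)` tends to zero as `K` grows, the partial sums increase to 6.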

The size of $S_{j}$ is $\frac{4n}{2^{j}}$ . Adding over all possible $j$ , we get the total sample complexity

[TABLE]

4.1.2 Correctness of the Algorithm 1

Completeness. We will show that if $d(\mu,\mu^{\prime})=0$ , the algorithm will reject with probability at most $\delta$ .

Algorithm 1 rejects $\mu^{\prime}$ if there exist $i\in[n]$ and a sampled $w=(w_{1},\cdots,w_{n})\in\Sigma^{n}$ for which the underlying identity tester rejects in Step 14.

Suppose $\mu$ and $\mu^{\prime}$ are identical. Then for all $w\in\Sigma^{i-1}$ , $\mu_{i}|w$ is identical to $\mu^{\prime}_{i}\mid w$ . For each query, BasicIDTester will reject in Step 14 with probability at most $\delta_{k}$ . By union bound, the probability that the algorithm will reject $\mu^{\prime}$ is at most

[TABLE]

Soundness. Now, we prove the soundness of Algorithm 1. Let $\mu^{\prime}$ be a distribution over $\Sigma^{n}$ with $d(\mu,\mu^{\prime})\geq\epsilon$. We shall show that Algorithm 1 rejects $\mu^{\prime}$ with probability at least $2/3$.

Let

[TABLE]

Let $c\leq{\lceil{\log n}\rceil}$ be the integer guaranteed by Lemma 3.2, such that $|\tau_{c}|\geq 2^{c-1}$ . Note, $\ell_{c}=\lceil\log\left(\frac{2^{c+1}H(n)}{\epsilon}\right)\rceil$ . For each $i\in\tau_{c}$ , for each $k\in[\ell_{c}]\cup\{0\}$ define

[TABLE]

We require the following lemma based on Levin’s economical work investment strategy [Gol17].

Lemma 4.4.

Let $\mu$ be a distribution over $\Sigma^{n}$ , and $\mu$ is $\epsilon$ -far from uniform. Let $X=(X_{1},\cdots,X_{n})$ be a random variable with distribution $\mu$ . Let $w=(w_{1},w_{2},\cdots,w_{n})$ be a random sample drawn from $\Sigma^{n}$ according to the distribution $\mu$ . Let $\epsilon_{c}=\frac{\epsilon}{2^{c}H(n)}$ and $\ell_{c}=\lceil\log\left(\frac{2}{\epsilon_{c}}\right)\rceil$ .

Then for all $i\in\tau_{c}$ , there exists $k\in[\ell_{c}]\cup\{0\}$ ,

[TABLE]

Proof of Lemma 4.4.

From Lemma 3.2, for every index $i\in\tau_{c}$

[TABLE]

Fix $i\in\tau_{c}$ . Let us define

[TABLE]

By construction, $B_{\ell_{c}+1}=\emptyset$ . We shall prove that there exists $k\in[\ell_{c}]\cup\{0\}$ such that $\Pr_{w\sim\mu}[w\in B_{k}]\geq\frac{1}{2^{k}(k+3)^{2}}$ . Suppose, towards contradiction, for all $k\in[\ell_{c}]\cup\{0\}$ , $\Pr_{w\sim\mu}[w\in B_{k}]<\frac{1}{2^{k}(k+3)^{2}}$ . Then

[TABLE]

In the last inequality we used the fact that $\sum_{k\in[\ell_{c}]}\frac{1}{(k+2)^{2}}<\sum_{k\geq 1}\frac{1}{(k+2)^{2}}=\frac{\pi^{2}}{6}-\frac{5}{4}$, which is less than $1/2$.

∎

By Lemma 4.4, there exists $0\leq k\leq\ell_{c}$ , such that,

[TABLE]

Let $S_{j}$ be the set of indices sampled in Step 3 in the $j^{th}$ iteration. If Algorithm 1 fails to reject $\mu^{\prime}$ , one of the following three cases happens.

1. No index from $\tau_{c}$ was sampled in iteration $c$. Specifically, $S_{c}\cap\tau_{c}=\emptyset$. The probability of this event is

[TABLE]

2. For every index $i\in S_{c}\cap\tau_{c}$ and each $k\in[\ell_{c}]\cup\{0\}$, all the sampled $w$’s are from the set $\Gamma_{i,k}$. The probability of this event is

[TABLE]

3. For every index $i\in S_{c}\cap\tau_{c}$, for each $k\in[\ell_{c}]\cup\{0\}$, and for all the sampled $w\notin\Gamma_{i,k}$, the underlying identity tester fails to reject. The probability of such an event is at most $\delta^{\prime}$, which is less than $1/100$ for $n\geq 2$.

Hence, the probability that Algorithm 1 fails to reject $\mu^{\prime}$ is at most $e^{-2}+e^{-4}+1/100<1/3$ .
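The final arithmetic is easy to verify directly. The snippet below simply adds the three bounds; the function name and the default argument are hypothetical, with $1/100$ standing in for the bound on $\delta^{\prime}$.

```python
import math


def failure_probability_bound(delta_prime=1 / 100):
    """Union bound over the three failure cases: e^-2 + e^-4 + delta'."""
    return math.exp(-2) + math.exp(-4) + delta_prime


# Roughly 0.164, comfortably below 1/3.
bound = failure_probability_bound()
```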

This completes the proof of Theorem 4.2. ∎

4.2 Uniformity Tester for Arbitrary Joint Distribution

If we set $\mu$ to be the uniform distribution, then Algorithm 1 gives us a Uniformity Tester. Hence, we get the following as a corollary of Theorem 4.2.

Theorem 4.5.

Given any $0<\epsilon<1$ , there exists an $(\epsilon,\frac{1}{3})$ -SubCond Uniformity Tester for any joint distribution with conditional sample complexity of $\tilde{\mathcal{O}}(n^{2}/\epsilon^{2})$ where $\tilde{\mathcal{O}}$ hides a polynomial function of $\log n,\log\frac{1}{\epsilon}$ .

5 Identity Testing between Unknown Joint Distributions

In this section, we present Algorithm 2 to test identity when both $\mu$ and $\mu^{\prime}$ are unknown. The first change we need to make to Algorithm 1 is in Step 12: we can no longer sample $w$ on our own, but we can query $\mu$ to get $w$. Secondly, instead of BasicIDTester, we need to use the algorithm BasicUnknown guaranteed by the following lemma.

Lemma 5.1.

[FJO+15] Given $0<\epsilon<1$ and $0<\delta<1$ and distributions $\mu,\mu^{\prime}$ over $\Sigma$, there is an $(\epsilon,\delta)$-Identity Tester with conditional sample complexity $\tilde{\mathcal{O}}(\frac{\log\log|\Sigma|}{\epsilon^{5}}\log(\frac{1}{\delta}))$. In other words, there is a tester that draws $\tilde{\mathcal{O}}(\frac{\log\log|\Sigma|}{\epsilon^{5}}\log(\frac{1}{\delta}))$ independent conditional samples and

• if $\mu=\mu^{\prime}$, then the tester will accept with probability at least $1-\delta$, and

• if $d(\mu,\mu^{\prime})\geq\epsilon$, then the tester will reject with probability at least $1-\delta$.

To prove the correctness of Algorithm 2, we note that, in the chain rule, the expectation is over only one distribution. Hence it is sufficient to (unconditionally) query only $\mu$ to get $w$ , and apply Lemma 3.2. The rest of the proof is exactly the same as in Section 4.

Sample Complexity of Algorithm 2

By Lemma 5.1, each invocation of BasicUnknown with parameter $\epsilon_{k}$ , $\delta_{k}$ requires $\tilde{\mathcal{O}}(\log\log|\Sigma|/\epsilon_{k}^{5})$ samples. As in the case for Algorithm 1, for each index in $S_{j}$ , the sample complexity is $\tilde{\mathcal{O}}(\log\log|\Sigma|/\epsilon^{5})$ . Hence, the total sample complexity of Algorithm 2 is

[TABLE]

Theorem 5.2.

Given $0<\epsilon<1$ , Algorithm 2 is an $(\epsilon,\frac{1}{3})$ -SubCond Identity Tester for two unknown joint distributions with conditional sample complexity of $\tilde{\mathcal{O}}\left(\frac{n^{5}\log\log|\Sigma|}{\epsilon^{5}}\right)$ where $\tilde{\mathcal{O}}$ hides a polynomial function of $\log n,\log\frac{1}{\epsilon}$ .

6 Testing Independence of Marginals

Let $\mu$ be a probability distribution over $\Sigma^{n}$ . In this section, we present an algorithm to test whether $\mu$ is a product distribution; i.e., whether all the marginals of $\mu$ are independent or $\mu$ is far from all the product distributions.

Define $\mu^{\prime}$ to be the product of marginals of $\mu$ .

[TABLE]

By definition, the marginal distributions $\mu^{\prime}_{i}$ are exactly the marginal distributions $\mu_{i}$ . If $\mu$ is $\epsilon$ -far from all the product distributions, it is $\epsilon$ -far from $\mu^{\prime}$ . Using the chain rule (Lemma 3.1),

[TABLE]

Therefore, we need to test whether there exists $i\in[n]$ such that the marginal distribution $\mu_{i}$ is far (on average) from the conditional marginal distribution $\mu_{i}|w$. As both $\mu_{i}$ and $\mu_{i}|w$ are distributed over $\Sigma$, we can again use the BasicUnknown tester from [FJO+15], where identity between two unknown distributions is tested using $\tilde{\mathcal{O}}(\log\log|\Sigma|/\epsilon^{5})$ samples. The only thing left is to sample $w$ according to $\mu^{(i-1)}$. Such a $w$ can be obtained by drawing an unconditional sample and keeping its first $i-1$ coordinates. The rest of the algorithm is exactly the same as in Algorithm 2.

Theorem 6.1.

For any $0<\epsilon<1$, there exists an $(\epsilon,\frac{1}{3})$-SubCond Product Tester for joint distributions with conditional sample complexity of $\tilde{\mathcal{O}}\left(\frac{n^{5}\log\log|\Sigma|}{\epsilon^{5}}\right)$, where $\tilde{\mathcal{O}}$ hides a polynomial function of $\log n,\log\left(\frac{1}{\epsilon}\right)$.

The proof of Theorem 6.1 follows directly from Theorem 5.2, and the observation that in this particular case, the (conditional) samples for $\mu_{i}$ can be produced by conditioning only on the $i^{th}$ index of $\Sigma^{n}$ .
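For intuition, the reference distribution $\mu^{\prime}$ used in this section (the product of the marginals of $\mu$) can be computed explicitly when $\mu$ is a small table. The sketch below does this and measures $d(\mu,\mu^{\prime})$; the dictionary representation and all function names are assumptions made for illustration and are not part of the tester, which only has sample access.

```python
import itertools


def marginal(mu, i):
    """Distribution of the i-th coordinate (0-based) of a joint distribution."""
    out = {}
    for x, p in mu.items():
        out[x[i]] = out.get(x[i], 0.0) + p
    return out


def product_of_marginals(mu, n):
    """The product distribution mu' whose marginals equal those of mu."""
    margs = [marginal(mu, i) for i in range(n)]
    prod = {}
    for x in itertools.product(*[list(m) for m in margs]):
        p = 1.0
        for i, a in enumerate(x):
            p *= margs[i][a]
        prod[x] = p
    return prod


def tv(p, q):
    """Variation distance; points missing from a table have probability 0."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

For two perfectly correlated uniform bits, `mu = {(0, 0): 0.5, (1, 1): 0.5}`, the product of marginals is uniform over $\{0,1\}^{2}$ and the distance is $1/2$.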

7 Conclusion

In this paper, we analyzed property testing of joint distributions in the conditional sampling model. We considered the natural subcube conditioning and presented testers for uniformity, identity with a known distribution, identity with an unknown distribution, and independence of marginals, all with query complexity polynomial in the dimension, thus avoiding the curse of dimensionality.

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful suggestions and comments, which significantly improved the paper. In particular, the authors would like to thank the first reviewer of the ToCT submission for suggesting the use of Levin’s economical work investment strategy, which resulted in a speedup of all our algorithms by a factor of $n/\epsilon$.

Rishiraj is supported by SERB ECR/2017/001974.

Appendix A A Weaker Lower Bound with Simple Proof

Theorem A.1.

For any $0\leq\epsilon\leq 1/2$, any $(\epsilon,1/3)\mbox{-}\textsc{SubCond}$ Uniformity-Tester has subcube-conditional sample complexity $\Omega(\sqrt[4]{n}/\sqrt{\epsilon})$. The lower bound holds even for the case when the domain is $\{0,1\}^{n}$ and the given distribution is a product of $n$ independent (though not necessarily identical) distributions.

Proof.

Let $\mu$ be a product distribution over the domain $\{0,1\}^{n}$ with marginals $\mu_{1},\dots,\mu_{n}$. So $\mu=\mu_{1}\otimes\dots\otimes\mu_{n}$. Note that since the $\mu_{i}$ are independent, if $i\neq j$ then conditioning on the $i$th coordinate does not affect the samples we get from $\mu_{j}$. Also, since the $\mu_{i}$ are all distributions over a two-element set (namely $\{0,1\}$), conditioning on any subset of $\{0,1\}$ is also of no use. Thus drawing subcube-conditional samples from $\mu$ is as good as drawing samples (without any conditioning) from $\mu$.

So it is sufficient for us to prove that for any $0\leq\epsilon\leq 1/2$, any $(\epsilon,1/3)$ Uniformity-Tester has sample complexity $\Omega(\sqrt[4]{n}/\sqrt{\epsilon})$ when the domain is $\{0,1\}^{n}$ and the given distributions are product distributions.

The main idea of the proof is to use a standard technique from property testing where the following lemma is used. The following lemma has been rewritten in the language and context of this paper. A proof of the general statement of the lemma can be found in [Fis04, FNS04].

Theorem A.2.

Let $P$ be a property of distributions over $\Sigma^{n}$ that we want to test. Suppose $\mathcal{D}_{Y}$ is a distribution over all the distributions that satisfy the given property $P$, and let $\mathcal{D}_{N}$ be a distribution over all distributions that are $\epsilon$-far from satisfying the property $P$. Let $Q_{Y}$ be the distribution over outcomes of $q$ samples when the samples are drawn from a distribution $D_{Y}$ that is drawn according to $\mathcal{D}_{Y}$. Similarly, let $Q_{N}$ be the distribution over outcomes of $q$ samples when the samples are drawn from a distribution $D_{N}$ that is drawn according to $\mathcal{D}_{N}$. If the variation distance between $Q_{Y}$ and $Q_{N}$ is less than $1/3$, then any $(\epsilon,1/3)$-Tester for the property $P$ will have sample complexity more than $q$.

In the context of our theorem, the property $P$ is “Uniformity”. So $\mathcal{D}_{Y}$ is supported on a single distribution, the uniform distribution over the domain $\{0,1\}^{n}$. Now let us define the distribution $\mathcal{D}_{N}$:

Let $D_{1}$ be the distribution over $\{0,1\}$ where $1$ is produced with probability $(1/2+2\sqrt{\frac{\epsilon}{n}})$ and $0$ is produced with probability $(1/2-2\sqrt{\frac{\epsilon}{n}})$. And let $D_{0}$ be the distribution over $\{0,1\}$ where $1$ is produced with probability $(1/2-2\sqrt{\frac{\epsilon}{n}})$ and $0$ is produced with probability $(1/2+2\sqrt{\frac{\epsilon}{n}})$.

Consider the set of distributions $\mathcal{D}$ over $\{0,1\}^{n}$ which are a product of $n$ distributions, each of which is either $D_{0}$ or $D_{1}$. That is,

[TABLE]

Claim A.3.

Any $\mu\in\mathcal{D}$ is $\epsilon$ -far from uniform. That is, for any $\mu\in\mathcal{D}$ we have

[TABLE]

From Claim A.3 we see that all the distributions in $\mathcal{D}$ are $\epsilon$-far from uniform. Thus we can take the uniform distribution over $\mathcal{D}$ as our distribution $\mathcal{D}_{N}$. If a distribution is drawn from $\mathcal{D}_{N}$ or $\mathcal{D}_{Y}$, $q$ samples from the distribution will give $q$ many $\{0,1\}$-strings of length $n$. Note that if a distribution is drawn from $\mathcal{D}_{Y}$ (that is, the distribution is the uniform distribution over $\{0,1\}^{n}$), then the distribution of the outcomes of $q$ samples is a uniform distribution over $\{0,1\}^{nq}$. So, by Theorem A.2, it is enough to show that if $\mu$ is drawn from $\mathcal{D}_{N}$ then the distribution of the outcomes (as a distribution over $\{0,1\}^{nq}$) is $1/3$-close to uniform.

Note that if $\mu$ is a distribution drawn from $\mathcal{D}_{N}$, we can think of $\mu$ as $\mu_{1}\otimes\dots\otimes\mu_{n}$ where each $\mu_{i}$ is independently and uniformly chosen from the set $\{D_{0},D_{1}\}$. Let $\mu^{q}$ be the distribution over $\{0,1\}^{nq}$ when $q$ samples are drawn from $\mu$. The following lemma now completes the proof of Theorem A.1.

Lemma A.4.

If $q\leq\frac{\sqrt[4]{n}}{20\sqrt{\epsilon}}$ then

[TABLE]

∎

A.1 Proof of Claim A.3

Let $\mu=\mu_{1}\otimes\dots\otimes\mu_{n}$. Without loss of generality, we will assume that all the $\mu_{i}$’s are the distribution $D_{1}$. That is, $1$ is produced with probability $(1/2+2\sqrt{\frac{\epsilon}{n}})$ and $0$ is produced with probability $(1/2-2\sqrt{\frac{\epsilon}{n}})$. To simplify notation, we will assume $1$ is produced with probability $(1/2+\epsilon^{\prime})$ and $0$ is produced with probability $(1/2-\epsilon^{\prime})$.

Since we know $d(\mu,\mathcal{U})\geq H(\mu,\mathcal{U})^{2}$ , it is enough for us to prove $H(\mu,\mathcal{U})^{2}\geq\epsilon$ . For any $x\in\{0,1\}^{n}$ let $p(x)$ be the probability of getting $x$ when drawn from $\mu$ . Note that the probability of getting $x$ when drawn from $\mathcal{U}$ is $1/2^{n}$ .

By definition we have

[TABLE]

Now note that if $x$ has $k$ 1’s and $(n-k)$ 0’s then $p(x)=(1/2+\epsilon^{\prime})^{k}(1/2-\epsilon^{\prime})^{n-k}$ . So we have

[TABLE]

Now since $(\sqrt{1+x}+\sqrt{1-x})\leq 2(1-\frac{x^{2}}{8})$ for all $|x|\leq 1$, we have

[TABLE]

The last inequality follows from the fact that $(1-x)^{n}\leq(1-xn+\binom{n}{2}x^{2})$ . Now putting all the things together, we have

[TABLE]

If $\epsilon^{\prime}=2\sqrt{\epsilon/n}$, then from the above inequality and the fact that $\epsilon<1/2$, we have $H(\mu,\mathcal{U})^{2}\geq\epsilon$.
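Claim A.3 can be sanity-checked numerically for small $n$. The sketch below computes the exact variation distance between $D_{1}^{\otimes n}$ and the uniform distribution by grouping strings according to their number of ones, and also the squared Hellinger distance under the normalisation $H(\mu,\mathcal{U})^{2}=1-\sum_{x}\sqrt{p(x)/2^{n}}$. That normalisation is an assumption, chosen to be consistent with the inequality $d\geq H^{2}$ used in this proof, and the function names are hypothetical.

```python
import math


def tv_to_uniform(n, eps):
    """Exact d(mu, U) for mu = D_1^{(x) n}, with bias eps' = 2 * sqrt(eps / n)."""
    p = 0.5 + 2 * math.sqrt(eps / n)
    if not 0.0 <= p <= 1.0:
        raise ValueError("eps / n is too large for D_1 to be a distribution")
    total = 0.0
    for k in range(n + 1):
        # All comb(n, k) strings with k ones have the same probability.
        prob_mu = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        prob_u = math.comb(n, k) / 2 ** n
        total += abs(prob_mu - prob_u)
    return 0.5 * total


def hellinger_sq_to_uniform(n, eps):
    """H(mu, U)^2 = 1 - BC^n, where BC is the per-coordinate affinity."""
    p = 0.5 + 2 * math.sqrt(eps / n)
    bc = math.sqrt(p / 2) + math.sqrt((1 - p) / 2)
    return 1.0 - bc ** n
```

On parameters such as $n=16$, $\epsilon=0.1$, both quantities exceed $\epsilon$, in line with the claim.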

A.2 Proof of Lemma A.4

Let us start with a claim. We defer the proof of the claim to the end of this section.

Claim A.5.

Let $P$ and $Q$ be two distributions over $\Sigma$. If for all $x\in\Sigma$ we have

[TABLE]

then we have

[TABLE]

Claim A.5 helps to upper bound the Hellinger distance in terms of the $\ell_{\infty}$ distance. Now let $\Sigma=\{0,1\}^{q}$, and let $\mu_{i}^{q}$ be the distribution on $\Sigma$ that is obtained by drawing $q$ samples from $\mu_{i}$. Clearly, $\mu^{q}=\mu_{1}^{q}\otimes\mu_{2}^{q}\otimes\dots\otimes\mu_{n}^{q}$. To prove that the variation distance of $\mu^{q}$ from uniform is less than $1/3$, we will first show that the $\ell_{\infty}$ distance of $\mu_{i}^{q}$ from uniform is small; then, using Claim A.5, we get that the Hellinger distance of $\mu_{i}^{q}$ from uniform is small. Finally, we show that if all the $\mu_{i}^{q}$ have a small Hellinger distance from uniform, then $\mu^{q}$ has a small Hellinger distance from uniform, which gives an upper bound on the variation distance of $\mu^{q}$ from uniform.

Now the following claim upper bounds the $\ell_{\infty}$ distance of $\mu_{i}^{q}$ from uniform.

Claim A.6.

For all $i$ and for all $x\in\Sigma$

[TABLE]

Or, in other words, for all $x\in\Sigma$ if

[TABLE]

then $|\epsilon_{x}|\leq 10\epsilon q^{2}/n$.

By definition of Hellinger distance and variation distance, we have

[TABLE]

Again we know that for any two product distributions $P=P_{1}\otimes\dots\otimes P_{n}$ and $Q=Q_{1}\otimes\dots\otimes Q_{n}$

[TABLE]

Thus we have

[TABLE]

From Equation 3 and Claim A.5 we have

[TABLE]

where $q(x)=\Pr_{\mathcal{U}}(x)$, so $q(x)=2^{-q}$. From Claim A.6 we have that $|\epsilon_{x}|\leq 10\epsilon q^{2}/n$. So we have

[TABLE]

Thus if $q\leq\frac{\sqrt[4]{n}}{20\sqrt{\epsilon}}$ we have $d(\mu^{q},\mathcal{U})\leq 2\sqrt{1/40}$, which is less than $1/3$.
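The argument of this subsection can be traced numerically for one concrete parameter setting. The sketch below is not the paper's calculation: it computes the per-coordinate affinity $\sum_{x}\sqrt{\Pr_{\mu_{i}^{q}}(x)\,2^{-q}}$ directly, multiplies it over the $n$ coordinates, and then applies the standard bound $d\leq\sqrt{2H^{2}}$ with $H^{2}=1-\sum_{x}\sqrt{p(x)q(x)}$. That normalisation and all function names are assumptions for illustration, and floating-point cancellation limits the sketch to moderate $n$ and $q$.

```python
import math


def mixture_prob(k, q, eps_p):
    """Probability under mu_i^q of one fixed string with k ones,
    where mu_i is D_0 or D_1 with probability 1/2 each."""
    hi, lo = 0.5 + eps_p, 0.5 - eps_p
    return 0.5 * (hi ** k * lo ** (q - k) + lo ** k * hi ** (q - k))


def max_relative_deviation(n, q, eps):
    """max_x |eps_x|, where Pr_{mu_i^q}(x) = 2^{-q} (1 + eps_x)."""
    eps_p = 2 * math.sqrt(eps / n)
    return max(abs(mixture_prob(k, q, eps_p) * 2 ** q - 1) for k in range(q + 1))


def tv_upper_bound(n, q, eps):
    """Upper bound on d(mu^q, U) obtained through the Hellinger distance."""
    eps_p = 2 * math.sqrt(eps / n)
    # Affinity between mu_i^q and the uniform distribution on {0,1}^q.
    bc_one = sum(
        math.comb(q, k) * math.sqrt(mixture_prob(k, q, eps_p) * 2.0 ** (-q))
        for k in range(q + 1)
    )
    # The affinity of a product distribution is the product of affinities.
    h_sq = max(0.0, 1.0 - bc_one ** n)
    return math.sqrt(2 * h_sq)
```

With $n=160000$, $\epsilon=1/4$ and $q=2$ (so that $q\leq\sqrt[4]{n}/(20\sqrt{\epsilon})$), the relative deviation stays below $10\epsilon q^{2}/n$ and the resulting distance bound is far below $1/3$.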

A.2.1 Proof of Claim A.5

Let $p(x)=\Pr_{P}(x)$ and $q(x)=\Pr_{Q}(x)$ . By definition

[TABLE]

Now $\sqrt{p(x)q(x)}=q(x)\sqrt{1+\epsilon_{x}}$ . Now it is easy to verify that for all $x$ such that $|x|\leq 1$ , we have

[TABLE]

So, from the above observation,

[TABLE]

Now since $\sum_{x}q(x)=1$ and $\sum_{x}q(x)\epsilon_{x}=0$, we have

[TABLE]

A.2.2 Proof of Claim A.6

Suppose $x\in\Sigma$ has $k$ $1$’s and $(q-k)$ $0$’s. Since $\mu_{i}$ is either the distribution $D_{1}$ with probability $1/2$ or the distribution $D_{0}$ with probability $1/2$, the probability of $x$ appearing, when drawn from $\mu_{i}^{q}$, is

[TABLE]

Using the inequality $(1+x)^{r}\geq 1+xr$ (holds for $x\geq-1$ and $r\in\mathbb{N}$ ), we have

[TABLE]

The right-hand side of the above inequality is equal to $(2-8k(q-k)\epsilon^{\prime 2})$ . Thus we have

[TABLE]

For the upper bound, we shall use the following inequality. Let $r\in\mathbb{N},x\geq-1$ be such that $xr<1$ . It holds that

[TABLE]

The above inequality can be easily proved using the following facts.

1. When $r\in\mathbb{N},x>0$ and $rx<1$:

(a) it holds that $(1+x)^{r}\leq\mathrm{e}^{rx}$;

(b) as $0\leq rx<1$ it holds that $\mathrm{e}^{rx}\leq 1+xr+x^{2}r^{2}$.

2. When $r\in\mathbb{N},-1\leq x\leq 0$ it holds that $(1+x)^{r}\leq 1+xr+x^{2}r^{2}$ (can be proved using induction on $r$).

Since $\epsilon^{\prime}=2\sqrt{\epsilon/n}$ and $q\leq\sqrt[4]{n}$ , $\epsilon^{\prime}q<1$ . Hence, for all $k\leq q$ ,

[TABLE]

and thus

[TABLE]

Since $\epsilon^{\prime}=2\sqrt{\epsilon/n}$ and $q\leq\sqrt[4]{n}$ so we have

[TABLE]

And thus, we have

[TABLE]

Bibliography

[ACK15] Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A chasm between identity and equivalence testing with conditional queries. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24-26, 2015, Princeton, NJ, USA, pages 449–466, 2015.
[ADK15] Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3591–3599, 2015.
[BDKR05] Tuǧkan Batu, Sanjoy Dasgupta, Ravi Kumar, and Ronitt Rubinfeld. The complexity of approximating the entropy. SIAM J. Comput., 35(1):132–150, 2005.
[BFF+01a] Tuǧkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Bob Werner, editor, Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS-01), pages 442–451, Los Alamitos, CA, October 14–17 2001.
[BFF+01b] Tuǧkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, pages 442–451, 2001.
[BFR+13] Tuǧkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing closeness of discrete distributions. Journal of the ACM, 60(1):4:1–4:25, February 2013.
[Can15a] Clément L. Canonne. Big data on the rise? - testing monotonicity of distributions. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, pages 294–305, 2015.
[Can15b] Clément L. Canonne. A survey on distribution testing: Your data is big. but is it blue? Electronic Colloquium on Computational Complexity (ECCC), 22:63, 2015.
