Communication and Memory Efficient Testing of Discrete Distributions
Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, and Sankeerth Rao

TL;DR
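This paper develops efficient algorithms for testing whether a discrete distribution is uniform (or equal to a known distribution) and whether two distributions are identical, under a memory constraint in the one-pass streaming model and a communication constraint in a distributed model, together with nearly-tight lower bounds for uniformity testing.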
Contribution
It introduces new algorithms for distribution testing in constrained models and establishes nearly-tight lower bounds on sample complexity and communication costs.
Findings
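Efficient algorithms for uniformity/identity testing and closeness testing in both the one-pass streaming model and the distributed model.
Nearly-tight lower bounds on the sample complexity of any one-pass streaming uniformity tester under a memory constraint.
Nearly-tight lower bounds on the communication cost of any uniformity testing protocol in a restricted one-pass model of communication.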
Abstract
We study distribution testing with communication and memory constraints in the following computational models: (1) The one-pass streaming model where the goal is to minimize the sample complexity of the protocol subject to a memory constraint, and (2) A distributed model where the data samples reside at multiple machines and the goal is to minimize the communication cost of the protocol. In both these models, we provide efficient algorithms for uniformity/identity testing (goodness of fit) and closeness testing (two sample testing). Moreover, we show nearly-tight lower bounds on (1) the sample complexity of any one-pass streaming tester for uniformity, subject to the memory constraint, and (2) the communication cost of any uniformity testing protocol, in a restricted 'one-pass' model of communication.
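To make the one-pass streaming setting concrete, here is a minimal sketch of a memory-bounded, collision-based uniformity check. It is an illustration under stated assumptions, not the paper's algorithm or its analysis: the function names, the parameter k (how many samples are kept in memory), and the slack threshold are all hypothetical. The idea is that if the first k samples are stored, each later sample from a uniform distribution over n elements matches a given stored sample with probability 1/n, so the expected number of collisions is k * m / n for m later samples. A distribution that is far from uniform has a larger collision probability and tends to produce more collisions.

```python
def bipartite_collision_statistic(stream, k):
    """Single pass over `stream`: keep the first k samples in memory,
    then count collisions between every later sample and the stored ones.

    Returns (collisions, later), where `later` is the number of samples
    seen after the first k. Memory use is at most k stored values.
    """
    stored = {}      # value -> multiplicity among the first k samples
    seen = 0         # how many samples have been stored so far
    later = 0        # how many samples arrived after the stored prefix
    collisions = 0
    for x in stream:
        if seen < k:
            stored[x] = stored.get(x, 0) + 1
            seen += 1
        else:
            # A later sample collides once with each stored copy of x.
            collisions += stored.get(x, 0)
            later += 1
    return collisions, later


def looks_uniform(stream, n, k, slack=0.5):
    """Hypothetical decision rule: accept if the collision count is within
    a (1 + slack) factor of its expectation under the uniform distribution
    over n elements. The value of `slack` is an illustrative assumption,
    not a threshold taken from the paper.
    """
    collisions, later = bipartite_collision_statistic(stream, k)
    expected_under_uniform = k * later / n
    return collisions <= (1 + slack) * expected_under_uniform
```

For example, looks_uniform([5] * 10, n=100, k=4) rejects, because a point mass collides on every later sample, while a stream with no repeated values is accepted. Choosing k trades memory against the number of samples needed, which is the kind of trade-off the abstract refers to; the sketch does not attempt to reproduce the paper's bounds.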
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Privacy-Preserving Technologies in Data
