Distributed Statistical Estimation of Matrix Products with Applications
David P. Woodruff, Qin Zhang

TL;DR
This paper develops efficient distributed algorithms for estimating matrix product statistics such as norms, distinct elements, and heavy hitters, minimizing communication and rounds, with applications in database join problems.
Contribution
It introduces novel distributed estimation methods for matrix product statistics that optimize communication and rounds, connecting to fundamental database join problems.
Findings
Efficient algorithms for $\ell_p$-norm estimation in distributed settings.
Communication complexity bounds for set-intersection and heavy hitter problems.
Applications to database join size estimation and sampling.
Abstract
We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix $A$ and Bob holds a matrix $B$, and they want to estimate statistics of $A \cdot B$. We focus on the well-studied $\ell_p$-norm, distinct elements ($p = 0$), $\ell_0$-sampling, and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication. This problem is closely related to the fundamental set-intersection join problem in databases: when $p = 0$ the problem corresponds to the size of the set-intersection join. When $p = \infty$ the output is simply the pair of sets with the maximum intersection size. When $p = 1$ the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of…
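As a small illustration of the join correspondence described in the abstract, the sketch below computes the exact product centrally, with no communication savings, purely to make the statistics concrete. It assumes 0/1 matrices: each row of Alice's matrix is the indicator vector of one of her sets and each column of Bob's matrix is the indicator vector of one of his, so entry (i, j) of the product is the size of the intersection of Alice's set i and Bob's set j. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def matmul(A, B):
    """Integer matrix product of A (m x n) and B (n x k), given as nested lists."""
    n, k = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(n)) for j in range(k)] for row in A]


def product_statistics(A, B):
    """Exact statistics of C = A * B, computed centrally (illustration only)."""
    C = matmul(A, B)
    entries = [(i, j, c) for i, row in enumerate(C) for j, c in enumerate(row)]
    l0 = sum(1 for _, _, c in entries if c != 0)   # number of intersecting pairs
    l1 = sum(abs(c) for _, _, c in entries)        # natural join size for 0/1 inputs
    i_max, j_max, c_max = max(entries, key=lambda e: abs(e[2]))
    return {"l0": l0, "l1": l1, "linf": abs(c_max), "argmax": (i_max, j_max)}


def l1_via_marginals(A, B):
    """Entry sum of A * B from column sums of A and row sums of B.

    Standard identity, valid when all entries are nonnegative; it is shown
    here as an assumption-labelled aside, not as the paper's protocol.
    """
    col_sums_A = [sum(row[t] for row in A) for t in range(len(B))]
    row_sums_B = [sum(row) for row in B]
    return sum(a * b for a, b in zip(col_sums_A, row_sums_B))


# Alice's sets over the universe {0, 1, 2, 3}, one row per set.
A = [
    [1, 1, 0, 0],  # {0, 1}
    [0, 1, 1, 1],  # {1, 2, 3}
    [0, 0, 0, 1],  # {3}
]
# Bob's sets, one column per set: column 0 is {1, 2}, column 1 is {0}.
B = [
    [0, 1],
    [1, 0],
    [1, 0],
    [0, 0],
]

stats = product_statistics(A, B)
print(stats)
print(l1_via_marginals(A, B))
```

On this toy instance, three pairs of sets intersect (the count of nonzero entries), the entries sum to 4 (the natural join size), and the largest intersection has size 2, attained by Alice's set 1 and Bob's set 0. The marginal-based function returns the same entry sum while only needing per-element counts from each side, which hints at why the nonnegative case of the entry sum is much easier than the other statistics.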
Taxonomy
Topics: Complexity and Algorithms in Graphs · Random Matrices and Applications · Markov Chains and Monte Carlo Methods
