Federated Heavy Hitter Recovery under Linear Sketching
Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh

TL;DR
This paper studies the communication-accuracy tradeoffs of federated heavy hitter discovery and approximate histogram estimation under a linear sketching constraint, proposing algorithms that are optimal for a broad class of interactive schemes and analyzing their communication costs.
Contribution
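It introduces efficient algorithms based on local subsampling and invertible Bloom lookup tables (IBLTs) for the federated heavy hitter and approximate histogram problems, proves that they are information-theoretically optimal for a broad class of interactive schemes, and analyzes their communication costs.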
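To make "IBLTs under a linear sketching constraint" concrete, here is a toy sketch. It is our own illustration and not the paper's algorithm. Each IBLT cell stores integer sums (a count, a key sum, and a checksum sum), so users' sketches can be added cell by cell. That cell-wise sum is what additive secure aggregation reveals to the server, which then "peels" cells holding a single distinct item to recover the round's histogram. The class and function names, the parameters `cells_per_hash=100` and `num_hashes=3`, the SHA-256-based hashing, and the use of unbounded Python integers instead of modular arithmetic are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def _hash(tag: str, item: int) -> int:
    """Deterministic 64-bit hash of an integer item, domain-separated by `tag`."""
    digest = hashlib.sha256(f"{tag}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class CountIBLT:
    """Toy IBLT whose cells hold integer sums, so sketches add cell by cell."""

    def __init__(self, cells_per_hash: int = 100, num_hashes: int = 3):
        self.width = cells_per_hash
        self.num_hashes = num_hashes
        # Each cell stores [count, key_sum, check_sum].
        self.cells = [[0, 0, 0] for _ in range(cells_per_hash * num_hashes)]

    def _positions(self, item: int) -> list[int]:
        # One cell per sub-table, so an item never hits the same cell twice.
        return [j * self.width + _hash(f"pos{j}", item) % self.width
                for j in range(self.num_hashes)]

    def add(self, item: int, count: int = 1) -> None:
        """Insert (or, with a negative count, remove) `count` copies of `item`."""
        check = _hash("check", item)
        for p in self._positions(item):
            self.cells[p][0] += count
            self.cells[p][1] += count * item
            self.cells[p][2] += count * check

    def decode(self) -> tuple[Counter, bool]:
        """Repeatedly peel 'pure' cells (a single distinct item) from a copy."""
        work = CountIBLT(self.width, self.num_hashes)
        work.cells = [cell[:] for cell in self.cells]
        recovered: Counter = Counter()
        progress = True
        while progress:
            progress = False
            for idx in range(len(work.cells)):
                count, key_sum, check_sum = work.cells[idx]
                if count <= 0 or key_sum % count != 0:
                    continue
                item = key_sum // count
                if check_sum != count * _hash("check", item):
                    continue  # several distinct items share this cell
                recovered[item] += count
                work.add(item, -count)  # strip every copy of `item` from the table
                progress = True
        # Decoding succeeded only if peeling emptied the whole table.
        success = all(cell == [0, 0, 0] for cell in work.cells)
        return recovered, success


def aggregate(sketches: list[CountIBLT]) -> CountIBLT:
    """Cell-wise sum: the only thing an additive secure aggregator reveals."""
    total = CountIBLT(sketches[0].width, sketches[0].num_hashes)
    for sketch in sketches:
        for acc, cell in zip(total.cells, sketch.cells):
            for k in range(3):
                acc[k] += cell[k]
    return total


# Hypothetical round: each user encodes its single item into a full-size sketch.
user_items = [7] * 6 + [42] * 3 + [99, 1234, 5]
sketches = []
for item in user_items:
    sketch = CountIBLT()
    sketch.add(item)
    sketches.append(sketch)

histogram, ok = aggregate(sketches).decode()
print(ok, histogram.most_common(2))
```

In this toy, every user uploads all 300 cells even though it holds a single item. The reason is that the summed table must still be decodable for every distinct item in the round, so a round with more distinct items forces a larger table on everyone.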
Findings
The linear sketching constraint increases the communication cost of both tasks by adding an extra linear dependence on the number of users in a round.
In the multi-round setting, heavy hitter discovery incurs less communication overhead than approximate histogram estimation, establishing a separation between the two tasks.
Empirical results validate the theoretical tradeoffs and optimality of the proposed algorithms.
Abstract
Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible Bloom lookup tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds is at most…
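The abstract pairs IBLTs with local subsampling. One simple form of subsampling, which we assume here for illustration only, is for each user to join a round independently with probability p. The server then divides the decoded counts by p so that they remain unbiased. The paper's exact scheme and parameters may differ. The sketch below stands in for the securely aggregated, decoded sketch with a plain `Counter`. The function name `subsampled_heavy_hitters`, the parameters `p` and `threshold`, and the synthetic population are hypothetical. The sketch shows two effects that matter under linear sketching. First, far fewer distinct items reach the aggregate, so a smaller IBLT-style sketch can decode it. Second, frequent items are still recovered, with a standard deviation of sqrt(n(1 - p)/p) for an item held by n users.

```python
import random
from collections import Counter


def subsampled_heavy_hitters(user_items, p, threshold, seed=0):
    """Toy local subsampling: each user independently joins the round with probability p.

    Returns (estimates, heavy, distinct_sent): counts rescaled by 1/p, which are
    unbiased for the true counts; the items whose estimate reaches `threshold`; and
    how many distinct items a sketch of the aggregate would have to decode.
    """
    rng = random.Random(seed)
    # Every user flips its own coin; only participants' items reach the aggregate.
    sampled = Counter(item for item in user_items if rng.random() < p)
    # Inverse-probability weighting: E[sampled[x] / p] equals the true count of x.
    estimates = {item: c / p for item, c in sampled.items()}
    heavy = {item for item, est in estimates.items() if est >= threshold}
    return estimates, heavy, len(sampled)


# Hypothetical population: two popular items plus 2,000 items held by one user each.
users = [1] * 5000 + [2] * 3000 + list(range(100, 2100))
estimates, heavy, distinct_sent = subsampled_heavy_hitters(users, p=0.1, threshold=1000)
print(sorted(heavy), distinct_sent, len(set(users)))
```

With p = 0.1, only about a tenth of the 2,000 singleton items survive sampling. Each survivor's estimate is 10, far below the threshold, while both popular items clear it comfortably.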
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Random Matrices and Applications · Stochastic Gradient Optimization Techniques
