Improved Information Theoretic Generalization Bounds for Distributed and Federated Learning
L. P. Barnes, Alex Dytso, and H. V. Poor

TL;DR
This paper derives information-theoretic upper bounds on the expected generalization error in distributed and federated learning. The bounds show an improved 1/K dependence on the number of nodes K, and the analysis considers both simple model averaging and multi-round algorithms across several classes of loss functions.
Contribution
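It gives tighter upper bounds on the expected generalization error for distributed learning, considering both simple averaging of node models and more complicated multi-round algorithms. The bounds are stated "per node", in terms of the mutual information between the training dataset and the trained weights at each node.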
Findings
Bounds show a 1/K dependence on the number of nodes.
Applicable to problems with Bregman divergence or Lipschitz continuous losses.
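Because the bounds depend on per-node mutual information, they are useful for describing generalization under communication or privacy constraints at each node.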
Abstract
We consider information-theoretic bounds on expected generalization error for statistical learning problems in a networked setting. In this setting, there are K nodes, each with its own independent dataset, and the models from each node have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved dependence of 1/K on the number of nodes. These "per node" bounds are in terms of the mutual information between the training dataset and the trained weights at each node, and are therefore useful in describing the generalization properties inherent to having communication or privacy constraints at each node.
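To make the setting in the abstract concrete, the following is a minimal sketch of a toy case, not the paper's method or its bound. It assumes scalar Gaussian data with mean 0, squared loss (one example of a Bregman divergence), a local model at each node equal to that node's sample mean, and simple averaging at the server. The function names and parameters are hypothetical. For this toy case the expected generalization gap works out to 2σ²/(nK) for n samples per node, so it shrinks as 1/K. The code estimates the gap by Monte Carlo and does not compute any mutual information.

```python
import random


def closed_form_gap(num_nodes, samples_per_node, sigma=1.0):
    """Exact expected gap for this toy problem only: 2 * sigma^2 / (n * K)."""
    return 2.0 * sigma ** 2 / (samples_per_node * num_nodes)


def averaged_model_gap(num_nodes, samples_per_node, trials=2000, sigma=1.0, seed=0):
    """Monte Carlo estimate of E[population risk - empirical risk] for the averaged model.

    Toy assumptions (not from the paper): data Z ~ N(0, sigma^2), squared loss,
    each node fits its local sample mean, the server averages the local models.
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        # Each of the K nodes draws its own independent dataset of n samples.
        datasets = [
            [rng.gauss(0.0, sigma) for _ in range(samples_per_node)]
            for _ in range(num_nodes)
        ]
        # Local model at each node: the sample mean, which minimizes local squared loss.
        local_models = [sum(d) / len(d) for d in datasets]
        # Aggregation step: simple averaging of the K local models.
        theta = sum(local_models) / num_nodes
        # Empirical risk of the averaged model over all nodes' training samples.
        all_points = [z for d in datasets for z in d]
        empirical_risk = sum((theta - z) ** 2 for z in all_points) / len(all_points)
        # Population risk for a fresh Z ~ N(0, sigma^2): E[(theta - Z)^2] = theta^2 + sigma^2.
        population_risk = theta ** 2 + sigma ** 2
        total_gap += population_risk - empirical_risk
    return total_gap / trials


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        estimate = averaged_model_gap(num_nodes=k, samples_per_node=10)
        print(k, round(estimate, 4), closed_form_gap(k, 10))
```

Running the script prints, for each K, the simulated gap next to the closed-form value for this toy problem. The two should agree up to Monte Carlo noise, and both fall as K grows.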
