Linear Regression with Distributed Learning: A Generalization Error Perspective
Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén

TL;DR
This paper analyzes the generalization error of distributed linear regression, revealing how partitioning the model across nodes affects performance on unseen data relative to the centralized solution.
Contribution
It provides high-probability bounds on the generalization error of distributed linear regression for isotropic and correlated Gaussian data as well as sub-Gaussian data, highlighting the impact of how the model is partitioned across the nodes.
Findings
Distributed solutions can have higher generalization error than centralized ones.
Generalization bounds depend on data distribution and model partitioning.
Numerical experiments on real and synthetic data validate the theoretical insights.
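The dependence on the data distribution can be seen from a standard identity for the expected error on a fresh test pair. The block below uses our own notation, not necessarily the paper's, and assumes test pairs y = aᵀx + ε where a has zero mean and covariance Σ, the noise ε has zero mean and variance σ² and is independent of a, and the estimate x̂ is held fixed given the training data:

```latex
\begin{aligned}
\mathbb{E}\big[(y - \mathbf{a}^\top \hat{\mathbf{x}})^2\big]
  &= \mathbb{E}\big[(\mathbf{a}^\top(\mathbf{x} - \hat{\mathbf{x}}) + \varepsilon)^2\big] \\
  &= (\mathbf{x} - \hat{\mathbf{x}})^\top \boldsymbol{\Sigma}\, (\mathbf{x} - \hat{\mathbf{x}}) + \sigma^2,
\end{aligned}
\qquad
\boldsymbol{\Sigma} = \mathbf{I} \;\Rightarrow\; \|\mathbf{x} - \hat{\mathbf{x}}\|_2^2 + \sigma^2 .
```

The cross term vanishes because ε is zero-mean and independent of a. The covariance Σ is where the data distribution (isotropic versus correlated) enters, while x̂ is where the partitioning enters, since the distributed estimate depends on how the unknowns are split over the nodes.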
Abstract
Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear regression where the model parameters, i.e., the unknowns, are distributed over the network. We adopt a statistical learning approach. In contrast to works that focus on the performance on the training data, we focus on the generalization error, i.e., the performance on unseen data. We provide high-probability bounds on the generalization error for both isotropic and correlated Gaussian data as well as sub-Gaussian data. These results reveal the dependence of the generalization performance on the partitioning of the model over the network. In particular, our results show that the generalization error of the distributed solution can be substantially higher…
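The setting in the abstract, where the unknowns rather than the samples are split over the nodes, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's algorithm: the feature columns are split into contiguous blocks by a hypothetical helper `split_columns`, each node updates only its own coefficients with a damped block-Jacobi step x_k ← x_k + A_k⁺ r / K using a shared residual r (hypothetical `distributed_ls`), and the test regressors are isotropic Gaussian, so the generalization error has the closed form ‖x̂ − x‖² + σ² (hypothetical `gen_error`). The problem sizes, noise level, seed, and scaling of `x_true` are arbitrary choices for illustration.

```python
import numpy as np


def split_columns(p, sizes):
    """Hypothetical helper: assign consecutive feature columns to nodes."""
    bounds = np.cumsum([0] + list(sizes))
    if bounds[-1] != p:
        raise ValueError("partition sizes must sum to the number of features")
    return [np.arange(bounds[k], bounds[k + 1]) for k in range(len(sizes))]


def distributed_ls(A, y, blocks, iters=2000):
    """Damped block-Jacobi least squares: node k updates only x[blocks[k]]."""
    K = len(blocks)
    x = np.zeros(A.shape[1])
    # Each node precomputes the pseudoinverse of its own column block A_k.
    local_pinvs = [np.linalg.pinv(A[:, idx]) for idx in blocks]
    for _ in range(iters):
        # Shared residual; in a network this needs the sum of all A_k x_k.
        r = y - A @ x
        for idx, Ak_pinv in zip(blocks, local_pinvs):
            # All nodes use the same r (Jacobi); the 1/K factor averages their corrections.
            x[idx] += Ak_pinv @ r / K
    return x


def gen_error(x_hat, x_true, sigma):
    """Expected test MSE when test regressors are N(0, I) and noise has variance sigma**2."""
    return float(np.sum((x_hat - x_true) ** 2) + sigma**2)


rng = np.random.default_rng(0)
n, p, sigma = 50, 200, 0.1            # fewer samples than unknowns
A = rng.standard_normal((n, p))       # isotropic Gaussian training regressors
x_true = rng.standard_normal(p) / np.sqrt(p)
y = A @ x_true + sigma * rng.standard_normal(n)

x_central = np.linalg.pinv(A) @ y     # centralized minimum-norm least-squares solution
blocks = split_columns(p, [50, 50, 50, 50])
x_dist = distributed_ls(A, y, blocks)

for name, x_hat in [("centralized", x_central), ("distributed", x_dist)]:
    train_mse = np.mean((A @ x_hat - y) ** 2)
    print(f"{name}: train MSE {train_mse:.2e}, "
          f"generalization error {gen_error(x_hat, x_true, sigma):.3f}")
```

With this split every local block is square (p_k = n), and both solutions fit the training data, so any difference in the printed generalization errors comes from which interpolating solution each method selects; the centralized one is the minimum-norm one. Changing the sizes passed to `split_columns` is a quick way to see how the partitioning changes the gap. On the design choice: starting from zero, each x_k stays in the row space of A_k, so this step is the same as each node re-solving its local least-squares problem for the part of y not explained by the other nodes and mixing the result into its previous estimate with weight 1/K. Without the 1/K factor the simultaneous updates can overshoot; with four square blocks, for example, the first residual would be multiplied by −3.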
Taxonomy
Methods: Linear Regression
