TL;DR
This paper analyzes the statistical error of averaging in distributed empirical risk minimization. It shows that averaging is asymptotically as accurate as the centralized solution in fixed dimensions, but that in high-dimensional regimes it incurs an accuracy loss that grows linearly with the number of machines.
Contribution
It provides asymptotically exact error expressions for distributed averaging in both fixed and high-dimensional settings, revealing when averaging is optimal or incurs a loss.
Findings
Averaging is asymptotically as accurate as centralized solutions in fixed dimensions.
In high-dimensional regimes, averaging incurs an accuracy loss that grows linearly with the number of machines.
The results inform optimal choices for the number of machines in distributed learning.
Abstract
A common approach to statistical learning with big data is to randomly split the data among machines and learn the parameter of interest by averaging the individual estimates. In this paper, focusing on empirical risk minimization, or equivalently M-estimation, we study the statistical error incurred by this strategy. We consider two large-sample settings: First, a classical setting where the number of parameters $p$ is fixed, and the number of samples per machine $n \to \infty$. Second, a high-dimensional regime where both $p, n \to \infty$ with $p/n \to \kappa \in (0,1)$. For both regimes and under suitable assumptions, we present asymptotically exact expressions for this estimation error. In the fixed-$p$ setting, under suitable assumptions, we prove that to leading order averaging is as accurate as the centralized solution. We also derive the second order error terms, and show that…
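The split-and-average strategy described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's code: it assumes ordinary least squares as the M-estimator (squared loss), Gaussian synthetic data, and hypothetical helper names (`ols`, `averaged_ols`, `simulate`). It compares the squared estimation error of the centralized fit with that of the average of per-machine fits.

```python
import numpy as np

def ols(X, y):
    """Least-squares estimate: the M-estimator for squared loss."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def averaged_ols(X, y, m, rng):
    """Randomly split the rows among m machines, fit each, average the estimates."""
    idx = rng.permutation(len(y))        # random assignment of samples to machines
    chunks = np.array_split(idx, m)      # m (nearly) equal-sized index sets
    estimates = [ols(X[c], y[c]) for c in chunks]
    return np.mean(estimates, axis=0)

def simulate(N=4000, p=5, m=10, seed=0):
    """Return (centralized error, averaged error) on one synthetic data set.

    Illustrative assumptions: linear model, standard normal design and noise.
    """
    rng = np.random.default_rng(seed)
    theta = np.ones(p)                   # true parameter (arbitrary choice)
    X = rng.standard_normal((N, p))
    y = X @ theta + rng.standard_normal(N)
    err_central = np.sum((ols(X, y) - theta) ** 2)
    err_avg = np.sum((averaged_ols(X, y, m, rng) - theta) ** 2)
    return err_central, err_avg

if __name__ == "__main__":
    # Fixed-p style setting: few parameters, many samples per machine.
    print(simulate(N=4000, p=5, m=10))
    # Larger p/n per machine (here p/n = 0.5), where splitting is expected to cost more.
    print(simulate(N=4000, p=200, m=10))
```

Averaging the two errors over many seeds, rather than reading a single run, gives a fairer picture of how the gap changes with `m` and with the per-machine ratio `p/n`.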
