Adaptive pruning-based Newton's method for distributed learning
Shuzhen Chen, Yuan Yuan, Youming Tao, Tianzhu Wang, Zhipeng Cai, and Dongxiao Yu

TL;DR
This paper introduces DANL, a novel distributed Newton's method that adapts to resource and data heterogeneity, achieving fast convergence with reduced communication costs in large-scale distributed learning.
Contribution
The paper proposes DANL, a new adaptive Newton-based algorithm for distributed learning that overcomes the computational and communication challenges of traditional Newton's methods.
Findings
Achieves a linear convergence rate.
Efficient communication and resource adaptation.
Strong performance across datasets.
Abstract
Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning environments, due to obstacles such as high computation and communication costs of the Hessian matrix, sub-model diversity, staleness of training, and data heterogeneity. To overcome these obstacles, this paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL), which solves the drawbacks of Newton's method by using a simple Hessian initialization and adaptive allocation of training regions. The algorithm exhibits remarkable convergence properties, which are rigorously examined under standard assumptions in stochastic optimization. The theoretical analysis proves that DANL attains a linear convergence…
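To make the abstract's starting point concrete, the following is a minimal sketch of a plain distributed Newton step, not DANL itself (the abstract does not give the details of its Hessian initialization or region allocation). It assumes an L2-regularized least-squares objective and equally sized worker shards; the function names, the regularization value `lam`, and the shard layout are all hypothetical and chosen only for illustration. It shows why curvature helps, and also why the naive approach is costly: every worker sends a full d-by-d Hessian.

```python
import numpy as np

def local_grad_hess(w, X, y, lam=1e-2):
    """Gradient and Hessian of regularized least squares on one worker's shard."""
    n, d = X.shape
    residual = X @ w - y
    grad = X.T @ residual / n + lam * w
    hess = X.T @ X / n + lam * np.eye(d)
    return grad, hess

def distributed_newton_step(w, shards, lam=1e-2):
    """Average the workers' gradients and Hessians, then take one Newton step.

    Assumes equally sized shards, so the plain average equals the global
    gradient/Hessian. Each worker communicates O(d^2) numbers here, which is
    the cost the abstract describes as impractical at scale.
    """
    grads, hessians = zip(*(local_grad_hess(w, X, y, lam) for X, y in shards))
    g = np.mean(grads, axis=0)
    H = np.mean(hessians, axis=0)
    return w - np.linalg.solve(H, g)

def distributed_gradient_step(w, shards, lr=0.01, lam=1e-2):
    """First-order baseline: workers send only gradients (O(d) numbers each)."""
    g = np.mean([local_grad_hess(w, X, y, lam)[0] for X, y in shards], axis=0)
    return w - lr * g

def global_grad_norm(w, shards, lam=1e-2):
    """Norm of the averaged gradient, used to compare the two updates."""
    g = np.mean([local_grad_hess(w, X, y, lam)[0] for X, y in shards], axis=0)
    return float(np.linalg.norm(g))
```

Because this toy objective is quadratic, a single Newton step from any starting point drives the averaged gradient to zero up to numerical error, even when the shards have very different feature scales, whereas one gradient step only reduces it. On non-quadratic losses Newton's method needs several iterations, and the per-round Hessian traffic is what methods in this line of work try to cut down.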
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques
