Adaptive pruning-based Newton's method for distributed learning

Shuzhen Chen; Yuan Yuan; Youming Tao; Tianzhu Wang; Zhipeng Cai; Dongxiao Yu

arXiv:2308.10154 · cs.LG · December 18, 2024

Open Access

TL;DR

This paper introduces DANL, a novel distributed Newton's method that adapts to resource and data heterogeneity, achieving fast convergence with reduced communication costs in large-scale distributed learning.

Contribution

The paper proposes DANL, a new adaptive Newton-based algorithm for distributed learning that overcomes the computational and communication challenges of traditional Newton's methods.

Findings

01. Achieves a linear convergence rate.

02. Efficient communication and resource adaptation.

03. Strong performance across datasets.

Abstract

Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning environments because of obstacles such as the high computation and communication costs of the Hessian matrix, sub-model diversity, staleness of training, and data heterogeneity. To overcome these obstacles, this paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL), which addresses the drawbacks of Newton's method by using a simple Hessian initialization and adaptive allocation of training regions. The algorithm exhibits remarkable convergence properties, which are rigorously examined under standard assumptions in stochastic optimization. The theoretical analysis proves that DANL attains a linear convergence…
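As a rough illustration of the abstract's opening claim that curvature information helps, the following minimal sketch compares one classical Newton step with ten plain gradient steps on an ill-conditioned 2-D quadratic. It is not the DANL algorithm and involves no distribution, pruning, or Hessian initialization scheme; the matrix `A`, vector `b`, learning rate, and helper names are all hypothetical choices made for this example.

```python
# Illustrative only: a single-machine Newton step versus gradient steps
# on a hypothetical quadratic f(x) = 0.5 * x^T A x - b^T x.
# This is NOT the DANL algorithm from the paper.

def grad(A, b, x):
    """Gradient of the quadratic: A x - b."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]

def solve2(A, g):
    """Solve the 2x2 linear system A d = g by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(g[0] * A[1][1] - A[0][1] * g[1]) / det,
            (A[0][0] * g[1] - g[0] * A[1][0]) / det]

def newton_step(A, b, x):
    """x - H^{-1} g, where the Hessian H of the quadratic is A."""
    d = solve2(A, grad(A, b, x))
    return [x[0] - d[0], x[1] - d[1]]

def gradient_step(A, b, x, lr):
    """x - lr * g, using first-order information only."""
    g = grad(A, b, x)
    return [x[0] - lr * g[0], x[1] - lr * g[1]]

# Assumed problem data: curvature differs by 100x between the two axes.
A = [[100.0, 0.0], [0.0, 1.0]]
b = [100.0, 1.0]          # the minimizer is x* = [1, 1]
x0 = [0.0, 0.0]

x_newton = newton_step(A, b, x0)

x_gd = x0
for _ in range(10):
    # lr is capped by the largest curvature (100), so the flat axis moves slowly.
    x_gd = gradient_step(A, b, x_gd, lr=0.01)

print("Newton after 1 step:    ", x_newton)
print("Gradient after 10 steps:", x_gd)
```

On a quadratic the Newton step rescales each direction by its curvature, so it lands on the minimizer at once, while the gradient iterate is still far from it along the low-curvature axis. The price is forming and solving with the Hessian, which is the computation and communication cost the abstract identifies as the obstacle in distributed settings.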

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques