Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience
Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran

TL;DR
This paper extends a second-order cubic-regularized Newton method to distributed learning, addressing saddle-point avoidance, communication efficiency, and Byzantine resilience, with theoretical guarantees and experimental validation showing significant improvements over first-order methods.
Contribution
It introduces a distributed cubic-regularized Newton method that handles Byzantine attacks and communication bottlenecks, providing theoretical guarantees and empirical validation.
Findings
Achieves a 25% improvement in iteration complexity over first-order methods.
Provides theoretical guarantees under various settings including Byzantine attacks.
Demonstrates effectiveness through experiments on standard datasets.
Abstract
The problem of saddle-point avoidance for non-convex optimization is quite challenging in large-scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of \cite{nest} is one of the most elegant ways to avoid saddle points in the standard centralized (non-distributed) setup. In this paper, we extend the cubic-regularized Newton method to a distributed framework and simultaneously address several practical challenges, such as communication bottlenecks and Byzantine attacks. Note that saddle-point avoidance becomes more crucial in the presence of Byzantine machines, since rogue machines may create "fake local minima" near the saddle points of the loss function, an attack known as the saddle-point attack. Because our algorithm is second-order, its iteration complexity is much lower…
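To make the saddle-escaping behaviour of cubic regularization concrete, here is a minimal single-machine sketch in Python. It is not the paper's distributed or Byzantine-resilient algorithm. The toy objective f(x, y) = x²/2 − y²/2 + y⁴/4, the constant M = 6, the starting point, and the function names (`cubic_newton_step`, `run`) are all assumptions chosen for illustration. Each step minimizes the cubic model m(s) = gᵀs + ½ sᵀHs + (M/6)‖s‖³ using an eigendecomposition and a bisection on the step norm.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-12, max_iter=200):
    """Minimize m(s) = g.s + 0.5 s.H.s + (M/6)||s||^3 in the generic case.

    The minimizer satisfies (H + (M/2) r I) s = -g with r = ||s|| and
    H + (M/2) r I positive semidefinite. The degenerate "hard case"
    (g orthogonal to the most negative eigenvector) is not handled here.
    """
    eigvals, Q = np.linalg.eigh(H)      # eigenvalues in ascending order
    g_rot = Q.T @ g                     # gradient in the eigenbasis

    def step_norm(r):
        return np.linalg.norm(g_rot / (eigvals + 0.5 * M * r))

    lo = max(0.0, -2.0 * eigvals[0] / M)  # smallest r keeping the shift PSD
    hi = lo + 1.0
    while step_norm(hi) > hi:             # grow hi until it brackets the root
        hi *= 2.0
    for _ in range(max_iter):             # bisection on step_norm(r) = r
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
    return -Q @ (g_rot / (eigvals + 0.5 * M * hi))

# Toy objective (an assumption): saddle at (0, 0), local minima at (0, +1) and (0, -1).
def grad(p):
    x, y = p
    return np.array([x, y**3 - y])

def hess(p):
    x, y = p
    return np.array([[1.0, 0.0], [0.0, 3.0 * y**2 - 1.0]])

def run(p0, M=6.0, iters=30):
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p + cubic_newton_step(grad(p), hess(p), M)
    return p

p_final = run([0.5, 1e-3])  # start close to the saddle at the origin
```

In this toy run the iterate should leave the neighbourhood of the saddle within the first step or two and settle near the local minimum at (0, 1), because the cubic model exploits the negative-curvature direction of the Hessian, where the gradient is tiny. A plain gradient step from the same starting point would move only about 10⁻³ times the step size along that direction.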
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
