Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience

Avishek Ghosh; Raj Kumar Maity; Arya Mazumdar; Kannan Ramchandran

arXiv:2103.09424 · cs.DC · December 30, 2021

TL;DR

This paper extends the second-order cubic-regularized Newton method to distributed learning, addressing saddle-point avoidance, communication efficiency, and Byzantine resilience. It provides theoretical guarantees, and its experiments show lower iteration complexity than first-order methods.

Contribution

It introduces a distributed cubic-regularized Newton method that handles Byzantine attacks and communication bottlenecks, providing theoretical guarantees and empirical validation.
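
The contribution combines compressed communication with robustness to Byzantine workers. Below is a minimal Python sketch of one aggregation round, built on assumptions that this summary does not state. Workers send top-k sparsified updates; top-k is one standard compressor, chosen here only for illustration. The center then discards the updates with the largest norms and averages the rest, a norm-based trimming rule used as a stand-in. The paper's exact compression operator and aggregation rule may differ. The names `top_k` and `norm_trimmed_mean`, and all numbers in the example, are hypothetical.

```python
import numpy as np


def top_k(v, k):
    """Top-k sparsification (illustrative compressor): keep the k largest-magnitude
    entries of v and zero the rest.

    With d = len(v) it satisfies ||top_k(v) - v||^2 <= (1 - k/d) * ||v||^2,
    because the discarded entries are the d - k smallest in magnitude.
    """
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out


def norm_trimmed_mean(updates, n_drop):
    """Hypothetical robust aggregation: discard the n_drop updates with the
    largest Euclidean norms, then average the remaining ones."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    keep = np.argsort(norms)[: len(updates) - n_drop]
    return np.mean([updates[i] for i in keep], axis=0)


# Hypothetical round: 8 honest workers, 2 Byzantine workers, d = 6, k = 3.
rng = np.random.default_rng(0)
true_update = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
honest = [true_update + 0.01 * rng.standard_normal(6) for _ in range(8)]
byzantine = [np.full(6, 50.0) for _ in range(2)]         # large malicious updates
sent = [top_k(u, 3) for u in honest + byzantine]         # what each worker transmits
robust = norm_trimmed_mean(sent, n_drop=2)               # close to true_update
naive = np.mean(sent, axis=0)                            # dragged away by the attackers
print(robust)
print(naive)
```

Trimming by norm only helps against attackers whose updates are unusually large. It needs an upper bound on the number of Byzantine workers (here `n_drop=2`) and is a simplification, not a general defense.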

Findings

01. Achieves a 25% improvement in iteration complexity relative to first-order methods.

02. Provides theoretical guarantees in several settings, including Byzantine attacks.

03. Demonstrates effectiveness through experiments on standard datasets.

Abstract

The problem of saddle-point avoidance in non-convex optimization is quite challenging in large-scale distributed learning frameworks such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of Nesterov and Polyak \cite{nest} is one of the most elegant ways to avoid saddle points in the standard centralized (non-distributed) setup. In this paper, we extend the cubic-regularized Newton method to a distributed framework and simultaneously address practical challenges such as the communication bottleneck and Byzantine attacks. Saddle-point avoidance becomes even more crucial in the presence of Byzantine machines, since rogue machines may create fake local minima near the saddle points of the loss function; this is known as the saddle-point attack. Because our algorithm is second order, its iteration complexity is much lower…
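
The following small centralized sketch in Python (NumPy) illustrates why a cubic-regularized step can leave a saddle point where gradient descent stalls. Each step globally minimizes the model m(s) = gᵀs + ½ sᵀHs + (M/6)‖s‖³. Here g and H are the gradient and Hessian at the current point, and M is an assumed upper bound on the Hessian's Lipschitz constant. The helper `cubic_step`, the toy function f(x, y) = x²/2 + y⁴/4 - y²/2, and M = 10 are illustrative choices, not taken from the paper. The sketch also leaves out the distributed, compressed, and Byzantine-robust parts of the method.

```python
import numpy as np


def cubic_step(g, H, M, tol=1e-12):
    """Globally minimize m(s) = g@s + 0.5*s@H@s + (M/6)*||s||^3 (dense, small d).

    A global minimizer s with r = ||s|| satisfies (H + (M*r/2) I) s = -g with
    H + (M*r/2) I positive semidefinite, so we search over r in H's eigenbasis.
    """
    lam, Q = np.linalg.eigh(H)              # H = Q diag(lam) Q^T, lam ascending
    gh = Q.T @ g                            # gradient in the eigenbasis
    r_min = max(0.0, -2.0 * lam[0] / M)     # smallest r keeping the shifted Hessian PSD
    shift = lam + 0.5 * M * r_min
    flat = shift <= tol                     # directions left singular at r = r_min

    if np.all(np.abs(gh[flat]) <= tol):
        # Possible "hard case" (e.g. g = 0 at a saddle): try r = r_min.
        s_hat = np.zeros_like(gh)
        s_hat[~flat] = -gh[~flat] / shift[~flat]
        partial = np.linalg.norm(s_hat)
        if partial <= r_min:
            # Spend the remaining length along the most negative curvature direction.
            s_hat[np.argmax(flat)] = np.sqrt(r_min**2 - partial**2)
            return Q @ s_hat

    # Easy case: ||s(r)|| - r is decreasing on (r_min, inf); bisect for its root.
    def step_norm(r):
        return np.linalg.norm(gh / (lam + 0.5 * M * r))

    lo, hi = r_min, r_min + 1.0
    while step_norm(hi) > hi:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
    return Q @ (-gh / (lam + 0.5 * M * hi))


def f(p):
    """Toy objective: saddle at (0, 0), minima at (0, 1) and (0, -1)."""
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad(p):
    x, y = p
    return np.array([x, y**3 - y])


def hess(p):
    return np.array([[1.0, 0.0], [0.0, 3.0 * p[1] ** 2 - 1.0]])


point = np.zeros(2)                          # exact saddle point: gradient is zero
gd_point = point - 0.1 * grad(point)         # a gradient step cannot move from here
for _ in range(20):
    point = point + cubic_step(grad(point), hess(point), M=10.0)
print(gd_point, point, f(point))
```

At the origin the gradient is zero, so the gradient step leaves the point unchanged. The cubic step instead moves a distance 2/M along the negative-curvature direction y, and the iterates then converge to one of the minima (0, ±1). Which one depends on the sign of the eigenvector returned by `eigh`.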

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data