Robust Collaborative Learning with Linear Gradient Overhead
Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, John Stephan

TL;DR
This paper introduces MoNNA, a robust collaborative learning algorithm that tolerates faulty machines with provable guarantees and linear gradient overhead, using momentum and nearest-neighbor averaging.
Contribution
MoNNA is a novel algorithm combining momentum and NNA, with a new analysis framework. It is robust under standard assumptions, and its gradient computation overhead is linear in the fraction of faulty machines.
Findings
MoNNA is provably robust to faulty machines in distributed learning under standard assumptions.
The gradient computation overhead of MoNNA is linear in the fraction of faulty machines.
Experimental validation on image classification tasks supports the theoretical claims.
Abstract
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing.…
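The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the momentum step is an exponential average of local gradients and that NNA means each machine averages the n - f received vectors closest to its own, where f is the assumed number of faulty machines. The function names, the default beta and the toy vectors are hypothetical.

```python
import math


def momentum_update(prev_momentum, gradient, beta=0.9):
    """Momentum of local gradients (assumed form): m_t = beta * m_{t-1} + (1 - beta) * g_t."""
    return [beta * m + (1.0 - beta) * g for m, g in zip(prev_momentum, gradient)]


def nearest_neighbor_average(own, received, num_faulty):
    """Average the len(received) - num_faulty vectors closest to `own`.

    `received` is assumed to contain the machine's own vector as well.
    """
    keep = len(received) - num_faulty
    if keep <= 0:
        raise ValueError("num_faulty must be smaller than the number of received vectors")
    # Sort by Euclidean distance to the machine's own vector and keep the closest ones.
    closest = sorted(received, key=lambda v: math.dist(own, v))[:keep]
    return [sum(v[k] for v in closest) / keep for k in range(len(own))]


# Toy round: three well-behaved vectors and one outlier sent by a faulty machine.
own_vector = [1.0, 1.0]
received_vectors = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [100.0, -100.0]]
mixed = nearest_neighbor_average(own_vector, received_vectors, num_faulty=1)
```

In this toy round the outlier is the farthest vector from `own_vector`, so it is dropped before averaging. A full training loop would apply `momentum_update` to each local stochastic gradient and then mix the resulting momentum vectors across machines.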
Taxonomy
Topics: Data Stream Mining Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Stochastic Gradient Descent
