A Robust Classification Framework for Byzantine-Resilient Stochastic Gradient Descent
Shashank Reddy Chirra, Kalyan Varma Nadimpalli, Shrisha Rao

TL;DR
This paper introduces a Robust Gradient Classification Framework (RGCF) that detects Byzantine faults in distributed stochastic gradient descent. It remains robust to an arbitrary number of Byzantine workers and does not need an estimate of how many there are.
Contribution
The proposed RGCF uses a pattern recognition filter trained on gradient directions, achieving Byzantine fault tolerance in both convex and non-convex optimization without a prior estimate of the number of Byzantine workers.
Findings
Robust to an arbitrary number of Byzantine workers.
Scales efficiently with large numbers of workers.
Validated by training convolutional neural networks on the MNIST dataset.
Abstract
This paper proposes a Robust Gradient Classification Framework (RGCF) for Byzantine fault tolerance in distributed stochastic gradient descent. The framework consists of a pattern recognition filter trained to classify individual gradients as Byzantine using their direction alone. The filter is robust to an arbitrary number of Byzantine workers in both convex and non-convex optimisation settings, a significant improvement over prior work, which is robust to Byzantine faults only when up to 50% of the workers are Byzantine. The solution does not require an estimate of the number of Byzantine workers; its running time does not depend on the number of workers, and it scales to training instances with a large number of workers without a loss in performance. We validate our solution by training convolutional neural networks on the MNIST dataset in the…
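The idea of judging each gradient by its direction alone can be illustrated with a minimal sketch. This is not the authors' trained filter: the names `unit_direction`, `filter_and_aggregate`, and `make_cosine_classifier` are hypothetical, and the stand-in classifier (cosine similarity against an assumed trusted reference direction) is a simplification chosen only to make the example runnable. What the sketch does show is that each gradient is classified individually, so the result does not depend on an estimate of how many workers are Byzantine.

```python
import numpy as np

def unit_direction(grad, eps=1e-12):
    """Scale a gradient to unit length so that only its direction remains."""
    return grad / (np.linalg.norm(grad) + eps)

def filter_and_aggregate(gradients, is_byzantine):
    """Average the gradients that the classifier does not flag.

    gradients:    list of 1-D arrays, one per worker
    is_byzantine: hypothetical callable mapping a unit direction to True/False
    """
    kept = [g for g in gradients if not is_byzantine(unit_direction(g))]
    if not kept:
        # Every gradient was flagged: return a zero update rather than guess.
        return np.zeros_like(gradients[0])
    return np.mean(kept, axis=0)

def make_cosine_classifier(reference, threshold=0.0):
    """Stand-in classifier (an assumption, not the paper's trained filter).

    Flags a direction when its cosine similarity to a trusted reference
    direction falls below the threshold.
    """
    ref = unit_direction(reference)
    return lambda direction: float(np.dot(direction, ref)) < threshold

# Two honest workers and three Byzantine workers (a Byzantine majority).
honest = [np.array([1.0, 2.0]), np.array([1.2, 1.8])]
byzantine = [np.array([-10.0, -20.0]), np.array([-5.0, -9.0]), np.array([-3.0, -7.0])]

classifier = make_cosine_classifier(reference=np.array([1.0, 2.0]))
update = filter_and_aggregate(honest + byzantine, classifier)
```

In this toy run the three Byzantine gradients point away from the reference direction and are dropped, so `update` is the mean of the two honest gradients even though Byzantine workers form the majority. A learned classifier, as described in the abstract, would take the place of the cosine rule.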
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
