Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Hamidreza Almasi, Harsh Mishra, Balajee Vamanan, Sathya N. Ravi

TL;DR
This paper introduces a convex optimization-based method for robust distributed training that effectively handles Byzantine failures and data augmentation, improving accuracy and communication efficiency.
Contribution
It formulates aggregation as a maximum likelihood estimation problem and provides a scalable, provably convergent solution that enhances robustness in distributed deep learning.
Findings
Significantly improves robustness of Byzantine-resilient aggregators
Enhances communication efficiency in distributed training
Achieves better accuracy across various tasks
Abstract
Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We define the quality of workers as reconstruction ratios, and formulate aggregation as a Maximum Likelihood Estimation procedure using Beta densities. We show that the regularized form of the log-likelihood with respect to the subspace can be approximately solved using iterative least squares…
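To make the abstract's vocabulary concrete, here is a minimal NumPy sketch of a reconstruction ratio and an iteratively reweighted least-squares subspace fit. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weight rule `1 / sqrt(1 - ratio)` (borrowed from flag-median-style IRLS), the unit-norm scaling of worker gradients, and the final projection step are all assumptions, and the Beta-density likelihood and regularizer from the paper are not modeled.

```python
import numpy as np

def reconstruction_ratios(G, Y):
    """Fraction of each worker gradient's squared norm captured by span(Y).

    G: (n, p) matrix whose columns are the p worker gradients.
    Y: (n, m) matrix with orthonormal columns spanning the candidate subspace.
    """
    captured = np.sum((Y.T @ G) ** 2, axis=0)
    total = np.sum(G ** 2, axis=0)
    return captured / np.maximum(total, 1e-12)

def irls_subspace(G, m, iters=10, eps=1e-6):
    """Fit an m-dimensional subspace by iteratively reweighted least squares.

    Assumed weight rule: w_i = 1 / sqrt(1 - ratio_i), so workers that the
    current subspace already explains well gain influence in the next step.
    """
    # Scale every gradient to unit norm so large outliers cannot dominate.
    Gn = G / np.maximum(np.linalg.norm(G, axis=0), 1e-12)
    w = np.ones(G.shape[1])
    for _ in range(iters):
        # Weighted least-squares step: top-m left singular vectors.
        U, _, _ = np.linalg.svd(Gn * np.sqrt(w), full_matrices=False)
        Y = U[:, :m]
        residual = np.clip(1.0 - reconstruction_ratios(Gn, Y), 0.0, None)
        w = 1.0 / np.maximum(np.sqrt(residual), eps)
    return Y

def aggregate(G, m):
    """Project the mean worker gradient onto the fitted subspace (assumed form)."""
    Y = irls_subspace(G, m)
    return Y @ (Y.T @ G.mean(axis=1))
```

In a toy setting with several workers sending nearly identical gradients and a few sending large random vectors, the fitted subspace aligns with the common direction, so the agreeing workers get reconstruction ratios near 1 and the random ones get ratios near 0. The projection alone does not remove the component of a faulty gradient that happens to lie inside the subspace.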
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
