Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering
Francois Chaubard, Duncan Eddy, Mykel J. Kochenderfer

TL;DR
Gradient Agreement Filtering (GAF) enhances distributed deep learning by filtering conflicting gradients based on cosine similarity, leading to better generalization, higher accuracy, and reduced computation, especially with smaller microbatches.
Contribution
This paper introduces Gradient Agreement Filtering, a novel method that filters conflicting microgradients to improve robustness and efficiency in distributed training.
Findings
Outperforms traditional gradient averaging in accuracy.
Enables smaller microbatch sizes without training instability.
Reduces training computation nearly tenfold.
Abstract
We introduce Gradient Agreement Filtering (GAF) to improve on gradient averaging in distributed deep learning optimization. Traditional distributed data-parallel stochastic gradient descent involves averaging gradients of microbatches to calculate a macrobatch gradient that is then used to update model parameters. We find that gradients across microbatches are often orthogonal or negatively correlated, especially in late stages of training, which leads to memorization of the training set, reducing generalization. In this paper, we introduce a simple, computationally effective way to reduce gradient variance by computing the cosine distance between micro-gradients during training and filtering out conflicting updates prior to averaging. We improve validation accuracy with significantly smaller microbatch sizes. We also show this reduces memorizing noisy labels. We demonstrate the…
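The filtering step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes micro-gradients are flat lists of floats, that each new micro-gradient is compared against the running average of those already accepted, and that the update is skipped when fewer than two micro-gradients agree. The names `cosine_distance`, `filter_and_average`, and the threshold parameter `tau` are hypothetical and chosen only for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as orthogonal to everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def filter_and_average(micro_grads, tau):
    """Average only the micro-gradients that agree with the running average.

    micro_grads: non-empty list of equal-length lists of floats.
    tau: hypothetical cosine-distance threshold; a micro-gradient is kept
         when its distance to the running average is at most tau.
    Returns (averaged_gradient, kept_count). The gradient is None when
    fewer than two micro-gradients agree, signalling "skip this step"
    (an assumption of this sketch).
    """
    agg = list(micro_grads[0])  # start from the first micro-gradient
    kept = 1
    for g in micro_grads[1:]:
        if cosine_distance(agg, g) <= tau:
            # Incremental mean over the accepted micro-gradients.
            agg = [(a * kept + x) / (kept + 1) for a, x in zip(agg, g)]
            kept += 1
    if kept < 2:
        return None, kept
    return agg, kept


if __name__ == "__main__":
    grads = [[1.0, 0.0], [0.8, 0.2], [-1.0, 0.1]]
    avg, kept = filter_and_average(grads, tau=0.5)
    print(avg, kept)
```

In this toy input the third micro-gradient points almost opposite to the first two, so its cosine distance exceeds the threshold and it is excluded from the average. Comparing against a running average makes the result depend on the order of the micro-gradients; a pairwise comparison would be another reasonable choice.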
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Optimization and Search Problems · Advanced Bandit Algorithms Research
