DOME: Improving Signal-to-Noise in Stochastic Gradient Descent via Sharp-Direction Subspace Filtering
Julien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

TL;DR
This paper introduces DOME, a method that improves the signal-to-noise ratio of stochastic gradients in SGD by filtering out a nuisance subspace identified from the gradient covariance. Removing this subspace has little impact on optimization and benefits applications such as gradient compression.
Contribution
The paper proposes a first-order, efficient method to identify and remove a nuisance subspace in stochastic gradients, improving signal-to-noise ratio in SGD.
Findings
Removing the identified subspace improves gradient signal-to-noise ratio.
Filtering this subspace benefits gradient compression applications.
Removing the subspace has little impact on optimization performance.
Abstract
Stochastic gradients for deep neural networks exhibit strong correlations along the optimization trajectory, and are often aligned with a small set of Hessian eigenvectors associated with outlier eigenvalues. Recent work shows that projecting gradients away from this Hessian outlier subspace has little impact on optimization, despite capturing a large fraction of gradient variability. Since computing the Hessian is intractable in practice, we introduce a principled first-order characterization of the nuisance subspace based on the covariance of stochastic gradients, and propose an efficient method to estimate it online. We show that removing this subspace also has little impact on optimization, and yields practical benefits for applications sensitive to gradient signal-to-noise ratio such as gradient compression.
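The idea in the abstract, estimating a high-variance subspace from the covariance of stochastic gradients and projecting gradients away from it, can be illustrated with a small NumPy sketch. This is not the authors' algorithm: the abstract describes an online estimator, whereas this sketch uses a batch SVD over a stored matrix of gradients, and the synthetic data, the function names `top_covariance_subspace` and `project_out`, and the parameters `d`, `n`, `k` are all assumptions made for illustration.

```python
import numpy as np

def top_covariance_subspace(grads, k):
    """Return a (d, k) orthonormal basis spanning the top-k eigenvectors of
    the empirical covariance of the rows of `grads` (shape: n_samples x d)."""
    centered = grads - grads.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered matrix are the covariance
    # eigenvectors, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def project_out(g, basis):
    """Remove the component of g that lies in span(basis)."""
    return g - basis @ (basis.T @ g)

# Synthetic stand-in for stochastic gradients (hypothetical data):
# a fixed mean gradient plus strong noise confined to k directions
# and weak isotropic noise everywhere else.
rng = np.random.default_rng(0)
d, n, k = 50, 200, 3
mean_grad = rng.normal(size=d)
noisy_dirs = np.linalg.qr(rng.normal(size=(d, k)))[0]
noise = (10.0 * rng.normal(size=(n, k))) @ noisy_dirs.T
noise += 0.1 * rng.normal(size=(n, d))
grads = mean_grad + noise

basis = top_covariance_subspace(grads, k)
filtered = np.array([project_out(g, basis) for g in grads])

# Total variance across coordinates before and after filtering.
var_before = grads.var(axis=0).sum()
var_after = filtered.var(axis=0).sum()
```

In this toy setup the filtered gradients have strictly lower total variance than the raw ones, because the projection removes exactly the directions that carry the most variance. Whether the mean gradient survives the projection depends on how much of it lies in the removed subspace, which is the question the paper studies for real networks.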
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Energy Efficient Wireless Sensor Networks
