Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction
Wei Jiang, Sifan Yang, Wenhao Yang, Lijun Zhang

TL;DR
This paper introduces a variance-reduced sign stochastic gradient descent method that achieves faster convergence rates for high-dimensional optimization and distributed learning, improving on the rates of existing signSGD approaches.
Contribution
The paper proposes the Sign-based Stochastic Variance Reduction (SSVR) method, which achieves faster convergence rates for signSGD and extends the improvement to heterogeneous distributed settings.
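SSVR tracks the gradient with a variance-reduction estimator and updates with the sign of that estimate. The sketch below is a minimal, hedged illustration of that idea, assuming a STORM-style recursive estimator $v_t = \nabla f(x_t;\xi_t) + (1-\beta)\,(v_{t-1} - \nabla f(x_{t-1};\xi_t))$ followed by $x_{t+1} = x_t - \eta\,\mathrm{sign}(v_t)$. The choice of estimator, the step size `lr`, the weight `beta`, the `sign(0) = 0` convention, and the helper names `ssvr_sketch`, `grad`, and `sample` are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import random


def sign(v):
    """Elementwise sign; sign(0) = 0 is an assumed convention."""
    return [(a > 0) - (a < 0) for a in v]


def ssvr_sketch(grad, x0, sample, steps=1000, lr=0.01, beta=0.1, seed=0):
    """Hypothetical sign update driven by a STORM-style variance-reduced estimator.

    grad(x, xi) returns a stochastic gradient at x for sample xi;
    sample(rng) draws one sample. Both are assumed interfaces.
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    xi = sample(rng)
    v = grad(x_prev, xi)  # initial estimator: a single stochastic gradient
    x = [a - lr * s for a, s in zip(x_prev, sign(v))]  # first sign step
    for _ in range(steps):
        xi = sample(rng)  # one fresh sample, evaluated at both x and x_prev
        g_new = grad(x, xi)
        g_old = grad(x_prev, xi)
        # Recursive correction: reuse the old estimate, shifted by the gradient change.
        v = [gn + (1.0 - beta) * (vi - go) for gn, vi, go in zip(g_new, v, g_old)]
        # Only the sign of the estimate is used to move the parameters.
        x_prev, x = x, [a - lr * s for a, s in zip(x, sign(v))]
    return x
```

For example, on the noisy quadratic $f(x) = \tfrac12\|x\|^2$ with `grad = lambda x, xi: [a + n for a, n in zip(x, xi)]` and Gaussian `xi`, each coordinate moves toward 0 by `lr` per step. Because the same sample is evaluated at both `x` and `x_prev`, additive noise enters each new estimate only through the `beta`-weighted term, which is what keeps the sign of `v` reliable.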
Findings
Improves the convergence rate of signSGD to $\mathcal{O}(d^{1/2}T^{-1/3})$.
Improves the convergence rate for finite-sum problems to $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$.
Introduces two distributed algorithms for heterogeneous majority vote, with convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$, respectively.
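To read these rates as iteration counts, a back-of-the-envelope conversion (not a result stated in the paper) sets each bound, with constants ignored, at most a target accuracy $\epsilon$ and solves for $T$:

```latex
\begin{aligned}
d^{1/2}T^{-1/3} \le \epsilon &\iff T \ge d^{3/2}\epsilon^{-3}, \\
m^{1/4}d^{1/2}T^{-1/2} \le \epsilon &\iff T \ge m^{1/2}\,d\,\epsilon^{-2}, \\
d^{1/4}T^{-1/4} \le \epsilon &\iff T \ge d\,\epsilon^{-4}.
\end{aligned}
```

The first distributed rate also contains the term $dn^{-1/2}$, where $n$ is the number of nodes; it does not shrink with $T$, so that bound can fall below $\epsilon$ only when $n > d^{2}\epsilon^{-2}$.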
Abstract
Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel…
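Majority vote is what makes signSGD communication-efficient across nodes: each node sends only the signs of its local estimate, and the server returns the sign of the summed votes. The toy sketch below shows this standard aggregation step and why heterogeneous (non-identical) local gradients make it delicate: the majority of signs can disagree with the sign of the averaged gradient. It illustrates the baseline mechanism only, not either of the paper's two new algorithms, and the function names `majority_vote` and `sign_of_mean` and the example vectors are hypothetical.

```python
def sign(a):
    """Scalar sign with sign(0) = 0 (assumed convention)."""
    return (a > 0) - (a < 0)


def majority_vote(worker_vectors):
    """Server step: sum the 1-bit votes sign(v_j[i]) per coordinate, return their sign."""
    d = len(worker_vectors[0])
    votes = [sum(sign(v[i]) for v in worker_vectors) for i in range(d)]
    return [sign(s) for s in votes]


def sign_of_mean(worker_vectors):
    """Reference direction: sign of the averaged local vectors (needs full precision)."""
    n = len(worker_vectors)
    d = len(worker_vectors[0])
    return [sign(sum(v[i] for v in worker_vectors) / n) for i in range(d)]


# Three nodes with heterogeneous local gradients in two coordinates.
local = [[1.0, 2.0], [1.0, -1.0], [-5.0, 3.0]]
print(majority_vote(local))  # [1, 1]
print(sign_of_mean(local))   # [-1, 1]
```

In the first coordinate, two small positive gradients outvote one large negative one, so the vote points opposite to the mean gradient; handling this mismatch is the difficulty the heterogeneous setting raises.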
Taxonomy
Topics: Evolutionary Algorithms and Applications
