Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Zhen Qin, Zhishuai Liu, Pan Xu

TL;DR
This paper analyzes the convergence of signSGD with random reshuffling in nonconvex optimization, introduces improved algorithms with faster convergence, and validates findings through experiments, addressing a gap in theoretical understanding of practical implementations.
Contribution
The paper provides the first convergence analysis of signSGD with random reshuffling, proposes two new sign-based algorithms with enhanced rates, and extends these methods to distributed settings.
Findings
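For a dataset of size n and T epochs, the expected gradient norm of SignRR is bounded by O(log(nT)/√(nT) + σ), where σ is the averaged conditional mean square error, which may not vanish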
New algorithms achieve faster convergence rates of O(log(nT)/√(nT) + log(nT)√n/√T)
Experimental results confirm the effectiveness of the proposed methods
Abstract
signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses typically assume data are sampled with replacement in each iteration, contradicting a common practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. This gap leaves the theoretical understanding of the more practical algorithm, signSGD with random reshuffling (SignRR), largely unexplored. We develop the first analysis of SignRR to identify the core technical challenge that prevents a thorough convergence analysis of this method. In particular, given a dataset of size n and T epochs, we show that the expected gradient norm of SignRR is upper bounded by O(log(nT)/√(nT) + σ), where σ is the averaged conditional mean square error that may not vanish. To tackle this limitation, we develop two new sign-based algorithms…
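To make the reshuffling scheme described in the abstract concrete, here is a minimal Python sketch of sign-based SGD with random reshuffling. It is an illustration under stated assumptions, not the authors' implementation: the names `sign_rr`, `grad_i`, `lr`, and `seed` are hypothetical, the step size is held constant for simplicity, and the toy least-squares problem is made up for the example.

```python
import numpy as np

def sign_rr(grad_i, x0, n, epochs, lr, seed=0):
    """Sign-based SGD with random reshuffling (illustrative sketch).

    grad_i(x, i) is assumed to return the gradient of the i-th
    component function at x; lr is a constant step size (an assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Fresh permutation each epoch: every sample is visited exactly once,
        # in contrast to sampling with replacement.
        for i in rng.permutation(n):
            # Only the sign of each gradient coordinate is used.
            x -= lr * np.sign(grad_i(x, i))
    return x

# Toy problem (hypothetical): f_i(x) = 0.5 * (a_i @ x - b_i)**2
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(32, 4))
b = A @ data_rng.normal(size=4)

def toy_grad(x, i):
    return A[i] * (A[i] @ x - b[i])

x0 = np.zeros(4)
x_out = sign_rr(toy_grad, x0, n=32, epochs=50, lr=1e-2)
```

Because each update moves every coordinate by at most `lr`, the iterate can travel at most `lr * n * epochs` per coordinate; the update direction carries one sign per coordinate, which is the source of the communication efficiency the abstract mentions.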
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
