Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness
Qi He, Peiran Yu, Ziyi Chen, Heng Huang

TL;DR
This paper extends the theoretical understanding of shuffling-type gradient methods by establishing convergence rates without relying on Lipschitz smoothness, thus broadening their applicability in machine learning.
Contribution
It introduces a new stepsize strategy that guarantees convergence of shuffling-type gradient methods under weaker assumptions than Lipschitz smoothness, matching the best known rates.
Findings
Convergence proven for nonconvex, strongly convex, and non-strongly convex cases.
Numerical experiments support the practical effectiveness of the approach.
Applicable to both random and arbitrary shuffling schemes.
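To make the shuffling schemes named above concrete, here is a minimal sketch of a generic shuffling-type gradient loop in Python. It is not the authors' algorithm or their stepsize strategy, which this summary does not spell out. The function name `shuffling_gd`, the `stepsize` callback, the `scheme` options, and the least-squares test problem are all assumptions made for illustration. The toy objective is Lipschitz smooth, so it shows only the shuffling mechanics, not the weaker-smoothness setting the paper studies.

```python
import numpy as np

def shuffling_gd(grad_i, w0, n, epochs, stepsize, scheme="random", seed=0):
    """Generic shuffling-type gradient method (illustrative sketch).

    grad_i(w, i) returns the gradient of the i-th component function at w.
    stepsize(t) returns the epoch-level stepsize; it is a placeholder,
    not the stepsize rule proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    fixed_perm = rng.permutation(n)  # drawn once, reused by "fixed"
    for t in range(epochs):
        if scheme == "random":       # random reshuffling: new permutation each epoch
            perm = rng.permutation(n)
        elif scheme == "fixed":      # shuffle once: same permutation every epoch
            perm = fixed_perm
        else:                        # incremental: natural order 0..n-1
            perm = np.arange(n)
        eta = stepsize(t) / n        # per-component stepsize within the epoch
        for i in perm:               # each component is visited exactly once
            w = w - eta * grad_i(w, i)
    return w

# Hypothetical test problem: consistent least squares,
# f_i(w) = 0.5 * (a_i . w - b_i)^2 with b = A @ w_true.
rng = np.random.default_rng(42)
n, d = 50, 3
A = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true

def grad_i(w, i):
    return A[i] * (A[i] @ w - b[i])

errors = {}
for scheme in ("random", "fixed", "incremental"):
    w = shuffling_gd(grad_i, np.zeros(d), n, epochs=50,
                     stepsize=lambda t: 0.5, scheme=scheme)
    errors[scheme] = float(np.linalg.norm(w - w_true))
print(errors)
```

The "fixed" and "incremental" options are two examples of what an arbitrary (non-random) shuffling scheme could look like; any rule that visits every component once per epoch fits the same loop.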
Abstract
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove the convergence rates for nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes, under a general bounded variance…
Taxonomy
Topics: Computability, Logic, AI Algorithms
