Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization
Lam M. Nguyen, Dzung T. Phan, Jayant Kalagnanam

TL;DR
This paper introduces a data shuffling scheme for without-replacement stochastic gradient descent (SGD), discovered through large language model (LLM)-guided program evolution, that combines block reshuffling and paired reversal to improve convergence and stability.
Contribution
It uses LLM-guided program evolution to discover a new shuffling rule, then analyzes the rule's two structural components, block reshuffling and paired reversal, to establish provable improvements over existing schemes.
Findings
Block reshuffling reduces prefix-gradient variance constants.
Paired reversal cancels leading second-order terms, reducing order sensitivity.
Numerical experiments show consistent gains over standard shuffling schemes.
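The two components named in the findings can be illustrated with a small sketch. The summary does not give the paper's exact definitions, so the code below rests on assumptions: block reshuffling is taken to mean splitting the indices into contiguous blocks and permuting both the block order and the order inside each block, and paired reversal is taken to mean that every second epoch replays the previous epoch's order backwards. The names `block_reshuffle`, `paired_reversal_epochs`, and `block_size` are hypothetical and not from the paper.

```python
import random


def block_reshuffle(n, block_size, rng):
    """Return one epoch order over indices 0..n-1.

    Assumed form (not confirmed by the source): cut the index range into
    contiguous blocks, shuffle the block order, and shuffle inside each block.
    """
    blocks = [list(range(start, min(start + block_size, n)))
              for start in range(0, n, block_size)]
    rng.shuffle(blocks)          # permute the order of the blocks
    order = []
    for block in blocks:
        rng.shuffle(block)       # permute the samples inside each block
        order.extend(block)
    return order


def paired_reversal_epochs(n, num_epochs, block_size, seed=0):
    """Return a list of per-epoch orders.

    Assumed form (not confirmed by the source): epochs come in pairs, and the
    second epoch of each pair visits the first epoch's order in reverse.
    """
    rng = random.Random(seed)
    orders = []
    while len(orders) < num_epochs:
        forward = block_reshuffle(n, block_size, rng)
        orders.append(forward)
        if len(orders) < num_epochs:
            orders.append(forward[::-1])   # reversed replay of the same order
    return orders
```

Each returned order is a permutation of the indices, so a without-replacement SGD loop could consume one order per epoch. Whether the paper shuffles inside blocks, or pairs epochs in exactly this way, is not stated in the summary.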
Abstract
Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling is known to improve optimization constants relative to cyclic and shuffle-once schemes. However, existing theory offers limited guidance on how to design new data-ordering schemes that further improve optimization constants or stability beyond random reshuffling. In this paper, we design a pipeline using a large language model (LLM)-guided program evolution framework to discover an effective shuffling rule for without-replacement SGD. Abstracting from this instance, we identify two fundamental structural components: block reshuffling and paired reversal. We analyze these components separately and show that block reshuffling strictly…
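For reference, the three standard schemes the abstract names differ only in how each epoch's visiting order is produced. The sketch below uses their usual textbook definitions: incremental gradient keeps the fixed order 0..n-1, shuffle-once draws one permutation and reuses it, and random reshuffling draws a fresh permutation every epoch. The function name `epoch_orders` and the scheme strings are hypothetical labels chosen for illustration.

```python
import random


def epoch_orders(n, num_epochs, scheme, seed=0):
    """Return the per-epoch visiting orders for a standard shuffling scheme."""
    rng = random.Random(seed)
    base = list(range(n))
    if scheme == "shuffle_once":
        rng.shuffle(base)                 # one permutation, reused every epoch
    orders = []
    for _ in range(num_epochs):
        if scheme == "random_reshuffling":
            perm = list(range(n))
            rng.shuffle(perm)             # fresh permutation each epoch
        elif scheme in ("incremental", "shuffle_once"):
            perm = list(base)             # same order each epoch
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        orders.append(perm)
    return orders
```

All three visit every sample exactly once per epoch, which is what makes them without-replacement schemes.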
