Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

Lam M. Nguyen; Dzung T. Phan; Jayant Kalagnanam

arXiv:2604.00260 · cs.LG · April 2, 2026

TL;DR

This paper introduces a data-shuffling scheme for without-replacement stochastic gradient descent (SGD) that combines block reshuffling with paired reversal. The scheme was discovered through large language model (LLM)-guided program evolution and is intended to improve convergence and stability.

Contribution

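The authors use an LLM-guided program evolution pipeline to discover a shuffling rule, abstract it into two structural components (block reshuffling and paired reversal), and analyze each component separately to establish provable improvements over existing schemes.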

Findings

01. Block reshuffling reduces prefix-gradient variance constants.

02. Paired reversal cancels leading second-order terms, reducing order sensitivity.

03. Numerical experiments show consistent gains over standard shuffling schemes.
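The two components named in the findings can be illustrated with a minimal sketch. This page does not give the paper's exact definitions, so the code below rests on assumptions: block reshuffling is taken to mean splitting the index range into contiguous blocks and shuffling both the block order and the contents of each block, and paired reversal is taken to mean that every second epoch replays the previous epoch's order backwards. The names `block_reshuffle`, `paired_reversal_orders`, and `block_size` are hypothetical and are not taken from the paper.

```python
import random


def block_reshuffle(n, block_size, rng):
    """Return a permutation of 0..n-1 built block by block.

    Assumed form (not confirmed by the source): contiguous blocks,
    shuffled block order, shuffled contents inside each block.
    """
    blocks = [list(range(start, min(start + block_size, n)))
              for start in range(0, n, block_size)]
    rng.shuffle(blocks)          # shuffle the order of the blocks
    order = []
    for block in blocks:
        rng.shuffle(block)       # shuffle the indices inside one block
        order.extend(block)
    return order


def paired_reversal_orders(n, block_size, num_epochs, seed=0):
    """Return one index order per epoch.

    Assumed pairing (not confirmed by the source): even epochs draw a
    fresh block-reshuffled order, odd epochs reverse the previous one.
    """
    rng = random.Random(seed)
    orders = []
    for epoch in range(num_epochs):
        if epoch % 2 == 0:
            orders.append(block_reshuffle(n, block_size, rng))
        else:
            orders.append(orders[-1][::-1])
    return orders


if __name__ == "__main__":
    for epoch, order in enumerate(paired_reversal_orders(10, 3, 4)):
        print(epoch, order)
```

Every epoch still visits each sample exactly once, so the sketch stays within the without-replacement setting; only the structure of the permutation changes.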

Abstract

Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling is known to improve optimization constants relative to cyclic and shuffle-once schemes. However, existing theory offers limited guidance on how to design new data-ordering schemes that further improve optimization constants or stability beyond random reshuffling. In this paper, we design a pipeline using a large language model (LLM)-guided program evolution framework to discover an effective shuffling rule for without-replacement SGD. Abstracting from this instance, we identify two fundamental structural components: block reshuffling and paired reversal. We analyze these components separately and show that block reshuffling strictly…
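For reference, the three standard schemes named in the abstract differ only in how the per-epoch sample order is produced. The sketch below uses their usual textbook definitions; the function names, the `scheme` strings, and the toy objective f_i(x) = 0.5 (x - a_i)^2 are illustrative assumptions, not material from the paper.

```python
import random


def epoch_orders(n, num_epochs, scheme, seed=0):
    """Return one index order per epoch for a standard shuffling scheme.

    incremental        : fixed order 0, 1, ..., n-1 in every epoch
    shuffle_once       : one random permutation, reused in every epoch
    random_reshuffling : a fresh random permutation in every epoch
    """
    if scheme not in ("incremental", "shuffle_once", "random_reshuffling"):
        raise ValueError(f"unknown scheme: {scheme}")
    rng = random.Random(seed)
    base = list(range(n))
    if scheme == "shuffle_once":
        rng.shuffle(base)
    orders = []
    for _ in range(num_epochs):
        order = list(base)
        if scheme == "random_reshuffling":
            rng.shuffle(order)
        orders.append(order)
    return orders


def sgd_without_replacement(orders, data, lr, x0=0.0):
    """Run SGD on the toy objective f_i(x) = 0.5 * (x - data[i])**2."""
    x = x0
    for order in orders:
        for i in order:
            grad = x - data[i]   # gradient of the i-th component
            x -= lr * grad
    return x


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    for scheme in ("incremental", "shuffle_once", "random_reshuffling"):
        orders = epoch_orders(len(data), 50, scheme)
        print(scheme, sgd_without_replacement(orders, data, lr=0.1))
```

With a constant step size, the final iterate depends on the order in which samples were visited, which is the kind of order sensitivity a data-ordering scheme aims to control.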
