Discovering Learning-Friendly Generation Orders for Sequential Computation

Yuta Sato; Kazuhiko Kawamoto; Hiroshi Kera

arXiv:2506.23875·cs.LG·May 11, 2026

TL;DR

This paper introduces loss profiling, a method that automatically discovers effective generation orders for sequential computation tasks by ranking candidate orders on their early-stage training loss. It reports markedly higher training success rates on six order-sensitive tasks.

Contribution

It proposes a hierarchical global-local search combined with loss profiling to find learning-friendly orders without task-specific design, handling large candidate spaces efficiently.

Findings

01

Discovered effective orders for tasks with up to 13 elements from random initialization and up to 40 elements from structured initialization.

02

Raised success rates from about 10% to near 100%.

03

Rediscovered known efficient orders, such as reverse-digit order in integer multiplication.
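
To illustrate why reverse-digit order is a natural target for integer multiplication, the following minimal Python sketch emits the product's digits least-significant first using column sums and a running carry. The function name and the zero-padded output length are assumptions made for this illustration; the excerpt does not show the paper's exact target format.

```python
from typing import List


def product_digits_lsd_first(a: int, b: int) -> List[int]:
    """Digits of a * b, least-significant first (hypothetical helper).

    Each output digit depends only on the current column sum and the carry
    from columns already emitted, which is why this order suits
    left-to-right autoregressive generation.
    """
    xs = [int(c) for c in reversed(str(a))]  # digits of a, units first
    ys = [int(c) for c in reversed(str(b))]  # digits of b, units first
    out, carry = [], 0
    # The product has at most len(xs) + len(ys) digits; pad with zeros.
    for k in range(len(xs) + len(ys)):
        col = carry + sum(xs[i] * ys[k - i]
                          for i in range(len(xs))
                          if 0 <= k - i < len(ys))
        out.append(col % 10)
        carry = col // 10
    return out


print(product_digits_lsd_first(12, 34))  # [8, 0, 4, 0], i.e. 0408 reversed
```

In the usual most-significant-first order, the first digit cannot be fixed until every carry has been resolved, so the model would have to compute the whole product before emitting anything.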

Abstract

Sequential computation via autoregressive generation can make difficult tasks learnable, but the generation order of intermediate states strongly affects whether training succeeds. We address the problem of discovering a learning-friendly target order automatically, rather than relying on task-specific design. Our key observation is that learning-friendly orders cause a faster loss drop in the early stage of training. We exploit this by loss profiling, which ranks candidate orders by the early-stage loss of a single short run. To handle the factorial candidate space, we wrap loss profiling in a hierarchical global-local search over block- and within-block-level orderings. On six order-sensitive tasks, the method discovers effective orders up to L = 13 from random initialization and up to L = 40 from structured initialization, lifting success rates from about 10% to near…
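
The two ideas in the abstract, ranking candidates by early-stage loss and searching first over blocks and then within blocks, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `toy_early_loss` is a stand-in that counts inversions instead of running a short training job, and `profile_orders`, `global_local_search`, and `block_size` are hypothetical names chosen for this example.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Order = Tuple[int, ...]


def profile_orders(candidates: Sequence[Order],
                   early_loss: Callable[[Order], float]) -> List[Order]:
    """Rank candidate orders by early-stage loss, lowest first."""
    return sorted(candidates, key=early_loss)


def toy_early_loss(order: Order) -> float:
    """Stand-in for one short training run (NOT a real model).

    Assumption: the ascending order is the learning-friendly one, so the
    'loss' is simply the number of out-of-order pairs.
    """
    pairs = itertools.combinations(range(len(order)), 2)
    return float(sum(1 for i, j in pairs if order[i] > order[j]))


def global_local_search(start: Order, block_size: int,
                        early_loss: Callable[[Order], float]) -> Order:
    """Two-level search: reorder whole blocks, then refine inside each block."""
    blocks = [start[i:i + block_size]
              for i in range(0, len(start), block_size)]

    # Global step: profile every ordering of the blocks.
    flat = [tuple(x for blk in perm for x in blk)
            for perm in itertools.permutations(blocks)]
    best = profile_orders(flat, early_loss)[0]

    # Local step: profile the internal orderings of one block at a time,
    # keeping the rest of the sequence fixed.
    for lo in range(0, len(best), block_size):
        hi = min(lo + block_size, len(best))
        variants = [best[:lo] + p + best[hi:]
                    for p in itertools.permutations(best[lo:hi])]
        best = profile_orders(variants, early_loss)[0]
    return best


print(global_local_search((5, 4, 1, 0, 3, 2), 2, toy_early_loss))
# (0, 1, 2, 3, 4, 5)
```

With six elements in blocks of two, this sketch profiles 3! = 6 block orderings plus 3 × 2! = 6 within-block orderings, 12 candidates in total, instead of all 6! = 720 permutations. In a real setting, `early_loss` would train a model briefly on targets written in the given order and return its loss after a small number of steps.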
