MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
Chenxing Lin, Xinhui Gao, Haipeng Zhang, Xinran Li, Haitao Wang, Songzhu Mei, Chenglu Wen, Weiquan Liu, Siqi Shen, Cheng Wang

TL;DR
MAGE introduces a multi-scale autoregressive generative approach for offline reinforcement learning, effectively modeling long-horizon trajectories with sparse rewards by capturing multi-resolution temporal dependencies and enabling precise control.
Contribution
It presents a novel hierarchical trajectory modeling framework combining multi-scale autoencoders and transformers with conditional guidance for improved offline RL performance.
Findings
Outperforms 15 baselines on 5 offline RL benchmarks.
Effectively models long-horizon, sparse-reward trajectories.
Enables controllable and coherent trajectory generation.
Abstract
Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical generation methods have been developed to mitigate this issue by decomposing the original problem into shorter-horizon subproblems using one policy and generating detailed actions with another. While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations, along with a multi-scale transformer that…
Peer Reviews
Decision·ICLR 2026 Poster
- The multi-scale autoencoder captures hierarchical temporal dependencies by encoding trajectories into token maps from coarse to fine resolutions, enabling better handling of long-term structures, which enhances novelty by extending autoregressive models like VAR to temporal domains in RL.
- Figure 1 effectively contrasts MAGE's hierarchical generation with Decision Transformer's sequential and Decision Diffuser's all-at-once approaches, immediately conveying the core conceptual contribution.
- The paper credits VAR for visual autoregressive modeling and states in the appendix that MAGE is "implemented based on the source code of... VAR". The core idea of MAGE (multi-scale, coarse-to-fine autoregressive generation) appears to be a direct application of the VAR architecture to RL trajectories. The paper should more clearly delineate its own novel contributions from the base architecture it adapts.
- The final MAGE system is highly complex, involving a multi-scale VQ-VAE, a multi-scale
1. This paper proposes a coherent multi-scale discrete trajectory representation coupled with cross-scale AR generation (global dependencies at coarse scales; local refinements at fine scales).
2. This paper proposes a condition-guided decoder with explicit initial-condition loss to reduce quantization/AR mismatch.
3. This work presents strong empirical results on long-horizon, sparse-reward tasks, plus ablations on hierarchy depth and conditioning.
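The coarse-to-fine representation the reviews describe can be illustrated with a small residual decomposition of a trajectory. This is a minimal sketch in the spirit of VAR-style multi-scale residuals, not the authors' implementation: the function names, the average-pooling and linear-interpolation choices, and the scale lengths are all assumptions, and the codebook quantization step of a VQ-VAE is deliberately omitted.

```python
import numpy as np

def downsample(x: np.ndarray, length: int) -> np.ndarray:
    """Average-pool a (T, D) array down to (length, D) along time."""
    return np.stack([seg.mean(axis=0) for seg in np.array_split(x, length, axis=0)])

def upsample(x: np.ndarray, length: int) -> np.ndarray:
    """Linearly interpolate a (L, D) array up to (length, D) along time."""
    if x.shape[0] == length:
        return x.copy()
    if x.shape[0] == 1:
        return np.repeat(x, length, axis=0)
    src = np.linspace(0.0, 1.0, x.shape[0])
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, x[:, d]) for d in range(x.shape[1])], axis=1)

def multiscale_decompose(traj: np.ndarray, scale_lengths: list[int]) -> list[np.ndarray]:
    """Split a trajectory into residual maps, coarsest scale first.

    Each scale encodes what the coarser scales have not yet explained.
    A real system would quantize each map against a learned codebook here
    (assumption: omitted in this sketch).
    """
    residual = traj.astype(float).copy()
    maps = []
    for length in scale_lengths:
        coarse = downsample(residual, length)
        maps.append(coarse)
        residual = residual - upsample(coarse, traj.shape[0])
    return maps

def multiscale_reconstruct(maps: list[np.ndarray], horizon: int) -> np.ndarray:
    """Sum the upsampled maps to recover the full-resolution trajectory."""
    return sum(upsample(m, horizon) for m in maps)

# Hypothetical example: horizon 8, state dimension 3, four temporal scales.
traj = np.random.default_rng(0).normal(size=(8, 3))
maps = multiscale_decompose(traj, [1, 2, 4, 8])
recon = multiscale_reconstruct(maps, horizon=8)
```

Because the finest scale here has the same length as the trajectory and nothing is quantized, the reconstruction is exact; with quantization, each scale would leave a residual for finer scales to refine. An autoregressive model over such maps would predict scale k from scales 1 to k-1, so global structure is fixed before local detail.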
1. No RTG-value generalization study. The method conditions on RTG but does not report performance vs. different target RTGs (e.g., low/medium/high percentiles or out-of-distribution RTGs).
2. The same framework requires different codebook sizes, hierarchy depth K, and network depth across suites, indicating high sensitivity to task distribution and substantial tuning burden.
3. Using inverse dynamics cloning plus conditional reconstruction encourages alignment with the behaviour distribution, w
- Hierarchical generation with autoregressive models is a relevant topic in offline RL.
- Strong empirical results on several offline RL benchmarks.
- The idea is clear, and the method has a short inference time; inference time is a main bottleneck of diffusion-based methods for offline RL.
- While the authors list hyperparameters for each task in the appendix, several crucial ones are missing: the RTG conditioning value, $\lambda_{\text{cond}}$, and the number of temporal scales (K).
- It would be better to evaluate the method on much larger mazes, such as pointmaze-giant and antmaze-giant from OGBench, to clearly verify the effectiveness of the proposed method in long-horizon settings.
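Several reviews refer to return-to-go (RTG) conditioning and to testing different target RTGs. For readers unfamiliar with the term, the sketch below computes RTG as the (optionally discounted) sum of future rewards and picks candidate target values from dataset percentiles. The helper names, the discount default, and the percentile choices are illustrative assumptions; the RTG normalization and target values MAGE actually uses are not given in this summary.

```python
import numpy as np

def returns_to_go(rewards, discount: float = 1.0) -> np.ndarray:
    """RTG at step t is r_t + discount * r_{t+1} + discount^2 * r_{t+2} + ..."""
    rewards = np.asarray(rewards, dtype=float)
    rtg = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        rtg[t] = running
    return rtg

def candidate_targets(episodes, percentiles=(25, 50, 90)) -> np.ndarray:
    """Pick target RTGs from percentiles of the dataset's initial returns.

    Hypothetical helper for the sweep a reviewer suggests; not from the paper.
    """
    initial = [returns_to_go(ep)[0] for ep in episodes]
    return np.percentile(initial, percentiles)

# Sparse-reward toy episodes: reward arrives only near the end, if at all.
episodes = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
targets = candidate_targets(episodes, percentiles=(0, 50, 100))
```

In a sparse-reward episode the undiscounted RTG stays constant until the reward is collected, which is one reason the choice of conditioning value matters for long-horizon tasks.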
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Advanced Multi-Objective Optimization Algorithms
