Scaling Offline RL via Efficient and Expressive Shortcut Models
Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen, Bradley Guo, Owen Oertell, Gokul Swamy, Kiante Brantley, Wen Sun

TL;DR
This paper introduces SORL, a scalable offline reinforcement learning algorithm that uses novel shortcut models to efficiently train and infer policies capable of modeling complex behaviors, demonstrating strong performance and positive scaling.
Contribution
The paper proposes a new class of generative models called shortcut models for offline RL, enabling simple, efficient training and scalable inference with improved performance.
Findings
SORL achieves strong results across various offline RL tasks.
SORL exhibits positive scaling with increased test-time compute.
The method simplifies training to a one-stage process.
Abstract
Diffusion and flow models have emerged as powerful generative approaches capable of modeling diverse and multimodal behavior. However, applying these models to offline reinforcement learning (RL) remains challenging due to the iterative nature of their noise sampling processes, making policy optimization difficult. In this paper, we introduce Scalable Offline Reinforcement Learning (SORL), a new offline RL algorithm that leverages shortcut models - a novel class of generative models - to scale both training and inference. SORL's policy can capture complex data distributions and can be trained simply and efficiently in a one-stage training procedure. At test time, SORL introduces both sequential and parallel inference scaling by using the learned Q-function as a verifier. We demonstrate that SORL achieves strong performance across a range of offline RL tasks and exhibits positive scaling…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsReinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
