Are Expressive Models Truly Necessary for Offline RL?
Guan Wang, Haoyi Niu, Jianxiong Li, Li Jiang, Jianming Hu, Xianyuan Zhan

TL;DR
This paper demonstrates that lightweight, simple models combined with recursive planning can outperform complex models in offline RL, achieving state-of-the-art results efficiently on long-horizon tasks.
Contribution
The authors introduce Recursive Skip-Step Planning (RSP), a method that uses simple models with recursive planning to match or surpass the performance of larger models in offline RL.
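The idea of pairing small models with recursive planning can be illustrated with a toy sketch. This is not the authors' implementation: the summary gives no architectural details, so the sketch assumes that "recursive planning" means a chain of small planners, each refining the previous target into a nearer sub-goal, followed by a goal-conditioned policy. All names (`make_mlp`, `recursive_skip_step_plan`, `act`), the dimensions, and the untrained random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Random weights for a 2-layer MLP (one hidden layer; biases omitted for brevity)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2


def mlp_forward(params, x):
    """Forward pass: linear layer, tanh, linear layer."""
    w1, w2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def recursive_skip_step_plan(state, goal, planners):
    """Assumed scheme: planners run coarse to fine. Each one takes the current
    state and the previous target, and predicts a nearer sub-goal."""
    target = goal
    subgoals = []
    for params in planners:
        target = mlp_forward(params, state + target)  # list concatenation
        subgoals.append(target)
    return subgoals


def act(state, subgoal, policy_params):
    """Goal-conditioned policy: maps (state, nearest sub-goal) to an action."""
    return mlp_forward(policy_params, state + subgoal)


# Hypothetical sizes; untrained weights, so outputs carry no meaning.
rng = random.Random(0)
STATE_DIM, ACTION_DIM, HIDDEN_DIM = 4, 2, 16
planners = [make_mlp(2 * STATE_DIM, HIDDEN_DIM, STATE_DIM, rng) for _ in range(3)]
policy = make_mlp(2 * STATE_DIM, HIDDEN_DIM, ACTION_DIM, rng)

state = [0.0, 0.1, -0.2, 0.3]
goal = [1.0, 1.0, 0.0, 0.0]
subgoals = recursive_skip_step_plan(state, goal, planners)
action = act(state, subgoals[-1], policy)
```

In a trained system, each planner and the policy would presumably be fit by supervised learning on offline trajectories; the sketch only shows how cheap the inference path is when every component is a shallow MLP.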
Findings
Lightweight models can achieve accurate dynamics with recursive planning.
RSP outperforms existing methods on D4RL benchmarks.
RSP delivers significant efficiency gains with minimal model complexity.
Abstract
Among various branches of offline reinforcement learning (RL) methods, goal-conditioned supervised learning (GCSL) has gained increasing popularity as it formulates the offline RL problem as a sequential modeling task, thereby bypassing the notoriously difficult credit assignment challenge of value learning in the conventional RL paradigm. Sequential modeling, however, requires capturing accurate dynamics across long horizons in trajectory data to ensure reasonable policy performance. To meet this requirement, leveraging large, expressive models has become a popular choice in recent literature, which, however, comes at the cost of significantly increased computation and inference latency. Contradictory yet promising, we reveal that lightweight models as simple as shallow 2-layer MLPs can also enjoy accurate dynamics consistency and significantly reduced sequential modeling errors against…
Taxonomy
Topics: Multi-Agent Systems and Negotiation
