Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu

TL;DR
This paper introduces the Any-step Dynamics Model (ADM), which reduces compounding prediction errors in model-based reinforcement learning by predicting future states directly over multiple steps, improving data efficiency in both online and offline settings.
Contribution
The paper proposes ADM, a dynamics model that mitigates error accumulation by taking variable-length plans as input and predicting future states directly. It is applicable to both online and offline RL.
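To make the idea of a variable-length plan input concrete, here is a minimal Python sketch of how one training pair for an any-step model could be assembled from a logged trajectory. It assumes, rather than reproduces from the paper, that a plan consists of the state k-1 steps before time t plus the k actions taken from there up to t, with s_{t+1} as the target, and that the backtracking length k is drawn uniformly from 1 to a cap `max_k`. The function name and tuple layout are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def make_any_step_sample(
    states: Sequence[State],
    actions: Sequence[Action],
    t: int,
    max_k: int,
    rng: random.Random,
) -> Tuple[State, List[Action], State, int]:
    """Build one hypothetical (plan, target) pair whose target is s_{t+1}.

    `states` holds s_0..s_T and `actions` holds a_0..a_{T-1}. The plan is the
    state at index t-k+1 plus the k actions a_{t-k+1}, ..., a_t.
    """
    # The backtracking length cannot reach before the start of the trajectory.
    k = rng.randint(1, min(max_k, t + 1))
    start_state = states[t - k + 1]
    plan_actions = list(actions[t - k + 1 : t + 1])  # exactly k actions
    target = states[t + 1]
    return start_state, plan_actions, target, k
```

With k = 1 the pair reduces to an ordinary one-step transition (s_t, a_t, s_{t+1}). Under this assumption, resampling k for every pair exposes the model to every backtracking length up to `max_k`, so it can later be queried with whatever history length is available.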
Findings
ADM improves sample efficiency in online RL.
ADM outperforms recent offline RL methods.
ADM provides better uncertainty quantification.
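The summary does not say how ADM quantifies uncertainty. One plausible reading, sketched here under that assumption, is that the spread among predictions of the same next state made with different backtracking lengths serves as the uncertainty signal, which an offline method could then subtract from the reward in the style of MOPO's pessimistic penalty. The aggregation (maximum per-dimension standard deviation), the coefficient `lam` and both function names are illustrative choices, not details taken from the paper.

```python
from statistics import pstdev
from typing import Sequence


def any_step_uncertainty(predictions: Sequence[Sequence[float]]) -> float:
    """Disagreement among predictions of the same next state (hypothetical).

    Each entry of `predictions` is an estimate of s_{t+1} obtained with a
    different backtracking length k. Returns the largest per-dimension
    population standard deviation across those estimates.
    """
    if len(predictions) < 2:
        raise ValueError("need predictions from at least two backtracking lengths")
    dims = len(predictions[0])
    per_dim = [pstdev([p[d] for p in predictions]) for d in range(dims)]
    return max(per_dim)


def penalized_reward(reward: float, uncertainty: float, lam: float = 1.0) -> float:
    """MOPO-style pessimistic reward r - lam * u; lam is a hypothetical coefficient."""
    return reward - lam * uncertainty
```

When all backtracking lengths agree, the uncertainty is zero and the reward is left unchanged; larger disagreement lowers the reward an offline learner sees for that transition.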
Abstract
Model-based methods in reinforcement learning offer a promising way to improve data efficiency by letting the policy explore within a learned dynamics model. However, accurately predicting multiple sequential steps with the dynamics model remains challenging because of bootstrapping prediction, in which the next state is predicted from the model's own prediction of the current state. This causes errors to accumulate during model roll-out. In this paper, we propose the Any-step Dynamics Model (ADM) to mitigate this compounding error by replacing bootstrapping prediction with direct prediction. ADM accepts variable-length plans as inputs for predicting future states without frequent bootstrapping. We design two algorithms, ADMPO-ON and ADMPO-OFF, which apply ADM in online and offline model-based frameworks, respectively. In the online setting, ADMPO-ON demonstrates improved sample efficiency compared to…
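The compounding error described in the abstract can be seen in a one-dimensional toy system s_{t+1} = a·s_t. This sketch compares a one-step model whose coefficient is off by a relative error ε, rolled out by feeding each prediction back in, against a hypothetical direct k-step predictor. Purely for illustration, it assumes the direct predictor carries the same relative error ε regardless of k; real multi-step models also degrade with horizon, but not by recycling their own outputs.

```python
from typing import List, Sequence, Tuple


def bootstrapped_rollout(s0: float, a_hat: float, k: int) -> float:
    """Apply the one-step model s' = a_hat * s k times, feeding each output back in."""
    s = s0
    for _ in range(k):
        s = a_hat * s
    return s


def direct_prediction(s0: float, a_true: float, k: int, rel_err: float) -> float:
    """Hypothetical direct k-step model: the true k-step state scaled by (1 + rel_err)."""
    return (a_true ** k) * s0 * (1.0 + rel_err)


def relative_error(pred: float, truth: float) -> float:
    return abs(pred - truth) / abs(truth)


def compare(
    s0: float = 1.0,
    a_true: float = 0.9,
    eps: float = 0.02,
    horizons: Sequence[int] = (1, 5, 10, 20),
) -> List[Tuple[int, float, float]]:
    """Return (k, bootstrapped error, direct error) for each horizon k."""
    a_hat = a_true * (1.0 + eps)  # one-step model with relative coefficient error eps
    rows = []
    for k in horizons:
        truth = (a_true ** k) * s0
        boot = bootstrapped_rollout(s0, a_hat, k)
        direct = direct_prediction(s0, a_true, k, eps)
        rows.append((k, relative_error(boot, truth), relative_error(direct, truth)))
    return rows


if __name__ == "__main__":
    for k, e_boot, e_direct in compare():
        print(f"k={k:2d}  bootstrapped={e_boot:.3f}  direct={e_direct:.3f}")
```

Because each bootstrapped step multiplies the previous error, the roll-out's relative error is (1 + ε)^k - 1, which grows geometrically with k, while the direct prediction's error stays at ε under the stated assumption.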
Taxonomy
Topics: Reinforcement Learning in Robotics
