Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu

arXiv:2405.17031 · cs.LG · May 28, 2024 · 1 citation

TL;DR

This paper introduces the Any-step Dynamics Model (ADM), which reduces compounding prediction error in model-based reinforcement learning by predicting future states directly from variable-length plans, and applies it in both online and offline settings.

Contribution

The paper proposes ADM, a novel dynamics model that mitigates error accumulation by allowing variable-length plan inputs for direct future state prediction, applicable in online and offline RL.

Findings

1. ADM improves sample efficiency in online RL.
2. ADM outperforms recent offline RL methods.
3. ADM provides better uncertainty quantification.

Abstract

Model-based methods in reinforcement learning offer a promising approach to enhance data efficiency by facilitating policy exploration within a dynamics model. However, accurately predicting sequential steps in the dynamics model remains a challenge due to the bootstrapping prediction, which attributes the next state to the prediction of the current state. This leads to accumulated errors during model roll-out. In this paper, we propose the Any-step Dynamics Model (ADM) to mitigate the compounding error by reducing bootstrapping prediction to direct prediction. ADM allows for the use of variable-length plans as inputs for predicting future states without frequent bootstrapping. We design two algorithms, ADMPO-ON and ADMPO-OFF, which apply ADM in online and offline model-based frameworks, respectively. In the online setting, ADMPO-ON demonstrates improved sample efficiency compared to…
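
The contrast between bootstrapping prediction and direct prediction described in the abstract can be illustrated with a toy sketch. This is not the paper's model or the ADMPO code: the 1-D additive dynamics, the constant `BIAS` charged to every model call, and all function names are assumptions made up for illustration. A real learned model has state-dependent errors, and a direct multi-step model is not guaranteed to keep the same per-call error as a one-step model.

```python
from typing import Sequence

# Assumption: every call to a learned model adds this fixed error.
# Real model errors are neither constant nor independent of the input.
BIAS = 0.1


def true_rollout(state: float, plan: Sequence[float]) -> float:
    """Ground-truth toy dynamics: s_{t+1} = s_t + a_t, applied along the plan."""
    for action in plan:
        state = state + action
    return state


def one_step_model(state: float, action: float) -> float:
    """Hypothetical learned one-step model: the true step plus a fixed error."""
    return state + action + BIAS


def bootstrapped_rollout(state: float, plan: Sequence[float]) -> float:
    """Feed each prediction back in as the next input (bootstrapping)."""
    for action in plan:
        state = one_step_model(state, action)
    return state


def any_step_model(state: float, plan: Sequence[float]) -> float:
    """Hypothetical direct model: one call maps (state, plan of any length)
    to the state reached after the whole plan, so it pays the error once."""
    return state + sum(plan) + BIAS


start = 0.0
plan = [1.0, -0.5, 2.0, 0.25, 1.0]  # a 5-step plan of scalar actions

target = true_rollout(start, plan)
bootstrapped_error = abs(bootstrapped_rollout(start, plan) - target)
direct_error = abs(any_step_model(start, plan) - target)

print(f"bootstrapped error: {bootstrapped_error:.3f}")
print(f"direct error:       {direct_error:.3f}")
```

Under these assumptions the bootstrapped error grows linearly with the plan length (about 0.5 for the 5-step plan), while the direct prediction's error stays at about 0.1 because the model is called only once.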

Code & Models

Repositories

HxLyn3/ADMPO (PyTorch, official)

Videos

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics