Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling
Shenghong He

TL;DR
This paper introduces RT, a model-based offline reinforcement learning algorithm that guarantees trajectory reliability by considering historical data, leading to improved policy learning and performance on benchmark tasks.
Contribution
The paper proposes a novel Reliability-guaranteed Transformer (RT) for model-based offline reinforcement learning (MORL) that accounts for historical information, providing theoretical guarantees and empirical improvements over existing methods.
Findings
RT effectively filters unreliable trajectories based on cumulative reliability.
RT achieves higher returns than state-of-the-art MORL methods.
Theoretical performance guarantees support RT's reliability in policy learning.
Abstract
Model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset. Applying conservative quantification to the dynamics model, most existing works on MORL generate trajectories that approximate the real data distribution to facilitate policy learning by using current information (e.g., the state and action at time step t). However, these works neglect the impact of historical information on environmental dynamics, leading to the generation of unreliable trajectories that may not align with the real data distribution. In this paper, we propose a new MORL algorithm, Reliability-guaranteed Transformer (RT), which can eliminate unreliable trajectories by calculating the cumulative reliability of the generated trajectory (i.e., using a weighted variational distance from the real data). Moreover, by…
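The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact formula, so the code assumes discrete per-step distributions, uses total variation distance as the "variational distance", and uses a geometric weight `weight ** t`. The names `cumulative_unreliability`, `filter_trajectories`, `weight`, and `threshold` are all hypothetical.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def cumulative_unreliability(model_dists, reference_dists, weight=0.9):
    """Weighted sum over time steps of the distance between the model's
    predicted distribution and a reference (data) distribution.
    The geometric weighting is an assumption made for illustration."""
    return sum(
        (weight ** t) * total_variation(p, q)
        for t, (p, q) in enumerate(zip(model_dists, reference_dists))
    )

def filter_trajectories(trajectories, threshold, weight=0.9):
    """Keep only trajectories whose cumulative score is at most `threshold`."""
    return [
        traj for traj in trajectories
        if cumulative_unreliability(traj["model"], traj["reference"], weight) <= threshold
    ]

# Toy example: two generated trajectories of length 2 over a 2-outcome space.
reliable = {
    "model": [[0.5, 0.5], [0.6, 0.4]],
    "reference": [[0.5, 0.5], [0.5, 0.5]],
}
unreliable = {
    "model": [[1.0, 0.0], [0.0, 1.0]],
    "reference": [[0.5, 0.5], [0.5, 0.5]],
}
kept = filter_trajectories([reliable, unreliable], threshold=0.2)
```

In this toy setup the first trajectory stays close to the reference at every step and is kept, while the second drifts far from it and is discarded. How RT actually weights steps and chooses the cutoff is specified in the paper, not here.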
Taxonomy
Topics: Software Reliability and Analysis Research · Elevator Systems and Control
Methods: ALIGN
