Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling

Shenghong He

arXiv:2502.06491·cs.LG·May 6, 2025

Open Access

TL;DR

This paper introduces RT, a model-based offline reinforcement learning algorithm that uses historical information to compute the cumulative reliability of generated trajectories and discards the unreliable ones, leading to improved policy learning and performance on benchmark tasks.

Contribution

The paper proposes a reliability-guaranteed transformer (RT) for model-based offline reinforcement learning (MORL) that accounts for historical information, with theoretical guarantees and empirical improvements over existing methods.

Findings

01

RT effectively filters unreliable trajectories based on cumulative reliability.

02

RT achieves higher returns than state-of-the-art MORL methods.

03

Theoretical performance guarantees support RT's reliability in policy learning.

Abstract

Model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset. Applying conservative quantification to the dynamics model, most existing works on MORL generate trajectories that approximate the real data distribution to facilitate policy learning by using current information (e.g., the state and action at time step $t$). However, these works neglect the impact of historical information on environmental dynamics, leading to the generation of unreliable trajectories that may not align with the real data distribution. In this paper, we propose a new MORL algorithm, the Reliability-guaranteed Transformer (RT), which can eliminate unreliable trajectories by calculating the cumulative reliability of the generated trajectory (i.e., using a weighted variational distance away from the real data). Moreover, by…
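The abstract's idea of scoring a generated trajectory by a weighted variational distance from the real data can be illustrated with a small sketch. This is not the paper's implementation: the excerpt does not give the exact definition, so the sketch assumes discrete per-step distributions, uses total variation distance as the variational distance, weights step $t$ by a hypothetical factor `gamma ** t`, and keeps a trajectory when its cumulative score stays under a hypothetical `threshold`. All function and parameter names are illustrative.

```python
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # a discrete probability distribution over one shared support


def variational_distance(p: Dist, q: Dist) -> float:
    """Total variation distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cumulative_distance(
    model_steps: Sequence[Dist], data_steps: Sequence[Dist], gamma: float = 0.9
) -> float:
    """Weighted sum of per-step distances along one trajectory.

    The gamma ** t weighting is an assumption made for this sketch.
    """
    if len(model_steps) != len(data_steps):
        raise ValueError("both sequences must cover the same time steps")
    return sum(
        (gamma ** t) * variational_distance(p, q)
        for t, (p, q) in enumerate(zip(model_steps, data_steps))
    )


def filter_trajectories(
    trajectories: Sequence[Tuple[Sequence[Dist], Sequence[Dist]]],
    threshold: float,
    gamma: float = 0.9,
) -> List[int]:
    """Return indices of trajectories whose cumulative distance is within threshold."""
    return [
        i
        for i, (model_steps, data_steps) in enumerate(trajectories)
        if cumulative_distance(model_steps, data_steps, gamma) <= threshold
    ]
```

In this sketch a lower cumulative distance means a more reliable trajectory, so filtering keeps only the rollouts whose predicted distributions stay close to the data at every step, with earlier steps counting more when `gamma` is below 1.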

Taxonomy

Topics: Software Reliability and Analysis Research · Elevator Systems and Control

Methods: ALIGN