Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning

Yue Pei; Hongming Zhang; Chao Gao; Martin Müller; Mengxiao Zhu; Hao Sheng; Ziliang Chen; Liang Lin; Haogang Zhu

arXiv:2508.16420 · cs.LG · September 30, 2025

TL;DR

This paper introduces Doctor, a transformer-based offline RL method that improves alignment between desired and achieved returns by combining supervised learning and value estimation with a double-check mechanism, enhancing control precision.

Contribution

Doctor is a novel offline RL approach that jointly optimizes action prediction and value estimation, with a double-check inference mechanism for better return alignment.
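
The page does not describe how the double check works internally. One plausible reading, offered here only as an illustration, is that several candidate actions are proposed and a learned value estimate then keeps the candidate whose predicted return lies closest to the desired return. The sketch below assumes exactly that. `double_check_select` and `value_fn` are hypothetical names, not taken from the paper's code.

```python
import numpy as np


def double_check_select(candidate_actions, value_fn, state, target_return):
    # Hypothetical: value_fn(state, action) stands in for a learned value head
    # that estimates the return obtained by taking `action` in `state`.
    values = np.array([value_fn(state, a) for a in candidate_actions], dtype=float)
    # Distance between each candidate's estimated return and the desired return.
    gaps = np.abs(values - target_return)
    best = int(np.argmin(gaps))  # np.argmin returns the first index on ties
    return candidate_actions[best], float(values[best])


# Toy usage: candidates would normally be sampled from the transformer's
# action distribution. Here the value estimate is simply 10 * action.
action, value = double_check_select([0.1, 0.5, 0.9], lambda s, a: 10.0 * a, None, 6.0)
print(action, value)  # 0.5 5.0
```

Selecting by closeness to the target, rather than by maximum value, matches the stated goal of aligning the achieved return with the desired one instead of maximizing it.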

Findings

01. Doctor achieves stronger performance on the D4RL benchmarks.

02. Doctor provides more accurate control, aligning achieved returns with target returns.

03. Doctor is effective across diverse tasks.

Abstract

Offline reinforcement learning (RL) has achieved significant advances in domains such as robotic control, autonomous driving, and medical decision-making. Most existing methods primarily focus on training policies that maximize cumulative returns from a given dataset. However, many real-world applications require precise control over policy performance levels, rather than simply pursuing the best possible return. Reinforcement learning via supervised learning (RvS) frames offline RL as a sequence modeling task, enabling the extraction of diverse policies by conditioning on different desired returns. Yet, existing RvS-based transformers, such as Decision Transformer (DT), struggle to reliably align the actual achieved returns with specified target returns, especially when interpolating within underrepresented returns or extrapolating beyond the dataset. To address this limitation, we…
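
For context on how RvS methods condition on a desired return, the sketch below shows the standard Decision Transformer inputs. It computes the return-to-go at each step and interleaves returns, states, and actions into one token sequence. At inference, the target return is decremented by each observed reward. This illustrates DT-style conditioning in general and assumes undiscounted returns (`gamma=1.0`), as in the original Decision Transformer. It is not Doctor's implementation, and the function names are illustrative.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """R_t = sum_{k >= t} gamma**(k - t) * r_k; gamma=1 gives DT's undiscounted setting."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave(rtg, states, actions):
    """Order tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...), the DT input layout."""
    tokens = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("R", float(r)), ("s", s), ("a", a)])
    return tokens


def next_target(target_return, reward):
    """At inference, the desired return is decremented by each observed reward."""
    return target_return - reward


rtg = returns_to_go([1.0, 2.0, 3.0])
print(rtg)  # [6. 5. 3.]
print(interleave(rtg, ["s1", "s2", "s3"], ["a1", "a2", "a3"])[:3])  # [('R', 6.0), ('s', 's1'), ('a', 'a1')]
print(next_target(6.0, 1.0))  # 5.0
```

Because the policy only sees the conditioning value R_t, a target far from the returns present in the data is exactly the interpolation or extrapolation case where, as the abstract notes, achieved and specified returns can diverge.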

Taxonomy

Topics: Smart Grid Energy Management · Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing