Closing the Gap between TD Learning and Supervised Learning with $Q$-Conditioned Maximization

Xing Lei; Zifeng Zhuang; Shentao Yang; Sheng Xu; Yunhao Luo; Fei Shen; Wenyan Yang; Xuetao Zhang; Donglin Wang

arXiv:2506.00795 · cs.LG · September 12, 2025

Open Access

TL;DR

This paper introduces a novel supervised learning method for offline goal-conditioned reinforcement learning that incorporates Q-conditioned maximization to enable trajectory stitching, bridging the performance gap with traditional TD-based methods.

Contribution

The paper proposes GCReinSL, an approach that combines Q-function estimation via normalizing flows with Q-maximization via expectile regression, giving supervised learning the ability to stitch trajectories.
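To make the expectile-regression idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the sample values, and the use of scalar gradient descent are all assumptions chosen for illustration. It only shows the general property that the asymmetric squared loss recovers the mean at `tau = 0.5` and approaches the largest sample as `tau` moves toward 1, which is why expectile regression can serve as a soft maximization over in-distribution values.

```python
def expectile_loss(pred, target, tau):
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2 with u = target - pred."""
    u = target - pred
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def fit_expectile(samples, tau, lr=0.05, steps=5000):
    """Fit one scalar m that minimizes the mean expectile loss (gradient descent)."""
    m = sum(samples) / len(samples)  # start at the mean, i.e. the tau = 0.5 expectile
    for _ in range(steps):
        grad = 0.0
        for x in samples:
            u = x - m
            weight = tau if u >= 0 else 1.0 - tau
            grad += -2.0 * weight * u  # derivative of weight * (x - m)^2 w.r.t. m
        m -= lr * grad / len(samples)
    return m


# Hypothetical Q-value samples, for illustration only.
q_samples = [0.1, 0.4, 0.5, 0.9]
mean_like = fit_expectile(q_samples, tau=0.5)   # stays at the sample mean
max_like = fit_expectile(q_samples, tau=0.99)   # pulled toward the largest sample
```

With `tau` close to 1, underestimates are penalized far more than overestimates, so the fitted value settles just below the maximum of the samples rather than at their average. Because it never exceeds the largest observed value, the estimate stays within the support of the data.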

Findings

01. Outperforms prior SL methods with stitching capabilities.

02. Effectively estimates Q-functions from offline datasets.

03. Achieves better goal data augmentation results.

Abstract

Recently, supervised learning (SL) methodology has emerged as an effective approach for offline reinforcement learning (RL) due to its simplicity, stability, and efficiency. However, recent studies show that SL methods lack the trajectory stitching capability typically associated with temporal difference (TD)-based approaches. A question naturally arises: how can we endow SL methods with stitching capability and close their performance gap with TD learning? To answer this question, we introduce $Q$-conditioned maximization supervised learning for offline goal-conditioned RL, which enhances SL with the stitching capability through a $Q$-conditioned policy and $Q$-conditioned maximization. Concretely, we propose Goal-Conditioned Reinforced Supervised Learning (GCReinSL), which consists of (1) estimating the…

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning