Closing the Gap between TD Learning and Supervised Learning with $Q$-Conditioned Maximization
Xing Lei, Zifeng Zhuang, Shentao Yang, Sheng Xu, Yunhao Luo, Fei Shen, Wenyan Yang, Xuetao Zhang, Donglin Wang

TL;DR
This paper introduces a novel supervised learning method for offline goal-conditioned reinforcement learning that incorporates Q-conditioned maximization to enable trajectory stitching, bridging the performance gap with traditional TD-based methods.
Contribution
The paper proposes GCReinSL, which gives supervised learning a stitching capability by combining Q-function estimation via normalizing flows with Q-maximization via expectile regression.
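The expectile-regression idea behind the Q-maximization step can be illustrated with a minimal NumPy sketch. This is a generic illustration on a fixed set of toy Q-values (made-up numbers, not data from the paper). It is not GCReinSL's implementation, which trains neural networks. The helper names `expectile_loss` and `fit_expectile` are hypothetical. The asymmetric squared loss |tau - 1(u < 0)| * u^2, with u = target - prediction, recovers the mean at tau = 0.5 and approaches the maximum as tau approaches 1. A high expectile therefore acts as a soft, in-sample maximum over Q-values.

```python
import numpy as np

def expectile_loss(diff, tau):
    # Asymmetric squared loss; diff = target - prediction.
    # Positive residuals (targets above the estimate) are weighted by tau,
    # negative residuals by 1 - tau.
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return weight * diff ** 2

def fit_expectile(samples, tau, iters=200):
    """Return the tau-expectile of `samples` via iteratively reweighted least squares."""
    samples = np.asarray(samples, dtype=float)
    m = float(np.mean(samples))
    for _ in range(iters):
        # Re-derive the asymmetric weights around the current estimate,
        # then solve the resulting weighted least-squares problem in closed form.
        w = np.where(samples > m, tau, 1.0 - tau)
        m = float(np.sum(w * samples) / np.sum(w))
    return m

# Toy Q-value estimates for one state-goal pair (illustrative numbers only).
q_values = np.array([1.0, 2.0, 3.0, 10.0])
print(fit_expectile(q_values, 0.5))   # tau = 0.5 recovers the mean: 4.0
print(fit_expectile(q_values, 0.99))  # about 9.76, close to the max of 10
```

Fitting the expectile in closed form per iteration keeps the example short. In a learning setting the same `expectile_loss` would instead be minimized by gradient descent on a parameterized model.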
Findings
Outperforms prior SL methods with stitching capabilities.
Effectively estimates Q-functions from offline datasets.
Achieves better goal data augmentation results.
Abstract
Recently, supervised learning (SL) has emerged as an effective approach for offline reinforcement learning (RL) due to its simplicity, stability, and efficiency. However, recent studies show that SL methods lack the trajectory stitching capability typically associated with temporal difference (TD)-based approaches. A question naturally arises: How can we endow SL methods with stitching capability and close their performance gap with TD learning? To answer this question, we introduce $Q$-conditioned maximization supervised learning for offline goal-conditioned RL, which enhances SL with stitching capability through a $Q$-conditioned policy and $Q$-conditioned maximization. Concretely, we propose Goal-Conditioned Reinforced Supervised Learning (GCReinSL), which consists of (1) estimating the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
