Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming
Xing Lei, Xuetao Zhang, Zifeng Zhuang, Donglin Wang

TL;DR
This paper introduces Q-WSL, a novel goal-conditioned reinforcement learning framework that combines dynamic programming with weighted supervised learning to improve performance, stability, and sample efficiency in sparse reward tasks.
Contribution
Q-WSL integrates dynamic programming into goal-conditioned supervised learning, enabling trajectory stitching and overcoming limitations of existing methods.
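To make "trajectory stitching" concrete, here is a minimal tabular sketch. It is not the authors' Q-WSL algorithm. The states, actions, and both trajectories are hypothetical toy data, and `hindsight_pairs` and `q_iteration` are illustrative helper names. It assumes a deterministic environment with a binary reward of 1 on reaching the goal. Hindsight-relabeled supervised learning only sees (state, goal) pairs that co-occur in one trajectory, while a dynamic-programming backup over individual transitions can connect A to G through the shared state B.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value for this toy example)

# Two offline trajectories as (state, action, next_state) transitions.
# Hypothetical data: no single trajectory connects A to G.
trajectories = [
    [("A", "to_B", "B"), ("B", "to_C", "C")],
    [("D", "to_B", "B"), ("B", "to_G", "G")],
]


def hindsight_pairs(trajs):
    """Collect the (state, goal) pairs that hindsight relabeling can supervise.

    A state is paired only with states reached later in the SAME trajectory.
    """
    pairs = set()
    for traj in trajs:
        for i, (state, _, _) in enumerate(traj):
            for _, _, future_state in traj[i:]:
                pairs.add((state, future_state))
    return pairs


def q_iteration(trajs, goal, sweeps=10):
    """Tabular Q-iteration over the dataset transitions for one fixed goal."""
    transitions = [t for traj in trajs for t in traj]
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, next_state in transitions:
            if next_state == goal:
                # Sparse binary reward; the episode ends at the goal.
                q[(state, action)] = 1.0
            else:
                # Bootstrap from the best action seen at next_state,
                # regardless of which trajectory it came from.
                next_values = [
                    q[(next_state, a2)]
                    for (s, a2, _) in transitions
                    if s == next_state
                ]
                q[(state, action)] = GAMMA * max(next_values, default=0.0)
    return q


pairs = hindsight_pairs(trajectories)
q = q_iteration(trajectories, goal="G")

print(("A", "G") in pairs)   # False: no supervised signal for reaching G from A
print(q[("A", "to_B")])      # 0.9: the backup stitches A -> B -> G
print(q[("B", "to_C")])      # 0.0: this branch never reaches G
```

In this toy setting the Q-values could in principle be used to weight or select the supervised targets at B, which is the kind of combination the contribution statement describes; the exact weighting scheme used by Q-WSL is not given in this summary.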
Findings
Q-WSL outperforms existing goal-conditioned methods in challenging tasks.
Q-WSL demonstrates improved sample efficiency and robustness.
Q-WSL effectively handles environments with stochasticity and binary rewards.
Abstract
A novel class of advanced algorithms, termed Goal-Conditioned Weighted Supervised Learning (GCWSL), has recently emerged to tackle the challenges posed by sparse rewards in goal-conditioned reinforcement learning (RL). GCWSL consistently delivers strong performance across a diverse set of goal-reaching tasks due to its simplicity, effectiveness, and stability. However, GCWSL methods lack a crucial capability known as trajectory stitching, which is essential for learning optimal policies when faced with unseen skills during testing. This limitation becomes particularly pronounced when the replay buffer is predominantly filled with sub-optimal trajectories. In contrast, traditional TD-based RL methods, such as Q-learning, which utilize Dynamic Programming, do not face this issue but often experience instability due to the inherent difficulties in value function approximation. In this…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training · Q-Learning
