Understanding Hindsight Goal Relabeling from a Divergence Minimization Perspective
Lunjun Zhang, Bradly C. Stadie

TL;DR
This paper reframes hindsight goal relabeling in multi-goal reinforcement learning as divergence minimization, an imitation learning view that explains existing methods and suggests ways to improve goal-reaching algorithms.
Contribution
The paper introduces a divergence minimization framework for understanding hindsight goal relabeling, connects it to imitation learning, and uses that connection to derive several existing goal-reaching methods from first principles.
Findings
Q-learning outperforms behavioral cloning under hindsight relabeling
Selective application of behavioral cloning improves performance
Reward design significantly impacts goal-reaching success
Abstract
Hindsight goal relabeling has become a foundational technique in multi-goal reinforcement learning (RL). The essential idea is that any trajectory can be seen as a sub-optimal demonstration for reaching its final state. Intuitively, learning from those arbitrary demonstrations can be seen as a form of imitation learning (IL). However, the connection between hindsight goal relabeling and imitation learning is not well understood. In this paper, we propose a novel framework to understand hindsight goal relabeling from a divergence minimization perspective. Recasting the goal reaching problem in the IL framework not only allows us to derive several existing methods from first principles, but also provides us with the tools from IL to improve goal reaching algorithms. Experimentally, we find that under hindsight relabeling, Q-learning outperforms behavioral cloning (BC). Yet, a vanilla…
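The relabeling idea described in the abstract (treating any trajectory as a demonstration for reaching its final state) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Transition` container, the `relabel_with_final_state` helper, the exact-match goal test, and the 0 / -1 sparse reward convention are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """Hypothetical container for one goal-conditioned step."""
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    goal: Tuple[float, ...]
    reward: float


def relabel_with_final_state(trajectory: List[Transition]) -> List[Transition]:
    """Replace each transition's goal with the state the trajectory ended in.

    Assumed reward convention: 0.0 when the step lands exactly on the
    hindsight goal, -1.0 otherwise.
    """
    if not trajectory:
        return []
    hindsight_goal = trajectory[-1].next_state
    relabeled = []
    for step in trajectory:
        reward = 0.0 if step.next_state == hindsight_goal else -1.0
        relabeled.append(step._replace(goal=hindsight_goal, reward=reward))
    return relabeled


# A 1-D rollout that aimed for position 5.0 but only reached 3.0.
original_goal = (5.0,)
rollout = [
    Transition((0.0,), 1, (1.0,), original_goal, -1.0),
    Transition((1.0,), 1, (2.0,), original_goal, -1.0),
    Transition((2.0,), 1, (3.0,), original_goal, -1.0),
]
relabeled_rollout = relabel_with_final_state(rollout)
```

After relabeling, the failed rollout becomes a successful (if sub-optimal) demonstration for the goal it actually reached, and the relabeled transitions can be fed to either a Q-learning or a behavioral cloning update.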
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Neural and Behavioral Psychology Studies
Methods: Q-Learning · Experience Replay
