Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Hongjoon Ahn, Heewoong Choi, Jisu Han, and Taesup Moon

TL;DR
This paper introduces OTA, a value learning method that incorporates temporal abstraction to improve high-level policy learning in offline goal-conditioned reinforcement learning, especially for long-horizon tasks.
Contribution
The paper proposes Option-aware Temporally Abstracted value learning (OTA), a novel approach that enhances value function estimates by integrating temporal abstraction, addressing long-horizon challenges in offline GCRL.
Findings
OTA improves high-level policy performance on complex, long-horizon tasks.
OTA contracts the effective horizon length, enabling better advantage estimates.
Experimental results show strong performance on OGBench tasks.
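The horizon-contraction finding can be illustrated with a toy calculation. This is a minimal sketch and not the paper's implementation: it assumes a reward of -1 per step until the goal is reached, a deterministic shortest path, and a hypothetical `option_len` parameter that counts steps in units of options rather than primitive actions. It compares the value gain from moving to a subgoal 25 steps closer to a goal that is 1000 steps away.

```python
import math


def goal_value(steps: int, gamma: float) -> float:
    """Discounted return when the reward is -1 per step until the goal (assumed reward)."""
    return -(1.0 - gamma ** steps) / (1.0 - gamma)


def subgoal_gap(distance: int, k: int, gamma: float, option_len: int = 1) -> float:
    """Value gain from moving k primitive steps closer to the goal.

    option_len is a hypothetical knob: with option_len > 1 the value is
    computed over option-level steps, which shortens the effective horizon.
    """
    before = goal_value(math.ceil(distance / option_len), gamma)
    after = goal_value(math.ceil((distance - k) / option_len), gamma)
    return after - before


# Same task, same subgoal, with and without temporal abstraction.
flat_gap = subgoal_gap(distance=1000, k=25, gamma=0.99)
abstract_gap = subgoal_gap(distance=1000, k=25, gamma=0.99, option_len=25)
print(f"flat gap: {flat_gap:.4g}, abstracted gap: {abstract_gap:.4g}")
```

Under these assumptions the flat value gap is heavily discounted because the goal is far away, while the option-level gap stays large, so the same amount of value error is less likely to swamp it.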
Abstract
Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm in which goal-reaching policies are trained from abundant state-action trajectory datasets without additional environment interaction. However, offline GCRL still struggles with long-horizon tasks, even with recent advances that employ hierarchical policy structures, such as HIQL. Investigating the root cause of this challenge, we make two observations. First, performance bottlenecks mainly stem from the high-level policy's inability to generate appropriate subgoals. Second, when learning the high-level policy in the long-horizon regime, the sign of the advantage estimate frequently becomes incorrect. Thus, we argue that improving the value function to produce a clear advantage estimate for learning the high-level policy is essential. In this paper, we propose a simple yet effective…
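The second observation, that the advantage sign is often wrong in the long-horizon regime, can be illustrated with a small simulation. This is a hedged sketch rather than the authors' analysis: it assumes each of the two value estimates carries independent Gaussian error with a hypothetical standard deviation `noise_std`, and it measures how often the estimated advantage of a genuinely better subgoal comes out non-positive.

```python
import random


def sign_error_rate(true_gap: float, noise_std: float,
                    trials: int = 20000, seed: int = 0) -> float:
    """Fraction of trials where a truly positive advantage is estimated as <= 0.

    Assumption: the two value estimates have independent Gaussian errors.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Estimated advantage = true gap + (error on V(subgoal)) - (error on V(state)).
        estimate = true_gap + rng.gauss(0.0, noise_std) - rng.gauss(0.0, noise_std)
        if estimate <= 0.0:
            errors += 1
    return errors / trials


# A tiny true gap (long horizon) versus a large one, under the same noise level.
small_gap_rate = sign_error_rate(true_gap=0.001, noise_std=0.05)
large_gap_rate = sign_error_rate(true_gap=0.5, noise_std=0.05)
print(f"small gap: {small_gap_rate:.3f}, large gap: {large_gap_rate:.3f}")
```

When the true gap is much smaller than the estimation noise, the sign is close to a coin flip. A larger gap under the same noise makes the sign reliable, which is why a clearer advantage estimate matters for learning the high-level policy.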
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
