Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

Hongjoon Ahn, Heewoong Choi, Jisu Han, and Taesup Moon

arXiv:2505.12737 · cs.LG · November 5, 2025

TL;DR

This paper introduces OTA, a value learning method that incorporates temporal abstraction to improve high-level policy learning in offline goal-conditioned reinforcement learning, especially for long-horizon tasks.

Contribution

The paper proposes Option-aware Temporally Abstracted value learning (OTA), a value learning method that integrates temporal abstraction into value estimation so that the value function yields clearer advantage estimates for the high-level policy, addressing the long-horizon challenges of offline GCRL.
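The effect of temporal abstraction on a value function can be illustrated on a toy problem. The sketch below is a hedged illustration under our own assumptions, not the paper's implementation: a deterministic 1-D chain of states 0..goal, a reward of -1 per step until the goal, and a hypothetical fixed option length `option_len`, under which one backup spans `option_len` primitive steps but receives a single reward and a single discount. `chain_values` and `option_len` are illustrative names, not from the paper.

```python
# Toy sketch (assumption, NOT the authors' code): tabular value iteration on a
# deterministic chain, comparing primitive-step backups (option_len=1) with
# option-level backups that jump `option_len` steps per update.

def chain_values(goal: int, gamma: float, option_len: int = 1) -> list[float]:
    """Return V(s, goal) for s = 0..goal on a chain that always moves toward the goal."""
    values = [0.0] * (goal + 1)
    # On a deterministic chain, goal + 1 sweeps are enough for values to converge exactly.
    for _ in range(goal + 1):
        new_values = values[:]
        for s in range(goal):
            nxt = min(s + option_len, goal)              # state reached after one (abstract) step
            new_values[s] = -1.0 + gamma * values[nxt]   # one reward, one discount per backup
        new_values[goal] = 0.0                           # goal is absorbing with value 0
        values = new_values
    return values


gamma = 0.99
flat = chain_values(goal=500, gamma=gamma)                    # primitive-step values
abstract = chain_values(goal=500, gamma=gamma, option_len=10)  # option-level values
# Value change from moving 10 steps closer to a goal that is 500 steps away:
print(flat[10] - flat[0], abstract[10] - abstract[0])
```

Far from the goal, the primitive-step value is close to its floor of -1/(1 - gamma), so moving 10 steps closer barely changes it; with option-level backups the same move still changes the value substantially. Under these toy assumptions, this is one way to read the claim that temporal abstraction contracts the effective horizon.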

Findings

1. OTA improves high-level policy performance on complex tasks.
2. OTA contracts the effective horizon length, enabling better advantage estimates.
3. Experimental results show strong performance on OGBench tasks.

Abstract

Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm in which goal-reaching policies are trained from abundant state-action trajectory datasets without additional environment interaction. However, offline GCRL still struggles with long-horizon tasks, even with recent advances that employ hierarchical policy structures, such as HIQL. To identify the root cause of this challenge, we make the following observations. First, performance bottlenecks mainly stem from the high-level policy's inability to generate appropriate subgoals. Second, when learning the high-level policy in the long-horizon regime, the sign of the advantage estimate frequently becomes incorrect. Thus, we argue that improving the value function to produce a clear advantage estimate for learning the high-level policy is essential. In this paper, we propose a simple yet effective…
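Why the advantage sign becomes unreliable over long horizons can be made concrete with a toy noise model. This is an assumption-laden sketch, not the paper's analysis: it takes the high-level advantage of a subgoal k steps closer as V(s_{t+k}, g) - V(s_t, g), uses closed-form values for a chain with a reward of -1 per (abstract) step, and assumes each value estimate carries independent Gaussian noise with standard deviation `sigma`. The names `value`, `sign_flip_prob`, and `option_len` are hypothetical.

```python
import math


def value(dist: int, gamma: float, option_len: int = 1) -> float:
    """Closed-form V(s, g) on a toy chain: -1 reward per abstract step of `option_len` primitive steps."""
    steps = math.ceil(dist / option_len)
    return -(1.0 - gamma ** steps) / (1.0 - gamma)


def sign_flip_prob(dist: int, k: int, gamma: float, sigma: float, option_len: int = 1) -> float:
    """Probability that the estimated advantage of a subgoal k steps closer is negative.

    Assumes each value estimate has i.i.d. N(0, sigma^2) noise, so the estimated
    advantage is N(gap, 2 * sigma^2) and P(< 0) = Phi(-gap / (sigma * sqrt(2))).
    """
    gap = value(dist - k, gamma, option_len) - value(dist, gamma, option_len)  # true advantage > 0
    return 0.5 * math.erfc(gap / (2.0 * sigma))


for dist in (50, 200, 500):
    primitive = sign_flip_prob(dist, k=10, gamma=0.99, sigma=0.5)
    abstracted = sign_flip_prob(dist, k=10, gamma=0.99, sigma=0.5, option_len=10)
    print(dist, primitive, abstracted)
```

In this toy model, the true advantage shrinks geometrically with distance to the goal, so a fixed amount of estimation noise flips its sign more often as the horizon grows; counting one option as one step keeps the true advantage larger and the flip probability lower. Real value errors are neither independent nor Gaussian, so this only illustrates the direction of the effect.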

Videos

Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning