Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning
Zhiqiang Dong, Teng Pang, Rongjian Xu, Guoqiang Wu

TL;DR
This paper introduces a hierarchical policy model that combines a goal-conditioned mean flow policy with a discriminative goal embedding loss to improve offline goal-conditioned reinforcement learning.
Contribution
It proposes a goal-conditioned mean flow policy and a LeJEPA loss for offline GCRL, targeting two limitations of prior hierarchical methods: the limited expressiveness of Gaussian policies and weak goal representations.
Findings
Achieves strong performance on OGBench tasks.
Effectively models complex target distributions with the mean flow policy.
Improves goal representation discrimination and generalization.
Abstract
Offline goal-conditioned reinforcement learning (GCRL) is a practical reinforcement learning paradigm that aims to learn goal-conditioned policies from reward-free offline data. Despite recent advances in hierarchical architectures such as HIQL, long-horizon control in offline GCRL remains challenging due to the limited expressiveness of Gaussian policies and the inability of high-level policies to generate effective subgoals. To address these limitations, we propose the goal-conditioned mean flow policy, which introduces an average velocity field into hierarchical policy modeling for offline GCRL. Specifically, the mean flow policy captures complex target distributions for both high-level and low-level policies through a learned average velocity field, enabling efficient action generation via one-step sampling. Furthermore, considering the insufficiency of goal representation, we…
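The one-step sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common mean flow convention in which t = 1 is noise and t = 0 is data, so that a sample is obtained as z_0 = z_1 - (t - r) * u(z_1, r, t) with r = 0 and t = 1; the paper may use a different convention. The function `toy_average_velocity` is a hypothetical closed-form stand-in for a learned network, and the names `state`, `goal`, and `sample_action_one_step` are illustrative only.

```python
import random


def toy_average_velocity(z, r, t, state, goal):
    """Hypothetical stand-in for a learned network u_theta(z, r, t | s, g).

    Assumes the target action for (state, goal) is a single point a, so the
    straight-line path z_t = (1 - t) * a + t * noise has constant velocity
    (z_t - a) / t, which is also its average over any interval [r, t].
    """
    target = [g - s for s, g in zip(state, goal)]  # toy "move toward goal" action
    return [(zi - ai) / t for zi, ai in zip(z, target)]


def sample_action_one_step(avg_velocity, state, goal, rng):
    """Draw noise at t = 1 and jump to t = 0 with one network evaluation."""
    noise = [rng.gauss(0.0, 1.0) for _ in state]
    u = avg_velocity(noise, 0.0, 1.0, state, goal)
    # z_0 = z_1 - (t - r) * u(z_1, r, t), with r = 0 and t = 1
    return [n - ui for n, ui in zip(noise, u)]


rng = random.Random(0)
action = sample_action_one_step(toy_average_velocity, [0.0, 1.0], [2.0, -1.0], rng)
```

In this toy setting the sampled action coincides with the point target regardless of the noise draw. A learned average velocity field would instead map different noise draws to different modes of the action or subgoal distribution, still with a single function evaluation per sample.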
