PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

TL;DR
PlanDQ introduces a hierarchical offline RL framework combining a diffusion-based high-level planner with a Q-learning low-level policy, achieving strong performance on diverse long-horizon tasks.
Contribution
It presents a novel hierarchical offline RL algorithm, PlanDQ, integrating D-Conductor and Q-Performer to improve long-horizon task performance.
Findings
Achieves superior or competitive results on D4RL benchmarks.
Performs well on long-horizon tasks like AntMaze, Kitchen, and Calvin.
Demonstrates effectiveness of hierarchical planning in offline RL.
Abstract
Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulate as the horizon of the task grows. On the other hand, models that can perform well in long-horizon tasks are designed specifically for goal-conditioned tasks, which commonly perform worse than value function learning methods on short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner designed for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we used a Q-learning based approach called the…
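The two-level structure described in the abstract can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D line environment, the function names (`plan_subgoals`, `q_value`, `low_level_action`, `rollout`), and the `stride` parameter are all hypothetical. The high-level planner is replaced by a deterministic stub rather than a diffusion model, and the Q-function is hand-written rather than learned from offline data.

```python
from typing import List

ACTIONS = (-1, 0, 1)  # move left, stay, move right on a 1-D line


def plan_subgoals(state: int, goal: int, stride: int) -> List[int]:
    """Hypothetical stand-in for the high-level planner.

    A diffusion-based planner would sample a sub-goal sequence; this stub
    simply places a sub-goal every `stride` cells on the way to the goal.
    """
    subgoals = []
    direction = 1 if goal >= state else -1
    pos = state
    while pos != goal:
        pos += direction * min(stride, abs(goal - pos))
        subgoals.append(pos)
    return subgoals


def q_value(state: int, subgoal: int, action: int) -> float:
    """Hand-written stand-in for a learned sub-goal-conditioned Q-function.

    Assumption: the value is higher the closer the next state is to the sub-goal.
    """
    return -abs((state + action) - subgoal)


def low_level_action(state: int, subgoal: int) -> int:
    """Act greedily with respect to the sub-goal-conditioned Q-function."""
    return max(ACTIONS, key=lambda a: q_value(state, subgoal, a))


def rollout(start: int, goal: int, stride: int = 3, max_steps: int = 100) -> List[int]:
    """Follow the planned sub-goals one at a time using the low-level policy."""
    state = start
    trajectory = [state]
    for subgoal in plan_subgoals(start, goal, stride):
        while state != subgoal and len(trajectory) <= max_steps:
            state += low_level_action(state, subgoal)
            trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(plan_subgoals(0, 7, 3))
    print(rollout(0, 7))
```

The point of the sketch is the division of labor: the low-level policy only ever reasons about a nearby sub-goal, so its value estimates cover a short horizon, while the long-horizon structure is left to the planner.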
Taxonomy
Topics: Constraint Satisfaction and Optimization · Model-Driven Software Engineering Techniques · AI-based Problem Solving and Planning
Methods: Q-Learning
