PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

Chang Chen; Junyeob Baek; Fei Deng; Kenji Kawaguchi; Caglar Gulcehre; Sungjin Ahn

arXiv:2406.06793·cs.LG·June 12, 2024

TL;DR

PlanDQ introduces a hierarchical offline RL framework combining a diffusion-based high-level planner with a Q-learning low-level policy, achieving strong performance on diverse long-horizon tasks.

Contribution

It presents a novel hierarchical offline RL algorithm, PlanDQ, integrating D-Conductor and Q-Performer to improve long-horizon task performance.

Findings

1. Achieves superior or competitive results on D4RL benchmarks.

2. Performs well on long-horizon tasks like AntMaze, Kitchen, and Calvin.

3. Demonstrates effectiveness of hierarchical planning in offline RL.

Abstract

Despite recent advances in offline RL, no unified algorithm has achieved superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks because credit assignment is difficult and extrapolation errors accumulate as the task horizon grows. On the other hand, models that perform well on long-horizon tasks are designed specifically for goal-conditioned tasks and commonly perform worse than value function learning methods in short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner designed for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we use a Q-learning based approach called the…
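The two-level structure described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch under loud assumptions, not the authors' implementation: `toy_high_level_planner` is a deterministic stand-in for the diffusion-based D-Conductor, `toy_q_value` is a hand-written stand-in for a learned Q-function, and the 1-D state space, the candidate action set, and the fixed `steps_per_subgoal` interval are all hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical discrete action set for a 1-D toy world (assumption).
CANDIDATE_ACTIONS = [-0.5, 0.0, 0.5]


def toy_high_level_planner(state: float, goal: float,
                           num_subgoals: int = 4, jump: float = 2.0) -> List[float]:
    """Stand-in for a high-level planner (NOT a diffusion model).

    Returns a short sequence of sub-goals spaced `jump` apart, clipped at the goal.
    """
    return [min(state + jump * (i + 1), goal) for i in range(num_subgoals)]


def toy_q_value(state: float, subgoal: float, action: float) -> float:
    """Stand-in for a learned sub-goal-conditioned Q-function.

    Scores an action higher when it moves the state closer to the sub-goal.
    """
    return -abs((state + action) - subgoal)


def low_level_act(state: float, subgoal: float, candidates: List[float]) -> float:
    """Greedy low-level policy: pick the candidate action with the highest Q-value."""
    return max(candidates, key=lambda a: toy_q_value(state, subgoal, a))


def run_episode(start: float, goal: float,
                steps_per_subgoal: int = 4, max_steps: int = 40) -> List[float]:
    """Alternate between high-level sub-goal selection and low-level control."""
    state = start
    trajectory = [state]
    plan: List[float] = []
    subgoal = start
    for t in range(max_steps):
        if state >= goal:
            break
        # Switch to the next sub-goal at a fixed interval; replan when the plan runs out.
        if t % steps_per_subgoal == 0:
            if not plan:
                plan = toy_high_level_planner(state, goal)
            subgoal = plan.pop(0)
        action = low_level_act(state, subgoal, CANDIDATE_ACTIONS)
        state = state + action  # trivial toy dynamics (assumption)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(run_episode(start=0.0, goal=5.0))
```

The point of the split is that the low-level policy only ever has to reason over a short horizon (reaching the next sub-goal), while the high-level planner handles the long-horizon structure.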

Code & Models

Repositories

changchencc/plandq (Official)

Taxonomy

Topics: Constraint Satisfaction and Optimization · Model-Driven Software Engineering Techniques · AI-based Problem Solving and Planning

Methods: Q-Learning