IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

TL;DR
This paper introduces IQL-TD-MPC, a hierarchical offline model-based RL method that improves long-horizon sparse-reward task performance by planning with implicit Q-learning and intent embeddings.
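To make "planning" concrete, here is a minimal sketch of the kind of value-bootstrapped planner that TD-MPC-style methods use. It samples action sequences, rolls them through a model, scores each by its short-horizon discounted return plus a discounted terminal value, and refits a Gaussian to the softmax-weighted elite sequences. Everything here is an illustrative assumption: the function name `plan_with_terminal_value`, its hyperparameters, the toy 1-D dynamics and reward, and the use of a state value function `value_fn` at the horizon (a value learned with IQL would be one natural candidate). It omits TD-MPC's latent encoder and policy-prior samples and is not the authors' implementation.

```python
import numpy as np


def plan_with_terminal_value(state, dynamics, reward_fn, value_fn,
                             horizon=5, num_samples=256, num_elites=32,
                             iterations=4, action_dim=1, gamma=0.99,
                             temperature=0.5, seed=0):
    """Simplified MPPI-style planner (hypothetical sketch, not TD-MPC itself).

    dynamics(s, a) returns the next state; reward_fn(s, a) and value_fn(s)
    must return scalars. In practice all three would be learned models.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences around the current Gaussian.
        noise = rng.standard_normal((num_samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        returns = np.zeros(num_samples)
        for i in range(num_samples):
            s, discount = state, 1.0
            for t in range(horizon):
                returns[i] += discount * reward_fn(s, actions[i, t])
                s = dynamics(s, actions[i, t])
                discount *= gamma
            # Bootstrap beyond the planning horizon with a value estimate.
            returns[i] += discount * value_fn(s)
        # Keep the best sequences and weight them by a softmax over returns.
        elite_idx = np.argsort(returns)[-num_elites:]
        elites, elite_returns = actions[elite_idx], returns[elite_idx]
        weights = np.exp((elite_returns - elite_returns.max()) / temperature)
        weights /= weights.sum()
        mean = np.einsum("e,eha->ha", weights, elites)
        var = np.einsum("e,eha->ha", weights, (elites - mean) ** 2)
        std = np.sqrt(var) + 1e-3  # small floor keeps some exploration
    # Receding horizon: execute only the first action, then replan.
    return mean[0]


# Hypothetical toy problem: a 1-D point that should move to x = 3.
GOAL = 3.0


def toy_dynamics(s, a):
    return s + a


def toy_reward(s, a):
    return -abs(s[0] + a[0] - GOAL)


def toy_value(s):
    return -10.0 * abs(s[0] - GOAL)


first_action = plan_with_terminal_value(np.zeros(1), toy_dynamics, toy_reward, toy_value)
```

The terminal value is what lets a short planning horizon reason about outcomes beyond it, which is the property the TL;DR ties to long-horizon, sparse-reward performance.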
Contribution
It extends TD-MPC with IQL for better long-term planning and proposes a hierarchical framework using IQL-TD-MPC as a manager to enhance offline RL performance.
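For readers unfamiliar with IQL, the sketch below shows the standard IQL objectives in NumPy: an expectile regression loss that fits V(s) to an upper expectile of Q(s, a) over dataset actions, a TD loss for Q that bootstraps from V(s') instead of a max over actions, and clipped advantage weights for policy extraction. It assumes nothing about how IQL-TD-MPC wires these into TD-MPC; the function names and default values (`tau=0.7`, `beta=3.0`, `max_weight=100.0`) are illustrative.

```python
import numpy as np


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2, batch-averaged.

    With diff = Q(s, a) - V(s) over dataset actions, tau > 0.5 pushes V(s)
    toward an upper expectile of Q, approximating a max over in-distribution
    actions without querying Q on actions absent from the dataset.
    """
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def q_td_loss(q_sa, reward, v_next, done, gamma=0.99):
    """TD loss for Q whose target bootstraps from V(s'), not max_a' Q(s', a')."""
    target = reward + gamma * (1.0 - done) * v_next
    return float(np.mean((q_sa - target) ** 2))


def awr_weights(q_sa, v_s, beta=3.0, max_weight=100.0):
    """Advantage-weighted regression weights exp(beta * (Q - V)), clipped."""
    return np.minimum(np.exp(beta * (q_sa - v_s)), max_weight)


# Fitting a scalar V to sampled Q-values: tau = 0.9 moves it above the mean.
q_samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
grid = np.linspace(0.0, 4.0, 401)
v_upper = grid[np.argmin([expectile_loss(q_samples - v, tau=0.9) for v in grid])]
# v_upper ≈ 3.23, while the mean of q_samples is 2.0
```

Because V is trained only on (s, a) pairs from the fixed dataset, this value never requires evaluating Q on out-of-distribution actions, which is why IQL suits the offline setting.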
Findings
Significant performance improvements on D4RL benchmarks.
The hierarchical approach with intent embeddings boosts the performance of off-the-shelf offline RL algorithms used as Workers.
Achieves high scores where baseline methods fail.
Abstract
Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model of the environment can alleviate this issue. In this paper, we make two key contributions: 1) we introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL); 2) we propose to use IQL-TD-MPC as a Manager in a hierarchical setting with any off-the-shelf offline RL algorithm as a Worker. More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to…
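The Manager/Worker split can be sketched as a control loop in which the Manager is queried only every k steps and the Worker acts at every step, conditioned on the Manager's latest output. This is a hypothetical illustration under stated assumptions: `hierarchical_rollout`, the interval `k`, and the toy Manager, Worker, and environment are invented here, and the intent is a scalar subgoal standing in for the learned intent embeddings mentioned in the TL;DR. It does not reproduce how the paper pre-trains or represents its Manager.

```python
from typing import Callable, List, Tuple


def hierarchical_rollout(
    env_step: Callable[[float, float], Tuple[float, float, bool]],
    state: float,
    manager: Callable[[float], float],
    worker: Callable[[float, float], float],
    k: int,
    max_steps: int,
) -> Tuple[float, List[float]]:
    """Run one episode; the (hypothetical) Manager is re-queried every k steps."""
    total_reward = 0.0
    intents: List[float] = []
    intent = 0.0
    for t in range(max_steps):
        if t % k == 0:
            # Temporally abstract decision: a new intent every k env steps.
            intent = manager(state)
            intents.append(intent)
        # The Worker acts at every step, conditioned on state and intent.
        action = worker(state, intent)
        state, reward, done = env_step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward, intents


# Hypothetical toy task: a 1-D point must reach x = 5 (sparse reward).
GOAL = 5.0


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def toy_manager(state: float) -> float:
    # Propose a subgoal at most 2 units away, toward the goal.
    return state + clip(GOAL - state, -2.0, 2.0)


def toy_worker(state: float, subgoal: float) -> float:
    # Move toward the current subgoal at most 1 unit per step.
    return clip(subgoal - state, -1.0, 1.0)


def toy_env_step(state: float, action: float) -> Tuple[float, float, bool]:
    nxt = state + action
    reached = abs(nxt - GOAL) < 1e-9
    return nxt, (1.0 if reached else 0.0), reached


total, intents = hierarchical_rollout(toy_env_step, 0.0, toy_manager, toy_worker, k=2, max_steps=20)
# total == 1.0, intents == [2.0, 4.0, 5.0]
```

The Worker only ever needs to solve short, dense subproblems, while the Manager reasons over a coarser timescale; that division is the intuition behind planning in a temporally abstract model.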
Taxonomy
Topics: Advanced Causal Inference Techniques
Methods: Q-Learning
