Chunk-Guided Q-Learning
Gwanwoo Song, Kwanyoung Park, Youngwoon Lee

TL;DR
Chunk-Guided Q-Learning (CGQ) is a novel offline RL algorithm that balances long-term credit assignment and policy optimality by combining single-step and chunk-based critics, improving performance on long-horizon tasks.
Contribution
CGQ introduces a regularization technique guiding a single-step critic with a chunk-based critic, achieving tighter optimality bounds and better long-horizon performance.
Findings
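CGQ often outperforms both single-step and action-chunked TD methods on long-horizon tasks.
Theoretically, CGQ attains tighter critic optimality bounds than either single-step or action-chunked TD learning alone.
Empirically, CGQ achieves strong performance on challenging long-horizon OGBench tasks.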
Abstract
In offline reinforcement learning (RL), single-step temporal-difference (TD) learning can suffer from bootstrapping error accumulation over long horizons. Action-chunked TD methods mitigate this by backing up over multiple steps, but can introduce suboptimality by restricting the policy class to open-loop action sequences. To resolve this trade-off, we present Chunk-Guided Q-Learning (CGQ), a single-step TD algorithm that guides a fine-grained single-step critic by regularizing it toward a chunk-based critic trained using temporally extended backups. This reduces compounding error while preserving fine-grained value propagation. We theoretically show that CGQ attains tighter critic optimality bounds than either single-step or action-chunked TD learning alone. Empirically, CGQ achieves strong performance on challenging long-horizon OGBench tasks, often outperforming both single-step and…
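The abstract's core idea, a single-step critic trained with one-step TD and pulled toward a chunk-based critic trained with multi-step backups, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the squared-error form of the guidance term, the weight `beta`, and all function names are assumptions introduced here, and the paper may use a different regularizer.

```python
import numpy as np

def single_step_target(reward, next_q, gamma):
    """One-step TD target: r_t + gamma * Q(s_{t+1}, a_{t+1})."""
    return reward + gamma * next_q

def chunk_target(rewards, next_chunk_q, gamma):
    """Temporally extended target for a chunk of h actions:
    sum_{i<h} gamma^i * r_{t+i} + gamma^h * Q_chunk(s_{t+h}, next chunk).
    `rewards` has shape (..., h); the last axis indexes steps in the chunk."""
    rewards = np.asarray(rewards, dtype=float)
    h = rewards.shape[-1]
    discounts = gamma ** np.arange(h)
    return (rewards * discounts).sum(axis=-1) + gamma ** h * next_chunk_q

def guided_critic_loss(q_single, td_target, q_chunk, beta):
    """Single-step TD loss plus a guidance term toward the chunk critic.
    ASSUMPTION: squared-distance regularizer weighted by a hypothetical
    coefficient `beta`; the source does not specify the exact form."""
    td_loss = np.mean((q_single - td_target) ** 2)
    guide_loss = np.mean((q_single - q_chunk) ** 2)
    return td_loss + beta * guide_loss

# Toy usage with made-up numbers (not results from the paper).
gamma = 0.5
y_chunk = chunk_target([1.0, 1.0, 1.0], next_chunk_q=8.0, gamma=gamma)
y_single = single_step_target(reward=1.0, next_q=2.0, gamma=gamma)
loss = guided_critic_loss(np.array([1.0]), np.array([y_single]),
                          np.array([3.0]), beta=0.5)
```

In this sketch the chunk target bootstraps once every h steps, so fewer bootstrapped estimates are chained over a long horizon, while the single-step critic still propagates values at every step. The guidance term is what ties the two together.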
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
