Chunk-Guided Q-Learning

Gwanwoo Song; Kwanyoung Park; Youngwoon Lee

arXiv:2603.13971 · cs.LG · March 17, 2026

TL;DR

Chunk-Guided Q-Learning (CGQ) is an offline RL algorithm that regularizes a single-step critic toward a chunk-based critic. This reduces bootstrapping error over long horizons without restricting the policy to open-loop action sequences.

Contribution

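CGQ regularizes a fine-grained single-step critic toward a chunk-based critic trained with temporally extended backups. The authors show that this yields tighter critic optimality bounds than either single-step or action-chunked TD learning alone, along with stronger long-horizon performance.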

Findings

01

CGQ often outperforms both single-step and action-chunked methods on long-horizon benchmarks.

02

Theoretically, CGQ attains tighter critic optimality bounds than single-step or action-chunked TD learning alone.

03

Empirically, CGQ demonstrates strong results on OGBench tasks.

Abstract

In offline reinforcement learning (RL), single-step temporal-difference (TD) learning can suffer from bootstrapping error accumulation over long horizons. Action-chunked TD methods mitigate this by backing up over multiple steps, but can introduce suboptimality by restricting the policy class to open-loop action sequences. To resolve this trade-off, we present Chunk-Guided Q-Learning (CGQ), a single-step TD algorithm that guides a fine-grained single-step critic by regularizing it toward a chunk-based critic trained using temporally extended backups. This reduces compounding error while preserving fine-grained value propagation. We theoretically show that CGQ attains tighter critic optimality bounds than either single-step or action-chunked TD learning alone. Empirically, CGQ achieves strong performance on challenging long-horizon OGBench tasks, often outperforming both single-step and…
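The idea in the abstract can be illustrated with a minimal sketch. It contrasts a one-step TD target with a backup over a chunk of several steps, then combines them in a loss that pulls the single-step critic toward the chunk critic. The abstract does not give the exact objective, so the squared-error form of the regularizer, the coefficient `beta`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def single_step_target(reward: float, next_value: float, gamma: float) -> float:
    """One-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value


def chunk_target(rewards: Sequence[float], value_after_chunk: float, gamma: float) -> float:
    """Temporally extended target over a chunk of h steps:
    sum_i gamma^i * r_i + gamma^h * V(s_h)."""
    h = len(rewards)
    discounted = sum(gamma ** i * r for i, r in enumerate(rewards))
    return discounted + gamma ** h * value_after_chunk


def chunk_guided_loss(q_single: float, td_target: float, q_chunk: float, beta: float) -> float:
    """Hypothetical CGQ-style loss for one transition: the usual
    single-step TD error plus a penalty (assumed squared error,
    weighted by beta) that pulls the single-step critic toward the
    chunk-based critic's estimate."""
    td_error = (q_single - td_target) ** 2
    guidance = (q_single - q_chunk) ** 2
    return td_error + beta * guidance


# Toy numbers, chosen only to exercise the functions.
y_single = single_step_target(reward=1.0, next_value=2.0, gamma=0.5)
y_chunk = chunk_target(rewards=[1.0, 1.0], value_after_chunk=4.0, gamma=0.5)
loss = chunk_guided_loss(q_single=1.0, td_target=y_single, q_chunk=y_chunk, beta=0.5)
```

In this sketch, setting `beta` to zero recovers plain single-step TD learning, while a large `beta` makes the single-step critic track the chunk critic closely. How the paper matches the chunk critic's inputs to the single-step critic's, and how it weights the two terms, is not specified in the abstract.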

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning