Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
Nandiraju Gireesh, Yuanliang Ju, He Wang

TL;DR
This paper introduces Adaptive Q-Chunking (AQC), a method for offline-to-online reinforcement learning with action chunking that selects the chunk size at each state (short chunks for reactive control, long chunks for better credit assignment) and outperforms approaches that use one fixed chunk size everywhere.
Contribution
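AQC compares candidate chunk sizes at each state by the advantage of each size over a per-horizon baseline, normalized by the discount factor. This removes the bias toward the shortest chunk that naive comparison of raw critic values suffers from, and keeps the choice from degrading to noise in low-value states.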
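The selection rule can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the per-horizon baseline is assumed to be the mean critic value over chunks sampled at the state, and the discount normalization is assumed to divide by the discounted length (1 - γ^h)/(1 - γ) of an h-step chunk, since the exact formula is not given here. The function names and the critic values are hypothetical.

```python
from typing import Dict, List


def normalized_advantage(qs: List[float], h: int, gamma: float) -> float:
    """Advantage of the best sampled chunk of length h, put on a per-step scale.

    qs: critic estimates Q_h(s, a_chunk) for chunks of length h sampled at state s.
    """
    baseline = sum(qs) / len(qs)                 # assumed per-horizon baseline: mean over samples
    advantage = max(qs) - baseline               # gap between the best chunk and the baseline
    scale = (1.0 - gamma ** h) / (1.0 - gamma)   # assumed normalization: discounted chunk length
    return advantage / scale


def naive_select(q_values: Dict[int, List[float]]) -> int:
    """Naive rule: pick the horizon whose best raw critic value is largest."""
    return max(q_values, key=lambda h: max(q_values[h]))


def select_chunk_size(q_values: Dict[int, List[float]], gamma: float) -> int:
    """Pick the horizon with the largest normalized advantage (hypothetical AQC-style rule)."""
    if not q_values:
        raise ValueError("need critic estimates for at least one chunk size")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return max(q_values, key=lambda h: normalized_advantage(q_values[h], h, gamma))


if __name__ == "__main__":
    # Made-up critic values for chunk sizes 1, 5 and 10 at a single state.
    q = {1: [10.0, 10.1, 10.2], 5: [8.0, 9.0, 10.0], 10: [7.0, 8.0, 9.0]}
    print(naive_select(q), select_chunk_size(q, gamma=0.99))  # prints: 1 5
```

With these made-up values the naive rule picks the one-step chunk because its raw critic values are largest, while the normalized advantage prefers the 5-step chunk. Subtracting a per-horizon baseline cancels the part of each critic's value that is shared by all chunks of that length, and dividing by the discounted length puts horizons on a comparable per-step scale; this is one plausible reading of "normalized by the discount factor".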
Findings
AQC achieves state-of-the-art results on OGBench and Robomimic.
AQC improves large-scale vision-language-action (VLA) models on RoboCasa-GR1 tasks.
Theoretical bounds demonstrate AQC's noise immunity and value dominance.
Abstract
Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use a fixed chunk size across every state. This is suboptimal: near contact events the agent needs short chunks for reactive control, while during free-space motion long chunks provide better credit assignment. The natural solution is to train critics for several chunk sizes and select the best one at each state, but naive comparison of learned critic values systematically collapses to the shortest chunk due to discount-scale mismatch, and degrades to noise in low-value states. We propose Adaptive Q-Chunking (AQC), which resolves both failures by comparing the advantage of each chunk size relative to a per-horizon baseline, normalized by the discount factor. This criterion converts biased wrong answers into unbiased…
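The claim that action chunking eliminates multi-step off-policy bias follows from the shape of the critic backup. The sketch below assumes the standard chunked h-step target: a critic Q_h(s_t, a_{t:t+h}) that conditions on the entire executed chunk is regressed onto the discounted sum of the h rewards plus a bootstrapped value γ^h Q_h(s_{t+h}, a'), where a' is a chunk from the current policy. Because the rewards were produced by exactly the actions the critic conditions on, the multi-step return needs no importance correction. `chunked_td_target` and its arguments are hypothetical names, and the bootstrap value is passed in rather than computed.

```python
from typing import Sequence


def chunked_td_target(
    rewards: Sequence[float],
    next_q: float,
    gamma: float,
    done: bool = False,
) -> float:
    """h-step regression target for a critic conditioned on the whole action chunk.

    rewards: rewards r_t, ..., r_{t+h-1} collected while executing the chunk.
    next_q:  target-critic estimate Q_h(s_{t+h}, a') for a chunk a' drawn from
             the current policy (supplied by the caller, not computed here).
    done:    True if the episode terminated inside the chunk, so no bootstrap.
    """
    h = len(rewards)
    # Discounted sum of the rewards earned by the executed chunk itself.
    discounted_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Bootstrap from the state reached after the chunk, discounted by gamma^h.
    bootstrap = 0.0 if done else gamma ** h * next_q
    return discounted_return + bootstrap


if __name__ == "__main__":
    # Two-step chunk with reward 1 per step: 1 + 0.5 * 1 + 0.5**2 * 10 = 4.0
    print(chunked_td_target([1.0, 1.0], next_q=10.0, gamma=0.5))  # prints: 4.0
```

Training one such critic per candidate chunk size yields the family of Q_h values that a per-state selection rule then compares.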
