What Makes Value Learning Efficient in Residual Reinforcement Learning?
Guozheng Ma, Lu Li, Haoyu Wang, Zixuan Liu, Pierre-Luc Bacon, Dacheng Tao

TL;DR
This paper investigates why value learning is hard in residual reinforcement learning, identifies two bottlenecks (cold start pathology and structural scale mismatch), and proposes DAWN, a simple method that improves learning efficiency across multiple benchmarks.
Contribution
The paper uncovers two fundamental bottlenecks in residual RL value learning and introduces DAWN, a minimal approach built on two simple remedies: base-policy transitions as a value anchor and critic normalization.
Findings
DAWN improves value learning efficiency across multiple benchmarks.
Base-policy transitions act as value anchors for warmup.
Critic normalization restores representation sensitivity.
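To make the last two findings concrete, here is a minimal sketch, not the authors' implementation. It assumes that "base-policy transitions" means rollouts collected with the residual set to zero and stored in the replay buffer before residual learning starts, and that "critic normalization" resembles layer normalization of critic features; this summary does not confirm either detail. The toy policy, dynamics, and every function name are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_policy(state):
    # Hypothetical frozen base policy: push the state toward the origin.
    return np.clip(-0.5 * state, -1.0, 1.0)

def step(state, action):
    # Hypothetical toy dynamics and reward, used only for illustration.
    next_state = state + 0.1 * action
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def collect_base_transitions(num_steps, state_dim=2):
    """Roll out the base policy alone (zero residual) and store the transitions."""
    buffer = []
    state = rng.uniform(-1.0, 1.0, size=state_dim)
    for _ in range(num_steps):
        base_action = base_policy(state)
        residual = np.zeros_like(base_action)  # no learned correction yet
        next_state, reward = step(state, base_action + residual)
        buffer.append((state, base_action, residual, reward, next_state))
        state = next_state
    return buffer

def layer_norm(features, eps=1e-5):
    """Normalize each feature vector to zero mean and (near) unit variance."""
    mean = features.mean(axis=-1, keepdims=True)
    var = features.var(axis=-1, keepdims=True)
    return (features - mean) / np.sqrt(var + eps)

# Assumed warmup: the critic's first updates would sample from these transitions.
replay_buffer = collect_base_transitions(num_steps=50)

# Features with a large shared offset become comparable in scale after normalization.
features = np.array([[100.0, 101.0, 99.0, 100.5]])
normed = layer_norm(features)
```

In this sketch the buffer holds only transitions generated by the base policy, so a critic trained on it would first see values along the base policy's own trajectories. The normalization step removes the shared offset in the features, which is one plausible way small differences could remain visible to later layers.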
Abstract
Residual reinforcement learning (RL) enables stable online refinement of expressive pretrained policies by freezing the base and learning only bounded corrections. However, value learning in residual RL poses unique challenges that remain poorly understood. In this work, we identify two key bottlenecks: cold start pathology, where the critic lacks knowledge of the value landscape around the base policy, and structural scale mismatch, where the residual contribution is dwarfed by the base action. Through systematic investigation, we uncover the mechanisms underlying these bottlenecks, revealing that simple yet principled solutions suffice: base-policy transitions serve as an essential value anchor for implicit warmup, and critic normalization effectively restores representation sensitivity for discerning value differences. Based on these insights, we propose DAWN (Data-Anchored Warmup…
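The setup described in the abstract, a frozen base action plus a bounded learned correction, can be sketched as follows. This is an illustrative assumption rather than the paper's formulation: the tanh bound, the `residual_scale` parameter, and the function name `compose_action` are all hypothetical. The final ratio shows the scale mismatch in miniature, with the correction small relative to the base action.

```python
import numpy as np

def compose_action(base_action, residual_raw, residual_scale=0.1, low=-1.0, high=1.0):
    """Combine a frozen base action with a bounded learned correction."""
    # tanh bounds the raw residual to [-1, 1]; residual_scale (assumed) shrinks it further.
    correction = residual_scale * np.tanh(residual_raw)
    # Keep the final action inside the assumed action limits.
    action = np.clip(base_action + correction, low, high)
    return action, correction

base_action = np.array([0.8, -0.5, 0.3])    # output of the frozen base policy
residual_raw = np.array([2.0, -1.0, 0.0])   # unbounded output of the residual policy

action, correction = compose_action(base_action, residual_raw)

# Relative size of the correction compared with the base action.
ratio = np.linalg.norm(correction) / np.linalg.norm(base_action)
print(f"correction / base norm ratio: {ratio:.3f}")
```

Because the correction is capped at `residual_scale` per dimension, two candidate residuals produce nearly identical final actions, which is why a critic taking the combined action as input may struggle to tell them apart.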
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
