What Makes Value Learning Efficient in Residual Reinforcement Learning?

Guozheng Ma; Lu Li; Haoyu Wang; Zixuan Liu; Pierre-Luc Bacon; Dacheng Tao

arXiv:2602.10539·cs.LG·February 12, 2026

Open Access

TL;DR

This paper studies why value learning is difficult in residual reinforcement learning, identifies two bottlenecks (cold start pathology and structural scale mismatch), and proposes DAWN, a simple method that significantly improves learning efficiency across benchmarks.

Contribution

The paper explains the mechanisms behind two bottlenecks in residual RL value learning and introduces DAWN, a minimal approach built on those insights to make training more efficient.

Findings

01 DAWN improves value learning efficiency across multiple benchmarks.

02 Base-policy transitions act as value anchors for warmup.

03 Critic normalization restores representation sensitivity.

Abstract

Residual reinforcement learning (RL) enables stable online refinement of expressive pretrained policies by freezing the base and learning only bounded corrections. However, value learning in residual RL poses unique challenges that remain poorly understood. In this work, we identify two key bottlenecks: cold start pathology, where the critic lacks knowledge of the value landscape around the base policy, and structural scale mismatch, where the residual contribution is dwarfed by the base action. Through systematic investigation, we uncover the mechanisms underlying these bottlenecks, revealing that simple yet principled solutions suffice: base-policy transitions serve as an essential value anchor for implicit warmup, and critic normalization effectively restores representation sensitivity for discerning value differences. Based on these insights, we propose DAWN (Data-Anchored Warmup…
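
To make the residual setup in the abstract concrete, here is a minimal sketch of how a frozen base action and a bounded correction might be combined, and how small the correction can be relative to the base action. The function names, the tanh squashing, the bound `alpha`, and the clipping range are all illustrative assumptions; none of them are taken from the paper, and this is not DAWN itself.

```python
import math

def compose_action(base_action, raw_residual, alpha=0.1, low=-1.0, high=1.0):
    """Add a bounded correction to a frozen base action (hypothetical form).

    alpha, the tanh squashing, and the [low, high] clip are assumptions.
    """
    out = []
    for a_b, r in zip(base_action, raw_residual):
        delta = alpha * math.tanh(r)  # |delta| <= alpha by construction
        out.append(min(high, max(low, a_b + delta)))
    return out

def residual_share(base_action, raw_residual, alpha=0.1):
    """Ratio of the correction's L2 norm to the base action's L2 norm."""
    res = math.sqrt(sum((alpha * math.tanh(r)) ** 2 for r in raw_residual))
    base = math.sqrt(sum(a * a for a in base_action))
    return res / base if base > 0 else float("inf")

# Made-up numbers, purely for illustration.
base = [0.8, -0.5, 0.3]   # output of the frozen base policy
raw = [2.0, -1.0, 0.5]    # unbounded output of the residual policy
action = compose_action(base, raw)
share = residual_share(base, raw)
```

With these made-up inputs the correction is a small fraction of the base action's magnitude, which is one way to picture the scale mismatch the abstract describes: a critic that receives only the combined action sees inputs that barely change as the residual varies.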

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)