Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL

Lipeng Zu; Hansong Zhou; Xiaonan Zhang

arXiv:2511.03695·cs.LG·November 6, 2025

Open Access

TL;DR

This paper introduces Behavior-Adaptive Q-Learning (BAQ), a framework that facilitates a stable and efficient transition from offline to online reinforcement learning by leveraging implicit behavioral models and adaptive constraints.

Contribution

BAQ combines an implicit behavioral model derived from offline data with a dual-objective loss to make the transition from offline to online RL more stable and to improve performance after the transition.

Findings

01. BAQ outperforms prior methods on standard benchmarks.

02. BAQ achieves faster recovery and higher robustness.

03. BAQ stabilizes early online updates and accelerates adaptation.

Abstract

Offline reinforcement learning (RL) enables training from fixed data without online interaction, but policies learned offline often struggle when deployed in dynamic environments due to distributional shift and unreliable value estimates on unseen state-action pairs. We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth and reliable transition from offline to online RL. The key idea is to leverage an implicit behavioral model derived from offline data to provide a behavior-consistency signal during online fine-tuning. BAQ incorporates a dual-objective loss that (i) aligns the online policy toward the offline behavior when uncertainty is high, and (ii) gradually relaxes this constraint as more confident online experience is accumulated. This adaptive mechanism reduces error propagation from out-of-distribution estimates, stabilizes early online updates,…
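
The dual-objective idea in the abstract can be illustrated with a minimal sketch: a temporal-difference term plus a behavior-consistency term whose weight grows with uncertainty and shrinks as uncertainty falls. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the linear weighting schedule, the use of scalar actions, the squared-error form of both terms, and the convention that uncertainty is a score in [0, 1].

```python
from typing import Sequence


def behavior_weight(uncertainty: float, max_weight: float = 1.0) -> float:
    """Map an uncertainty score to a constraint weight.

    Hypothetical linear schedule: the score is clipped to [0, 1], so the
    weight is max_weight at full uncertainty and 0 at full confidence.
    """
    clipped = min(max(uncertainty, 0.0), 1.0)
    return max_weight * clipped


def dual_objective_loss(
    q_pred: Sequence[float],
    td_target: Sequence[float],
    policy_action: Sequence[float],
    behavior_action: Sequence[float],
    uncertainty: Sequence[float],
) -> float:
    """Average a TD term and an uncertainty-weighted behavior term over a batch.

    Assumed form (not the paper's): (q - y)^2 + w(u) * (a_pi - a_b)^2,
    where a_b stands in for the action suggested by the behavioral model.
    """
    total = 0.0
    for q, y, a_pi, a_b, u in zip(
        q_pred, td_target, policy_action, behavior_action, uncertainty
    ):
        td_term = (q - y) ** 2            # value-learning objective
        bc_term = (a_pi - a_b) ** 2       # behavior-consistency objective
        total += td_term + behavior_weight(u) * bc_term
    return total / len(q_pred)


# High uncertainty keeps the behavior constraint active ...
loss_uncertain = dual_objective_loss([1.0], [0.0], [2.0], [0.0], [1.0])
# ... while zero uncertainty leaves only the TD term.
loss_confident = dual_objective_loss([1.0], [0.0], [2.0], [0.0], [0.0])
```

In this sketch the same transition yields a larger loss when its uncertainty score is high, which pulls the policy toward the behavioral model's action; as the score drops, the penalty fades and the update reduces to ordinary value learning.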

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research