From Static Constraints to Dynamic Adaptation: Sample-Level Constraint Relaxation for Offline-to-Online Reinforcement Learning

Lipeng Zu; Yu Qian; Shayok Chakraborty; Xiaonan Zhang

arXiv:2511.03828·cs.LG·May 19, 2026

TL;DR

This paper introduces DARE, a novel framework for offline-to-online reinforcement learning that dynamically relaxes constraints based on behavioral consistency, improving fine-tuning stability and performance.

Contribution

To the authors' knowledge, DARE is the first method to condition constraint relaxation on behavioral consistency, enabling flexible, sample-level adaptation in offline-to-online RL.

Findings

1. DARE improves fine-tuning stability on D4RL benchmarks.

2. DARE achieves superior final performance over strong baselines.

3. Behavior-based sample exchange enhances offline-online distinction.

Abstract

Offline-to-online reinforcement learning (O2O RL) faces a central challenge between retaining offline conservatism and adapting to online feedback under distribution shift. This challenge arises because data behavior evolves during fine-tuning, rendering data origin a misleading basis for constraint handling and thereby leading to objective-data mismatch. We therefore propose Dynamic Alignment for RElaxation (DARE), a distribution-aware framework for sample-level constraint relaxation based on the behavioral consistency with a behavior model. To our knowledge, DARE is the first to condition constraint relaxation on behavioral consistency via a posterior-induced exchange mechanism, moving beyond a binary offline/online data distinction. Importantly, DARE requires only per-sample behavioral alignment, enabling instantiation on top of many offline algorithms with flexible choices of…
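The idea of relaxing a constraint per sample according to behavioral consistency can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-Gaussian behavior model, the sigmoid mapping, the `threshold` and `temperature` parameters, and all function names are hypothetical, and the sketch does not model the posterior-induced exchange mechanism. It only shows one plausible way a per-sample weight could scale a constraint penalty, assuming that samples the behavior model finds likely keep the constraint while unlikely samples have it relaxed.

```python
import numpy as np


def gaussian_log_prob(actions, means, std):
    """Log-density of actions under a diagonal Gaussian behavior model.

    The Gaussian form is an assumption made for this sketch only.
    """
    var = std ** 2
    return -0.5 * np.sum(
        (actions - means) ** 2 / var + np.log(2 * np.pi * var), axis=-1
    )


def consistency_weights(log_probs, threshold, temperature=1.0):
    """Map behavior-model log-likelihoods to constraint weights in (0, 1).

    Hypothetical rule: a likelihood above `threshold` gives a weight near 1
    (constraint kept); a likelihood below it gives a weight near 0
    (constraint relaxed).
    """
    return 1.0 / (1.0 + np.exp(-(log_probs - threshold) / temperature))


def per_sample_loss(rl_loss, constraint_penalty, weights):
    """Average of the RL loss plus a per-sample weighted constraint penalty."""
    return np.mean(rl_loss + weights * constraint_penalty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.zeros((4, 2))              # behavior-model action means
    actions = rng.normal(size=(4, 2))     # actions found in the replay batch
    log_probs = gaussian_log_prob(actions, means, std=1.0)
    weights = consistency_weights(log_probs, threshold=np.median(log_probs))
    loss = per_sample_loss(np.ones(4), np.full(4, 2.0), weights)
    print(weights, loss)
```

In this sketch the weight depends only on how well a sample agrees with the behavior model, not on whether it came from the offline dataset or from online interaction, which is the distinction the abstract draws.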

Code & Models

Repositories

lpzu/DARE (GitHub)
