Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
Xiaoyu Wen, Xudong Yu, Rui Yang, Haoyuan Chen, Chenjia Bai, Zhen Wang

TL;DR
This paper introduces RO2O, a robust algorithm for offline-to-online reinforcement learning that uses uncertainty and smoothness techniques to improve policy stability and performance during online adaptation.
Contribution
The paper proposes the RO2O algorithm, which enhances offline policies for better online adaptation by integrating uncertainty penalties and smoothness constraints without altering the core learning objective.
Findings
RO2O achieves more stable policy improvement in O2O RL.
RO2O outperforms baseline methods with limited online interactions.
Theoretical analysis shows tighter bounds under distribution shift.
Abstract
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online (O2O) RL provides a paradigm for improving an offline trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsReinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research
