Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

Xiaoyu Wen; Xudong Yu; Rui Yang; Haoyuan Chen; Chenjia Bai; Zhen Wang

arXiv:2309.16973 · cs.LG · November 18, 2024 · 1 citation


TL;DR

This paper introduces RO2O, a robust offline-to-online reinforcement learning algorithm that uses uncertainty penalties and smoothness constraints to make policy improvement more stable and to improve performance during online adaptation.

Contribution

The paper proposes the RO2O algorithm, which enhances offline policies for better online adaptation by integrating uncertainty penalties and smoothness constraints without altering the core learning objective.
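The contribution names two ingredients: uncertainty penalties and smoothness constraints. The Python sketch below illustrates one generic way each could be computed for a critic. It is not the RO2O implementation from battlewen/ro2o; it assumes (hypothetically) a Q-ensemble whose standard deviation serves as the uncertainty signal, a penalty coefficient `beta`, and uniformly sampled state perturbations of radius `epsilon`. All function names and the linear critic are illustrative.

```python
import numpy as np


def uncertainty_penalized_value(q_ensemble: np.ndarray, beta: float) -> np.ndarray:
    """Pessimistic value estimate from a Q-ensemble (illustrative, not RO2O's code).

    q_ensemble has shape (n_members, batch_size). Disagreement across members
    (their standard deviation) is treated as uncertainty and lowers the estimate.
    """
    return q_ensemble.mean(axis=0) - beta * q_ensemble.std(axis=0)


def smoothness_penalty(q_fn, states, actions, epsilon, n_samples, rng):
    """Mean squared change in Q under small state perturbations.

    Assumption for simplicity: perturbations are drawn uniformly from an
    L-infinity ball of radius epsilon rather than chosen adversarially.
    """
    base = q_fn(states, actions)
    total = 0.0
    for _ in range(n_samples):
        noise = rng.uniform(-epsilon, epsilon, size=states.shape)
        total += float(np.mean((q_fn(states + noise, actions) - base) ** 2))
    return total / n_samples


# Usage with a hypothetical linear critic Q(s, a) = s @ w + a @ v.
rng = np.random.default_rng(0)
w = np.array([0.5, -1.0, 2.0])
v = np.array([1.0])
linear_q = lambda s, a: s @ w + a @ v

states = rng.normal(size=(8, 3))
actions = rng.normal(size=(8, 1))

print(uncertainty_penalized_value(np.array([[1.0, 2.0], [3.0, 4.0]]), beta=0.5))  # [1.5 2.5]
print(smoothness_penalty(linear_q, states, actions, epsilon=0.01, n_samples=16, rng=rng))
```

In this sketch, subtracting `beta` times the ensemble spread makes value estimates more pessimistic wherever the members disagree, and the smoothness term grows when the critic's output changes sharply under small input changes; minimizing it alongside a TD loss would push the critic toward smoother behavior.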

Findings

01. RO2O achieves more stable policy improvement in offline-to-online (O2O) RL.

02. RO2O outperforms baseline methods when online interactions are limited.

03. Theoretical analysis shows tighter bounds under distribution shift.

Abstract

To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online (O2O) RL provides a paradigm for improving an offline trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates…


Code & Models

Repositories

battlewen/ro2o (PyTorch, official)

Videos

Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness (Underline)

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research