PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

Jianxiong Li; Xiao Hu; Haoran Xu; Jingjing Liu; Xianyuan Zhan; Ya-Qin Zhang

arXiv:2305.15669 · cs.LG · May 26, 2023 · 5 citations

TL;DR

PROTO is an offline-to-online reinforcement learning framework that adds an iteratively evolving regularization term to the standard RL objective, aiming for stable, adaptable, and computationally efficient policy finetuning. The authors report that it outperforms state-of-the-art methods.

Contribution

PROTO introduces an iterative regularization approach with trust-region updates, enhancing offline-to-online RL with minimal computational overhead and broad compatibility.

Findings

01. PROTO outperforms state-of-the-art (SOTA) baselines.

02. It adapts to diverse offline pretraining methods.

03. It enables efficient online finetuning with minimal code changes.

Abstract

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes the aforementioned limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, birthing great adaptability to…
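The idea of an iteratively evolving regularizer can be illustrated with a toy sketch. The abstract does not state the form of the regularization term, so the code below assumes a KL penalty toward the previous policy iterate, a single state with three discrete actions, fixed Q-values, and a geometric decay of the penalty weight. All names (`regularized_update`, `finetune`, `alpha0`, `decay`) are hypothetical, and this is not the authors' implementation. Under these assumptions, each step has the closed form pi_new(a) ∝ pi_prev(a) * exp(Q(a) / alpha).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def regularized_update(prev_log_policy, q_values, alpha):
    """One trust-region-style step (assumed KL form, not taken from the paper).

    Maximizes sum_a pi(a) * Q(a) - alpha * KL(pi || pi_prev), whose solution is
    pi(a) proportional to pi_prev(a) * exp(Q(a) / alpha). Works in log space so
    that near-zero probabilities do not break later steps.
    """
    logits = [lp + q / alpha for lp, q in zip(prev_log_policy, q_values)]
    return log_softmax(logits)


def finetune(pretrained_policy, q_values, alpha0=5.0, decay=0.7, steps=20):
    """Iterate the update, regularizing toward the latest iterate each time.

    alpha0 and decay are illustrative values; shrinking alpha relaxes the
    constraint as finetuning proceeds.
    """
    log_policy = [math.log(p) for p in pretrained_policy]
    alpha = alpha0
    history = [[math.exp(lp) for lp in log_policy]]
    for _ in range(steps):
        log_policy = regularized_update(log_policy, q_values, alpha)
        history.append([math.exp(lp) for lp in log_policy])
        alpha *= decay  # relax the constraint strength over iterations
    return history


# Hypothetical pretrained policy that prefers action 0, while Q favors action 2.
history = finetune([0.7, 0.2, 0.1], [0.0, 1.0, 2.0])
print("after 1 step: ", [round(p, 3) for p in history[1]])
print("after 20 steps:", [round(p, 3) for p in history[-1]])
```

With a large initial penalty weight the first update stays close to the pretrained policy, and as the weight decays the policy concentrates on the highest-valued action. This mirrors, in a very simplified setting, the stable-start and relaxed-finish behavior the abstract describes.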

Code & Models

Repositories

Facebear-ljx/PROTO (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics