PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

TL;DR
PROTO is a novel offline-to-online reinforcement learning framework that iteratively evolves regularization to improve stability, adaptability, and efficiency in policy finetuning, outperforming state-of-the-art methods.
Contribution
PROTO introduces an iterative regularization approach with trust-region updates, enhancing offline-to-online RL with minimal computational overhead and broad compatibility.
Findings
PROTO achieves superior performance over SOTA baselines.
It offers high adaptability to diverse offline pretraining methods.
It enables efficient online finetuning with minimal code changes.
Abstract
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes the aforementioned limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, birthing great adaptability to…
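To make the idea of an iteratively evolving regularization term concrete, here is a toy sketch rather than the authors' implementation. It assumes the regularizer is a KL penalty toward the previous policy iterate, that its weight `alpha` is decayed linearly, and that Q-values for a single state are fixed and known; none of these details are stated in the abstract. Under those assumptions, each update has the closed form pi_new(a) ∝ pi_old(a) · exp(Q(a) / alpha). The names `kl_regularized_step`, `linear_decay`, and `finetune` are hypothetical.

```python
import math


def kl_regularized_step(prior, q_values, alpha):
    """Maximize E_pi[Q] - alpha * KL(pi || prior) over discrete actions.

    The maximizer is pi(a) proportional to prior(a) * exp(Q(a) / alpha).
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp((q - q_max) / alpha) for p, q in zip(prior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def linear_decay(alpha_start, alpha_end, step, num_steps):
    """Assumed schedule: move alpha linearly from alpha_start to alpha_end."""
    frac = min(step / max(num_steps - 1, 1), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)


def finetune(pretrained_policy, q_values, num_steps=50, alpha_start=5.0, alpha_end=0.1):
    """Regularize each iterate toward the previous one while relaxing alpha."""
    policy = list(pretrained_policy)
    history = [policy]
    for k in range(num_steps):
        alpha = linear_decay(alpha_start, alpha_end, k, num_steps)
        # The regularization target is the previous iterate, not the fixed
        # pretrained policy, so the constraint evolves with the policy.
        policy = kl_regularized_step(policy, q_values, alpha)
        history.append(policy)
    return history


# Toy setup: the pretrained policy prefers action 0, but action 2 has the highest Q.
pretrained = [0.7, 0.2, 0.1]
q = [0.0, 0.5, 1.0]
history = finetune(pretrained, q)
```

With a large initial `alpha`, the first iterate stays close to the pretrained policy; as `alpha` shrinks and the regularization target moves with the policy, probability mass shifts toward the highest-value action. In actual finetuning the Q-function would be learned online rather than fixed.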
Taxonomy
Topics: Reinforcement Learning in Robotics
