Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

Shenzhi Wang; Qisen Yang; Jiawei Gao; Matthieu Gaetan Lin; Hao Chen; Liwei Wu; Ning Jia; Shiji Song; Gao Huang

arXiv:2310.17966·cs.LG·October 31, 2023·1 cites

TL;DR

FamO2O is a framework for offline-to-online reinforcement learning that balances policy improvement against policy constraints separately for each state, so that it can exploit the variation in data quality across states.

Contribution

The paper proposes a family of policies and a balance model that together enable state-adaptive policy improvement, which improves offline-to-online RL performance.

Findings

01. Achieves state-of-the-art results on the D4RL benchmark.

02. Shows statistically significant improvements over existing methods.

03. Theoretically proves the necessity of state-adaptive balances.

Abstract

Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a…
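The contrast between a single balance and a state-adaptive balance can be illustrated with a toy calculation. The sketch below is not FamO2O's implementation, and the truncated abstract does not specify the mechanism; it assumes an advantage-weighted objective in which each sample's log-likelihood is weighted by exp(beta * A), so that beta = 0 reduces to pure behavior cloning (constraint only) and a larger beta pushes harder toward policy improvement. The function names, the toy numbers, and the per-sample beta values are all hypothetical.

```python
import math

def awr_weights(advantages, betas):
    """Per-sample weights exp(beta * A) for a weighted behavior-cloning loss.

    Assumption: beta plays the role of the improvement-constraint balance.
    beta = 0 gives weight 1 for every sample (pure imitation of the data).
    """
    return [math.exp(b * a) for a, b in zip(advantages, betas)]

def weighted_bc_loss(log_probs, weights):
    """Negative weighted log-likelihood, averaged over the batch."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)

# Toy batch of three (state, action) samples; all numbers are made up.
advantages = [1.0, -0.5, 2.0]   # estimated advantage of each dataset action
log_probs = [-1.2, -0.7, -2.0]  # log pi(a | s) under the current policy

# One-size-fits-all: the same balance for every sample.
fixed_betas = [1.0, 1.0, 1.0]
fixed = awr_weights(advantages, fixed_betas)

# State-adaptive (hypothetical values): imitate on the first two states,
# emphasize improvement on the third.
adaptive_betas = [0.0, 0.0, 3.0]
adaptive = awr_weights(advantages, adaptive_betas)

fixed_loss = weighted_bc_loss(log_probs, fixed)
adaptive_loss = weighted_bc_loss(log_probs, adaptive)
```

In this toy setting the adaptive weights leave the first two samples at weight 1 while amplifying the third far more than the fixed balance does, which is the kind of per-state trade-off a single global coefficient cannot express.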

Code & Models

Repositories

leaplabthu/famo2o (JAX, official)

Videos

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Advanced Bandit Algorithms Research