Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Hao Hu; Yiqin Yang; Jianing Ye; Chengjie Wu; Ziqing Mai; Yujing Hu; Tangjie Lv; Changjie Fan; Qianchuan Zhao; Chongjie Zhang

arXiv:2405.20984·cs.LG·June 3, 2024

TL;DR

This paper introduces a Bayesian approach for offline-to-online reinforcement learning, balancing optimism and pessimism to improve policy fine-tuning and avoid performance drops.

Contribution

It proposes a belief-matching principle for offline-to-online RL, backed by theory and a novel algorithm that outperforms existing methods.

Findings

01. The Bayesian belief-matching approach prevents performance drops during fine-tuning.

02. The proposed algorithm outperforms existing methods on benchmark tasks.

03. Theoretical analysis supports the effectiveness of probability-matching in RL.

Abstract

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on…
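
The contrast the abstract draws between pessimistic, optimistic, and probability-matching agents can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a two-armed Bernoulli bandit whose "offline dataset" is a table of success and failure counts, and every function name and the confidence multiplier `k` are hypothetical choices made for illustration. The probability-matching rule shown here is plain Thompson sampling, which picks each arm with the posterior probability that it is the best one.

```python
import numpy as np


def beta_posterior(successes, failures):
    """Beta(1, 1) prior updated with offline counts; returns (alpha, beta) arrays."""
    alpha = 1.0 + np.asarray(successes, dtype=float)
    beta = 1.0 + np.asarray(failures, dtype=float)
    return alpha, beta


def _mean_std(alpha, beta):
    """Posterior mean and standard deviation of each arm's success rate."""
    n = alpha + beta
    mean = alpha / n
    std = np.sqrt(alpha * beta / (n ** 2 * (n + 1.0)))
    return mean, std


def pessimistic_action(alpha, beta, k=2.0):
    """Lower-confidence-bound rule: penalize arms the data covers poorly."""
    mean, std = _mean_std(alpha, beta)
    return int(np.argmax(mean - k * std))


def optimistic_action(alpha, beta, k=2.0):
    """Upper-confidence-bound rule: reward arms the data covers poorly."""
    mean, std = _mean_std(alpha, beta)
    return int(np.argmax(mean + k * std))


def probability_matching_action(alpha, beta, rng):
    """Thompson sampling: draw one belief sample per arm and act greedily on it."""
    sample = rng.beta(alpha, beta)
    return int(np.argmax(sample))


# Hypothetical offline data: arm 0 is well covered, arm 1 was never tried.
alpha, beta = beta_posterior(successes=[60, 0], failures=[40, 0])
rng = np.random.default_rng(0)
picks = [probability_matching_action(alpha, beta, rng) for _ in range(2000)]
fraction_arm_1 = sum(picks) / len(picks)
```

With these counts the pessimistic rule always stays on the well-covered arm and the optimistic rule always jumps to the untried one, while the sampled rule splits its choices between the two in proportion to its belief that each arm is optimal. That split is the behaviour the abstract describes as avoiding both extremes.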

Code & Models

Repositories

YiqinYang/BOORL (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Auction Theory and Applications