Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang

TL;DR
This paper introduces a Bayesian approach for offline-to-online reinforcement learning, balancing optimism and pessimism to improve policy fine-tuning and avoid performance drops.
Contribution
It proposes a belief-matching principle for offline-to-online RL, backed by theory and a novel algorithm that outperforms existing methods.
Findings
The Bayesian belief-matching approach prevents performance drops during fine-tuning.
The proposed algorithm outperforms existing methods on benchmark tasks.
Theoretical analysis supports the effectiveness of probability-matching in RL.
Abstract
Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on…
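The contrast the abstract draws between pessimistic, optimistic, and probability-matching behavior can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a two-armed Bernoulli bandit with Beta posteriors, uses posterior sampling (Thompson sampling) as a stand-in for probability matching, and uses one-standard-deviation confidence bounds for the other two rules. The function names and the offline counts are hypothetical.

```python
import math
import random


def select_arm(successes, failures, rule, rng):
    """Pick an arm from Beta(1 + s, 1 + f) posteriors under one of three rules."""
    scores = []
    for s, f in zip(successes, failures):
        a, b = 1 + s, 1 + f  # uniform Beta(1, 1) prior (an assumption)
        mean = a / (a + b)
        std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        if rule == "pessimistic":
            scores.append(mean - std)  # lower confidence bound
        elif rule == "optimistic":
            scores.append(mean + std)  # upper confidence bound
        elif rule == "matching":
            # One posterior sample per arm; acting greedily on the samples
            # picks each arm with the posterior probability that it is best.
            scores.append(rng.betavariate(a, b))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return max(range(len(scores)), key=scores.__getitem__)


def action_frequencies(successes, failures, rule, n=2000, seed=0):
    """Empirical frequency with which each arm is chosen over n decisions."""
    rng = random.Random(seed)
    counts = [0] * len(successes)
    for _ in range(n):
        counts[select_arm(successes, failures, rule, rng)] += 1
    return [c / n for c in counts]


# Hypothetical offline data: arm 0 is well covered, arm 1 barely covered.
successes = [60, 1]
failures = [40, 1]

for rule in ("pessimistic", "optimistic", "matching"):
    print(rule, action_frequencies(successes, failures, rule))
```

With these counts the pessimistic rule always picks the well-covered arm and never explores, the optimistic rule always switches to the barely covered arm, and the matching rule picks the uncertain arm on a nonzero minority of decisions. That intermediate behavior is the intuition behind avoiding both stagnation and a sudden performance drop.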
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications
