An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
Jianhai Su, Jinzhu Luo, Qi Zhang

TL;DR
This paper explores integrating offline reinforcement learning algorithms as subroutines within online RL to improve learning efficiency, revealing task-dependent effectiveness and the need for better fine-tuning methods.
Contribution
It formalizes a framework for offline RL integration into online RL and introduces techniques to enhance its effectiveness, supported by extensive empirical analysis.
Findings
Effectiveness varies with task nature
Proposed techniques significantly improve performance
Existing online fine-tuning methods are generally ineffective
Abstract
We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research therein.
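The core idea in the abstract, an online agent treating its own interaction history as an offline dataset and periodically calling an offline learner on it, can be illustrated with a small sketch. This is not the authors' framework or any of the algorithms they evaluate: the toy chain environment, the tabular batch Q-learning routine standing in for an offline RL algorithm, and all names and parameters (`offline_q_iteration`, `chain_step`, `refit_every`, `eps`) are assumptions made purely for illustration. The last call to the offline routine loosely corresponds to the "final policy recommendation" variant.

```python
import random


def offline_q_iteration(dataset, n_states, n_actions, gamma=0.9, sweeps=50, lr=0.5):
    """Hypothetical offline subroutine: batch Q-learning over a fixed dataset.

    It never interacts with the environment; it only replays stored transitions.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s2, done in dataset:
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q


def chain_step(s, a, n_states):
    """Toy chain MDP (an assumption): action 1 moves right, action 0 moves left.

    Reaching the rightmost state yields reward 1 and ends the episode.
    """
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n_states - 1
    return s2, (1.0 if done else 0.0), done


def online_with_offline_subroutine(n_states=5, n_actions=2, total_steps=500,
                                   refit_every=50, eps=0.2, seed=0):
    rng = random.Random(seed)
    dataset = []  # the agent's own history, reused as an offline dataset
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for t in range(1, total_steps + 1):
        # Epsilon-greedy behaviour; act randomly while the estimates are tied.
        if rng.random() < eps or max(q[s]) == min(q[s]):
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: q[s][i])
        s2, r, done = chain_step(s, a, n_states)
        dataset.append((s, a, r, s2, done))
        s = 0 if done else s2
        # Periodically hand the accumulated history to the offline subroutine.
        if t % refit_every == 0:
            q = offline_q_iteration(dataset, n_states, n_actions)
    # Final policy recommendation: one last offline pass over all collected data.
    q = offline_q_iteration(dataset, n_states, n_actions)
    policy = [max(range(n_actions), key=lambda i: q[st][i]) for st in range(n_states)]
    return policy, dataset


if __name__ == "__main__":
    policy, dataset = online_with_offline_subroutine()
    print("recommended policy:", policy)
    print("transitions collected:", len(dataset))
```

In this sketch the behaviour policy is simply replaced by the greedy policy of the latest offline solution. The paper's framework covers other ways of using the offline output, such as online fine-tuning, which the sketch does not attempt to reproduce.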
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
