Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song, J. Andrew Bagnell, Aarti Singh

TL;DR
This paper introduces a novel hybrid reinforcement learning framework that utilizes offline observation-only data and online interaction, addressing challenges in data admissibility and demonstrating theoretical and practical effectiveness.
Contribution
It proposes the first algorithm for the trace model setting of hybrid RL that matches the performance of reset-model algorithms under admissibility assumptions.
Findings
The algorithm matches reset-model performance under admissibility.
Proof-of-concept experiments show practical effectiveness.
Addresses offline data limitations in hybrid RL.
Abstract
We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interaction with the environment. While Reinforcement Learning (RL) research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motivates our study of hybrid RL with observation-only offline datasets. While the task of competing with the best policy "covered" by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence of hardness when only given the weaker trace model (i.e., one can only reset to the initial states and must produce full traces through the environment), without further assumption of admissibility of the offline…
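To make the difference between the two access models concrete, here is a minimal Python sketch on a hypothetical toy chain MDP. The names ChainMDP, TraceModel, ResetModel, obs_only_trajectory and steps_to_reach are illustrative assumptions, not the paper's algorithm or code. The offline trajectory records states only (no actions or rewards). A reset model can jump straight to any state in that trajectory, whereas a trace model must start from the initial state and roll forward to get there.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical toy chain MDP: states 0..n-1, action 0 moves left, action 1 moves right.
@dataclass
class ChainMDP:
    n_states: int
    initial_state: int = 0

    def transition(self, s: int, a: int) -> int:
        # Deterministic move, clipped to the ends of the chain.
        return min(s + 1, self.n_states - 1) if a == 1 else max(s - 1, 0)

    def reward(self, s: int) -> float:
        # Reward is given only at the rightmost state.
        return 1.0 if s == self.n_states - 1 else 0.0


class TraceModel:
    """Trace-model access: reset only to the initial state, then step forward."""

    def __init__(self, mdp: ChainMDP):
        self.mdp = mdp
        self.state = mdp.initial_state

    def reset(self) -> int:
        self.state = self.mdp.initial_state
        return self.state

    def step(self, a: int) -> Tuple[int, float]:
        self.state = self.mdp.transition(self.state, a)
        return self.state, self.mdp.reward(self.state)


class ResetModel(TraceModel):
    """Reset-model access: additionally allows resetting to any state."""

    def reset_to(self, s: int) -> int:
        self.state = s
        return self.state


# Observation-only offline data: a sequence of states, with no actions or rewards.
obs_only_trajectory: List[int] = [0, 1, 2, 3, 4]


def steps_to_reach(env: TraceModel, target: int) -> int:
    """Count environment steps taken before the agent stands at `target`."""
    assert 0 <= target < env.mdp.n_states
    if isinstance(env, ResetModel):
        env.reset_to(target)  # jump directly to a state seen in the offline data
        return 0
    env.reset()
    steps = 0
    while env.state != target:
        # Toy shortcut: in this chain, 'move right' reaches any later state.
        env.step(1)
        steps += 1
    return steps
```

With `mdp = ChainMDP(n_states=5)`, reaching the last offline state `obs_only_trajectory[-1]` costs 0 steps under `ResetModel(mdp)` but 4 steps under `TraceModel(mdp)`. In a real environment the trace-model learner would also have to discover which actions produce the observed states, since the offline data omits them.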
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications
