Safe Evaluation For Offline Learning: Are We Ready To Deploy?
Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor

TL;DR
This paper proposes a framework for safely evaluating offline reinforcement learning policies using high-confidence off-policy evaluation, enabling deployment decisions without risking unsafe real-world interactions.
Contribution
It introduces a method that estimates a lower bound on the performance of offline policies using bootstrapped off-policy evaluation, improving safety in RL deployment.
Findings
The framework provides reliable lower-bound estimates of policy performance.
It enables safe decision-making before deploying learned policies.
The approach reduces the risk of overestimating policy effectiveness.
Abstract
The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. Training RL agents offline from such data is possible, but deploying them while they are still learning might be dangerous in domains where safety is critical. Therefore, it is essential to find a way to estimate how a newly learned agent will perform if deployed in the target environment before actually deploying it and without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning using approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train-set, to learn an offline policy, and a test-set, to estimate a…
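The idea of a bootstrapped lower bound on held-out data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary per-trajectory importance sampling as the off-policy estimator and a percentile bootstrap for the confidence bound, and the function names, the trajectory layout (rewards, target-policy action probabilities, behavior-policy action probabilities), and the parameters `gamma`, `delta`, and `n_boot` are all hypothetical choices made for illustration.

```python
import numpy as np


def importance_sampled_returns(trajectories, gamma=1.0):
    """Per-trajectory importance-sampled return (assumed estimator).

    Each trajectory is a tuple (rewards, target_probs, behavior_probs),
    where the probabilities are those of the logged actions under the
    evaluated (offline) policy and the data-collecting policy.
    """
    values = []
    for rewards, target_probs, behavior_probs in trajectories:
        rewards = np.asarray(rewards, dtype=float)
        # Product of per-step likelihood ratios for the whole trajectory.
        ratio = np.prod(
            np.asarray(target_probs, dtype=float)
            / np.asarray(behavior_probs, dtype=float)
        )
        discounts = gamma ** np.arange(len(rewards))
        values.append(ratio * np.sum(discounts * rewards))
    return np.array(values)


def bootstrap_lower_bound(values, delta=0.05, n_boot=2000, seed=0):
    """Approximate (1 - delta) lower confidence bound on the mean.

    Resamples the test-set estimates with replacement, computes the mean
    of each resample, and returns the delta-quantile of those means.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = values[idx].mean(axis=1)
    return float(np.quantile(boot_means, delta))


if __name__ == "__main__":
    # Two toy test-set trajectories; the numbers are made up.
    test_set = [
        ([1.0, 1.0], [0.5, 0.5], [0.5, 0.5]),
        ([0.0, 2.0], [0.8, 0.5], [0.4, 0.5]),
    ]
    estimates = importance_sampled_returns(test_set, gamma=0.5)
    print("lower bound:", bootstrap_lower_bound(estimates, delta=0.05))
```

Under these assumptions, a deployment rule could compare the returned lower bound against a safety threshold, for example the performance of the data-collecting policy, and deploy only when the bound exceeds it.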
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Machine Learning and Data Classification
