Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen, Shuze Daniel Liu, Shangtong Zhang

TL;DR
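This paper proposes a behavior policy for policy evaluation in reinforcement learning that minimizes the variance of the estimate subject to safety constraints. The resulting evaluation is unbiased and has lower variance than on-policy evaluation, and the authors report that it is the only existing method to achieve both substantial variance reduction and safety.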
Contribution
It proposes an optimal variance-minimizing behavior policy under safety constraints. Data collected with this policy yields an unbiased evaluation with lower variance than on-policy evaluation while the safety constraints are satisfied.
Findings
Achieves significant variance reduction compared to on-policy evaluation.
Ensures safety constraints are satisfied during evaluation.
Outperforms previous methods in both variance reduction and safety.
Abstract
In reinforcement learning, classic on-policy evaluation methods often suffer from high variance and require massive online data to attain the desired accuracy. Previous studies attempt to reduce evaluation variance by searching for or designing proper behavior policies to collect data. However, these approaches ignore the safety of such behavior policies -- the designed behavior policies have no safety guarantee and may lead to severe damage during online executions. In this paper, to address the challenge of reducing variance while ensuring safety simultaneously, we propose an optimal variance-minimizing behavior policy under safety constraints. Theoretically, while ensuring safety constraints, our evaluation method is unbiased and has lower variance than on-policy evaluation. Empirically, our method is the only existing method to achieve both substantial variance reduction and safety…
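The variance claim in the abstract can be illustrated with a toy one-step problem. The sketch below is not the paper's method: the policy, rewards, and per-action costs are made-up numbers, and the names `is_mean_and_variance`, `expected_cost`, and `mu_star` are hypothetical. It assumes deterministic rewards and uses the standard importance-sampling estimator, for which sampling actions in proportion to pi(a) * |r(a)| is the unconstrained variance-minimizing choice in this simple setting. The safety-constrained policy from the paper is not reproduced here because the summary does not state its form.

```python
# Toy one-step problem (a bandit). All numbers are invented for illustration.
pi = [0.5, 0.3, 0.2]      # target policy over three actions
reward = [1.0, 2.0, 4.0]  # deterministic reward of each action
cost = [0.0, 1.0, 3.0]    # hypothetical safety cost of each action


def is_mean_and_variance(mu):
    """Exact mean and variance of the importance-sampling estimate
    (pi(a) / mu(a)) * r(a) when the action a is drawn from mu."""
    mean = sum(m * (p / m) * r for m, p, r in zip(mu, pi, reward))
    second = sum(m * ((p / m) * r) ** 2 for m, p, r in zip(mu, pi, reward))
    return mean, second - mean ** 2


def expected_cost(mu):
    """Expected safety cost incurred while collecting data with mu."""
    return sum(m * c for m, c in zip(mu, cost))


# Unconstrained variance-minimizing behavior policy for this toy case:
# mu(a) proportional to pi(a) * |r(a)|.
weights = [p * abs(r) for p, r in zip(pi, reward)]
total = sum(weights)
mu_star = [w / total for w in weights]

on_mean, on_var = is_mean_and_variance(pi)        # on-policy evaluation
off_mean, off_var = is_mean_and_variance(mu_star)  # tailored behavior policy

print("on-policy :", on_mean, on_var, expected_cost(pi))
print("tailored  :", off_mean, off_var, expected_cost(mu_star))
```

In this toy setup both policies give the same expected estimate, so the estimator stays unbiased, while the tailored policy removes the variance. It also incurs a higher expected cost than the target policy, which is the kind of trade-off a safety constraint on the behavior policy is meant to control.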
Taxonomy
Topics: Software Reliability and Analysis Research · Safety Systems Engineering in Autonomy · Elevator Systems and Control
