Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Claire Chen; Shuze Daniel Liu; Shangtong Zhang

arXiv:2410.05655·cs.LG·March 21, 2025

TL;DR

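This paper introduces a policy evaluation method for reinforcement learning that collects data with a variance-minimizing behavior policy subject to safety constraints. The resulting estimator is unbiased, has lower variance than on-policy evaluation, and is reported to outperform earlier approaches in both variance reduction and safety.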

Contribution

It proposes an optimal variance-minimizing behavior policy under safety constraints. Evaluation with this policy is unbiased, has lower variance than on-policy evaluation, and satisfies the safety constraints.

Findings

01 Achieves significant variance reduction compared to on-policy evaluation.

02 Ensures safety constraints are satisfied during evaluation.

03 Outperforms previous methods in both variance reduction and safety.

Abstract

In reinforcement learning, classic on-policy evaluation methods often suffer from high variance and require massive online data to attain the desired accuracy. Previous studies attempt to reduce evaluation variance by searching for or designing proper behavior policies to collect data. However, these approaches ignore the safety of such behavior policies -- the designed behavior policies have no safety guarantee and may cause severe damage during online execution. In this paper, to reduce variance and ensure safety simultaneously, we propose an optimal variance-minimizing behavior policy under safety constraints. Theoretically, while satisfying the safety constraints, our evaluation method is unbiased and has lower variance than on-policy evaluation. Empirically, our method is the only existing method to achieve both substantial variance reduction and safety…
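The idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a one-step, three-action setting with made-up rewards and costs, assumes the safety constraint takes the form "expected cost of the behavior policy is at most (1 + epsilon) times the expected cost of the target policy", and enforces it by mixing the unconstrained variance-minimizing behavior policy with the target policy. All names (`pi`, `mu_star`, `mu_safe`, `epsilon`, `safe_mixture`) are hypothetical. It computes exact means and variances of the importance-sampling estimator, so no sampling is involved.

```python
from math import fsum

# Hypothetical 3-action, one-step problem (all numbers are illustrative).
pi = [0.6, 0.3, 0.1]        # target policy to be evaluated
reward = [1.0, 2.0, 10.0]   # deterministic reward of each action
cost = [0.0, 1.0, 5.0]      # safety cost of each action
epsilon = 0.2               # assumed tolerance: cost(mu) <= (1 + epsilon) * cost(pi)


def expected(dist, values):
    """Expectation of `values` when the action is drawn from `dist`."""
    return fsum(d * v for d, v in zip(dist, values))


def is_mean(pi, mu, reward):
    """Exact mean of the estimator (pi(a) / mu(a)) * r(a) with a ~ mu."""
    return fsum(m * (p / m) * r for p, m, r in zip(pi, mu, reward) if m > 0)


def is_variance(pi, mu, reward):
    """Exact variance of the same one-sample importance-sampling estimator."""
    second = fsum(m * ((p / m) * r) ** 2 for p, m, r in zip(pi, mu, reward) if m > 0)
    return second - is_mean(pi, mu, reward) ** 2


def unconstrained_optimal(pi, reward):
    """Variance-minimizing behavior policy without any safety constraint.

    For deterministic rewards this is mu(a) proportional to pi(a) * |r(a)|.
    """
    weights = [p * abs(r) for p, r in zip(pi, reward)]
    total = fsum(weights)
    return [w / total for w in weights]


def safe_mixture(pi, mu_star, cost, epsilon):
    """Largest-alpha mixture alpha * mu_star + (1 - alpha) * pi within the cost budget."""
    c_pi, c_star = expected(pi, cost), expected(mu_star, cost)
    budget = (1 + epsilon) * c_pi
    alpha = 1.0 if c_star <= budget else (budget - c_pi) / (c_star - c_pi)
    mu = [alpha * m + (1 - alpha) * p for m, p in zip(mu_star, pi)]
    return mu, alpha


true_value = expected(pi, reward)
mu_star = unconstrained_optimal(pi, reward)
mu_safe, alpha = safe_mixture(pi, mu_star, cost, epsilon)

var_on_policy = is_variance(pi, pi, reward)
var_safe = is_variance(pi, mu_safe, reward)

print("true value:", true_value)
print("on-policy variance:", var_on_policy)
print("safe behavior policy:", mu_safe, "variance:", var_safe)
print("cost of pi:", expected(pi, cost), "cost of safe policy:", expected(mu_safe, cost))
```

Because the estimator's variance is convex in the behavior policy, any mixture of the unconstrained minimizer and the target policy has variance no larger than on-policy evaluation, while the mixing weight keeps the expected cost inside the assumed budget. The paper's own construction solves a constrained optimization rather than mixing, so this only conveys the trade-off.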

Videos

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning · SlidesLive

Taxonomy

Topics: Software Reliability and Analysis Research · Safety Systems Engineering in Autonomy · Elevator Systems and Control