Scalable Safety-Critical Policy Evaluation with Accelerated Rare Event Sampling

Mengdi Xu, Peide Huang, Fengpei Li, Jiacheng Zhu, Xuewei Qi, Kentaro Oguchi, Zhiyuan Huang, Henry Lam, Ding Zhao

arXiv:2106.10566 · cs.LG · October 4, 2022 · 1 citation


TL;DR

This paper introduces the Accelerated Policy Evaluation (APE) method for efficiently estimating rare event probabilities in reinforcement learning, improving scalability and accuracy in safety-critical systems.

Contribution

The paper presents APE, a novel approach that uses adaptive importance sampling and adversarial environment modeling to efficiently evaluate rare events in large or continuous spaces.
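
Adaptive importance sampling can be illustrated on a toy problem far simpler than a Markov decision process. The sketch below is not the APE algorithm; it is the generic cross-entropy method for adapting a Gaussian proposal, applied to estimating P(X > 4) for X ~ N(0, 1). The function names `ce_adaptive_is` and `normal_logpdf` and all parameter values (sample size `n`, elite fraction `rho`, iteration cap) are illustrative assumptions.

```python
import math
import random


def normal_logpdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def ce_adaptive_is(threshold=4.0, n=2000, rho=0.1, max_iters=20, seed=0):
    """Hypothetical illustration (not APE): estimate P(X > threshold), X ~ N(0, 1),
    with a N(mu, 1) proposal whose mean is adapted by the cross-entropy method."""
    rng = random.Random(seed)
    mu = 0.0  # start from the nominal distribution
    for _ in range(max_iters):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: empirical (1 - rho)-quantile, capped at the target threshold.
        level = min(threshold, xs[int((1.0 - rho) * n)])
        elite = [x for x in xs if x >= level]
        # Likelihood ratios p(x) / q(x) correct for sampling from the proposal.
        w = [math.exp(normal_logpdf(x, 0.0) - normal_logpdf(x, mu)) for x in elite]
        # CE update for a Gaussian mean: likelihood-ratio-weighted average of elites.
        mu = sum(wi * xi for wi, xi in zip(w, elite)) / sum(w)
        if level >= threshold:
            break  # the proposal now targets the rare event itself
    # Final IS estimate from fresh samples of the adapted proposal
    # (unbiased given the adapted mean).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > threshold:
            total += math.exp(normal_logpdf(x, 0.0) - normal_logpdf(x, mu))
    return total / n, mu


estimate, mu = ce_adaptive_is()
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form Gaussian tail probability
print(f"adapted mean {mu:.2f}, IS estimate {estimate:.3e}, exact {exact:.3e}")
```

Each iteration sets an intermediate level and refits the proposal mean to the weighted elite samples, so the proposal drifts toward the region where the rare event occurs, while the likelihood ratio p(x)/q(x) keeps the final estimate correctly weighted. According to the abstract, APE pursues the same goal in MDPs by treating the environment as an adversarial agent and using function approximation for large spaces.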

Findings

01. APE estimates rare event probabilities with less bias.
02. APE requires significantly fewer samples than baseline methods.
03. Empirical results demonstrate APE's scalability and effectiveness.
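
The sample-efficiency finding is easiest to appreciate against crude Monte Carlo. A standard back-of-the-envelope bound (not a result from the paper): estimating a probability p from n independent episodes gives a relative standard error of sqrt((1 - p) / (n p)), so the required n grows roughly like 1/p. The helper name `crude_mc_samples` is hypothetical.

```python
import math


def crude_mc_samples(p: float, rel_err: float) -> int:
    """Episodes needed for crude Monte Carlo to estimate probability p
    with relative standard error rel_err.

    Solves sqrt((1 - p) / (n * p)) = rel_err for n.
    """
    if not (0.0 < p < 1.0) or rel_err <= 0.0:
        raise ValueError("need 0 < p < 1 and rel_err > 0")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


# Each factor-of-10 drop in p costs roughly 10x more episodes.
for p in (1e-3, 1e-5, 1e-7):
    print(f"p={p:.0e}: ~{crude_mc_samples(p, 0.1):.2e} episodes for 10% relative error")
```

For p = 1e-5 this is on the order of 10^7 episodes, which is why importance sampling approaches try to concentrate samples on the failure region instead of waiting for failures to occur naturally.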

Abstract

Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces, where limited scalability dictates a prohibitively large number of testing iterations. At the same time, a biased or inaccurate policy evaluation in a safety-critical system could cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment's nature as an adversarial agent and, through adaptive importance sampling, learns toward the zero-variance sampling distribution for the policy evaluation. Moreover, APE is scalable to large discrete or continuous spaces by incorporating function…
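
The "zero-variance sampling distribution" mentioned in the abstract is the standard optimum of importance sampling. In generic notation of our own choosing (not necessarily the paper's): let \tau be a trajectory generated by the policy under the nominal environment with density p, E the set of rare-event (for example, failure) trajectories, and q a proposal density that is positive wherever p(\tau) \mathbf{1}\{\tau \in E\} is positive.

```latex
\begin{aligned}
P(E) &= \mathbb{E}_{\tau \sim p}\!\left[\mathbf{1}\{\tau \in E\}\right]
      = \mathbb{E}_{\tau \sim q}\!\left[\mathbf{1}\{\tau \in E\}\,\frac{p(\tau)}{q(\tau)}\right], \\
\hat{P}_n(E) &= \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\tau_i \in E\}\,\frac{p(\tau_i)}{q(\tau_i)},
      \qquad \tau_i \overset{\text{iid}}{\sim} q, \\
q^{*}(\tau) &= \frac{\mathbf{1}\{\tau \in E\}\,p(\tau)}{P(E)}
      \;\Longrightarrow\; \mathbf{1}\{\tau \in E\}\,\frac{p(\tau)}{q^{*}(\tau)} = P(E)
      \ \text{ for every } \tau \sim q^{*}.
\end{aligned}
```

Under q^* every weighted sample equals P(E), so the estimator has zero variance; but q^* depends on the unknown P(E), so it can only be approached iteratively. If the proposal changes only the environment's transition kernel from P to Q (policy and initial-state distribution unchanged), the policy terms cancel and the weight reduces to the product over time steps of P(s_{t+1} | s_t, a_t) / Q(s_{t+1} | s_t, a_t).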


Code & Models

Repositories

eleurent/highway-env


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Reliability and Analysis Research · Reinforcement Learning in Robotics