Scalable Safety-Critical Policy Evaluation with Accelerated Rare Event Sampling
Mengdi Xu, Peide Huang, Fengpei Li, Jiacheng Zhu, Xuewei Qi, Kentaro Oguchi, Zhiyuan Huang, Henry Lam, Ding Zhao

TL;DR
This paper introduces the Accelerated Policy Evaluation (APE) method for efficiently estimating rare event probabilities in reinforcement learning, improving scalability and accuracy in safety-critical systems.
Contribution
The paper presents APE, a novel approach that uses adaptive importance sampling and adversarial environment modeling to efficiently evaluate rare events in large or continuous spaces.
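To make the adaptive importance sampling idea concrete, here is a minimal sketch on a toy problem: estimating P(Z > 4) for a standard normal Z. It is not the APE algorithm itself. It uses a cross-entropy style update of a shifted Gaussian proposal, and every name in it (normal_tail, likelihood_ratio, adaptive_is_estimate, and the parameters c, n, rounds) is an assumption made for illustration.

```python
import math
import random


def normal_tail(c):
    """Exact P(Z > c) for Z ~ N(0, 1); used only as a reference value."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def likelihood_ratio(x, mu):
    """Density ratio N(0, 1) / N(mu, 1) evaluated at x."""
    return math.exp(-mu * x + 0.5 * mu * mu)


def adaptive_is_estimate(c=4.0, n=2000, rounds=5, seed=0):
    """Estimate P(Z > c) with a proposal N(mu, 1) whose mean is adapted.

    Toy stand-in for adaptive importance sampling: each round moves the
    proposal mean toward the rare region {x > c} (cross-entropy update).
    """
    rng = random.Random(seed)
    mu = 0.0  # start from the nominal distribution
    for _ in range(rounds):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        # Intermediate threshold: the 90% sample quantile, capped at c.
        level = min(c, sorted(xs)[int(0.9 * n)])
        ws = [likelihood_ratio(x, mu) if x >= level else 0.0 for x in xs]
        # Weighted mean of the samples that reached the current level.
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    # Final unbiased estimate using a fresh batch from the adapted proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    estimate = sum(likelihood_ratio(x, mu) for x in xs if x > c) / n
    return estimate, mu


if __name__ == "__main__":
    est, mu = adaptive_is_estimate()
    print("importance sampling estimate:", est)
    print("reference value:             ", normal_tail(4.0))
    print("adapted proposal mean:       ", mu)
```

With the nominal distribution, a batch of 2000 samples would rarely contain even one draw above 4, so a naive Monte Carlo estimate would usually be zero. Shifting the proposal toward the rare region and reweighting by the likelihood ratio keeps the estimator unbiased while putting most samples where they matter.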
Findings
APE estimates rare event probabilities with less bias than baseline methods.
APE requires significantly fewer samples than baseline methods.
Empirical results demonstrate APE's scalability and effectiveness.
Abstract
Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces where limited scalability dictates a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could potentially cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment nature as an adversarial agent and learns towards, through adaptive importance sampling, the zero-variance sampling distribution for the policy evaluation. Moreover, APE is scalable to large discrete or continuous spaces by incorporating function…
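The "zero-variance sampling distribution" in the abstract refers to a standard fact about importance sampling, sketched below in generic notation. The symbols (p for the nominal density, q for the proposal, A for the rare event, mu for its probability) are assumptions for illustration and need not match the paper's notation.

```latex
% Rare event probability under the nominal density p
\mu = \mathbb{E}_{p}\!\left[\mathbf{1}_{A}(X)\right]

% Importance sampling estimator with samples X_1, \dots, X_n \sim q
\hat{\mu}_{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{A}(X_i)\, \frac{p(X_i)}{q(X_i)}

% Zero-variance proposal: the nominal density conditioned on the rare event
q^{*}(x) = \frac{\mathbf{1}_{A}(x)\, p(x)}{\mu}
\quad\Longrightarrow\quad
\mathbf{1}_{A}(x)\, \frac{p(x)}{q^{*}(x)} = \mu \quad \text{for every } x \text{ with } q^{*}(x) > 0
```

Under q*, every sample contributes exactly mu, so the estimator has zero variance. Because q* depends on the unknown mu, it cannot be sampled from directly, which is why a method can only learn toward it adaptively.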
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Reliability and Analysis Research · Reinforcement Learning in Robotics
