Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy (Dj) Dvijotham, Nicolas Heess, Pushmeet Kohli

TL;DR
This paper introduces an adversarial evaluation method for learning agents in safety-critical domains, effectively identifying catastrophic failures and accurately estimating failure probabilities much faster than traditional methods.
Contribution
It proposes an adversarial evaluation approach that uncovers rare failure scenarios and provides unbiased failure probability estimates, improving safety assessments of learned agents.
Findings
Successfully finds catastrophic failures in humanoid control and driving domains.
Estimates failure rates orders of magnitude faster than standard methods.
Reuses existing training data for efficient evaluation.
Abstract
This paper addresses the problem of evaluating learning systems in safety critical domains such as autonomous driving, where failures can have catastrophic consequences. We focus on two problems: searching for scenarios when learned agents fail and assessing their probability of failure. The standard method for agent evaluation in reinforcement learning, Vanilla Monte Carlo, can miss failures entirely, leading to the deployment of unsafe agents. We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation. To address this shortcoming, we draw upon the rare event probability estimation literature and propose an adversarial evaluation approach. Our approach focuses evaluation on adversarially chosen situations, while still providing unbiased estimates of failure probabilities. The key difficulty is in…
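To make the contrast in the abstract concrete, the following is a minimal sketch of why Vanilla Monte Carlo can miss rare failures while a sampler focused on suspicious situations can still give an unbiased estimate. It uses textbook importance sampling from the rare event literature on a toy problem, and is not the paper's specific estimator. Everything here is an assumption made for illustration: the one-dimensional initial condition `x` drawn uniformly from [0, 1), the hypothetical `agent_fails` rule, and the constants `FAIL_THRESHOLD` and `FOCUS_REGION`.

```python
import random

# Toy assumptions (not from the paper): the environment's initial condition is
# x ~ Uniform[0, 1), and the agent fails exactly when x < FAIL_THRESHOLD,
# so the true failure probability is 1e-3.
FAIL_THRESHOLD = 1e-3
# The proposal puts half of its probability mass on the "suspicious" region [0, 1e-2).
FOCUS_REGION = 1e-2


def agent_fails(x):
    """Hypothetical stand-in for running one episode and checking for failure."""
    return x < FAIL_THRESHOLD


def vanilla_monte_carlo(n, rng):
    """Estimate the failure probability by sampling x from the true distribution."""
    failures = sum(agent_fails(rng.random()) for _ in range(n))
    return failures / n


def proposal_sample(rng):
    """Draw x from the proposal: a 50/50 mixture over the focus region and the rest."""
    if rng.random() < 0.5:
        return rng.random() * FOCUS_REGION
    return FOCUS_REGION + rng.random() * (1.0 - FOCUS_REGION)


def proposal_density(x):
    """Density of the proposal at x (the true density is 1 everywhere on [0, 1))."""
    if x < FOCUS_REGION:
        return 0.5 / FOCUS_REGION
    return 0.5 / (1.0 - FOCUS_REGION)


def importance_sampling(n, rng):
    """Estimate the failure probability from proposal samples, reweighted by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = proposal_sample(rng)
        if agent_fails(x):
            total += 1.0 / proposal_density(x)  # weight = true density / proposal density
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("vanilla Monte Carlo, 200 episodes:", vanilla_monte_carlo(200, rng))
    print("importance sampling, 200 episodes:", importance_sampling(200, rng))
```

With 200 episodes, Vanilla Monte Carlo sees no failure at all with probability of roughly 0.82 in this toy setting, which would report a failure rate of zero. The reweighted estimator remains unbiased because every sample drawn from the proposal is corrected by the ratio of the true density to the proposal density. How well it works in practice depends on how closely the proposal tracks the real failure region.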
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Reinforcement Learning in Robotics
