Unifying Evaluation of Machine Learning Safety Monitors

Joris Guerin; Raul Sena Ferreira; Kevin Delmas; Jérémie Guiochet

arXiv:2208.14660·cs.LG·September 1, 2022

Open Access

TL;DR

This paper proposes a unified framework with three safety-oriented metrics for evaluating machine learning safety monitors across diverse applications, ensuring consistent and system-aligned assessments.

Contribution

It introduces a formalized set of metrics and evaluation procedures that unify diverse existing methods for assessing ML safety monitors.

Findings

01 Metrics effectively compare different monitors across tasks

02 Evaluation choices significantly influence perceived monitor performance

03 Formal safety assumptions align evaluations with system requirements

Abstract

With the increasing use of Machine Learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operation. Monitors have been proposed for different applications involving diverse perception tasks and ML models, and each context tends to use its own evaluation procedures and metrics. This paper introduces three unified safety-oriented metrics, representing the safety benefits of the monitor (Safety Gain), the remaining safety gaps after using it (Residual Hazard), and its negative impact on the system's performance (Availability Cost). To compute these metrics, one must define two return functions, representing how a given ML prediction will impact expected future rewards and hazards. Three use cases (classification, drone landing, and autonomous driving) are used to demonstrate…
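To make the three metrics concrete, here is a minimal sketch of one plausible reading of the abstract. It is not the paper's formal definition. The `Sample` class, its field names, the fallback values used when the monitor rejects a prediction, and the plain averaging over samples are all assumptions introduced for illustration. The two return functions are represented by precomputed per-sample reward and hazard values.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Sample:
    """One evaluation sample (hypothetical structure, not from the paper)."""
    reward_if_accepted: float  # expected future reward if the ML prediction is used
    hazard_if_accepted: float  # expected future hazard if the ML prediction is used
    reward_if_rejected: float  # expected future reward if the fallback is triggered
    hazard_if_rejected: float  # expected future hazard if the fallback is triggered
    rejected: bool             # monitor decision for this sample


def monitor_metrics(samples: Sequence[Sample]) -> Dict[str, float]:
    """Average reward and hazard with and without the monitor, then compare."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)

    # Baseline: every ML prediction is accepted (no monitor).
    hazard_without = sum(s.hazard_if_accepted for s in samples) / n
    reward_without = sum(s.reward_if_accepted for s in samples) / n

    # Monitored system: rejected predictions are replaced by the fallback.
    hazard_with = sum(
        s.hazard_if_rejected if s.rejected else s.hazard_if_accepted
        for s in samples
    ) / n
    reward_with = sum(
        s.reward_if_rejected if s.rejected else s.reward_if_accepted
        for s in samples
    ) / n

    return {
        "safety_gain": hazard_without - hazard_with,        # hazard removed by the monitor
        "residual_hazard": hazard_with,                     # hazard left after monitoring
        "availability_cost": reward_without - reward_with,  # reward lost to rejections
    }


# Toy classification-style data: a correct prediction yields reward 1,
# a wrong one yields hazard 1, and the fallback yields neither.
samples = [
    Sample(1.0, 0.0, 0.0, 0.0, rejected=False),  # correct, accepted
    Sample(0.0, 1.0, 0.0, 0.0, rejected=True),   # wrong, caught by the monitor
    Sample(0.0, 1.0, 0.0, 0.0, rejected=False),  # wrong, missed by the monitor
    Sample(1.0, 0.0, 0.0, 0.0, rejected=True),   # correct, false alarm
]
metrics = monitor_metrics(samples)
```

In this toy setting, the caught error contributes to Safety Gain, the missed error to Residual Hazard, and the false alarm to Availability Cost, which mirrors the three roles the abstract assigns to the metrics.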

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Reliability and Analysis Research · Fault Detection and Control Systems