Combining Cost-Constrained Runtime Monitors for AI Safety

Tim Tian Hua; James Baskerville; Henri Lemoine; Mia Hopman; Aryan Bhatt; Tyler Tracy

arXiv:2507.15886 · cs.CY · October 22, 2025

TL;DR

This paper presents a method for combining multiple runtime monitors into a single AI safety monitoring protocol that maximizes detection recall under an average-case cost budget. In a code review scenario, the resulting protocol more than doubles recall relative to a naive baseline.

Contribution

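The paper introduces an algorithm that, taking each monitor's performance and cost as given, exhaustively searches over when and which monitors to call and allocates safety interventions based on the Neyman-Pearson lemma, maximizing recall under a budget constraint.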

Findings

01 Recall more than doubles relative to a naive baseline.

02 Combining two monitors can Pareto dominate using either monitor alone.

03 The framework trades off detection recall against the cost of running monitors and applying interventions.

Abstract

Monitoring AIs at runtime can help us detect and stop harmful actions. In this paper, we study how to efficiently combine multiple runtime monitors into a single monitoring protocol. The protocol's objective is to maximize the probability of applying a safety intervention on misaligned outputs (i.e., maximize recall). Since running monitors and applying safety interventions are costly, the protocol also needs to adhere to an average-case budget constraint. Taking the monitors' performance and cost as given, we develop an algorithm to find the best protocol. The algorithm exhaustively searches over when and which monitors to call, and allocates safety interventions based on the Neyman-Pearson lemma. By focusing on likelihood ratios and strategically trading off spending on monitors against spending on interventions, we more than double our recall rate compared to a naive baseline in a…
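
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: each monitor emits a score in a small number of discrete bins, exactly one monitor is always called (the paper's algorithm also searches over when to call monitors), and the budget is the average cost on aligned outputs. The names `np_allocate` and `best_single_monitor`, the cost units, and all numbers are hypothetical.

```python
from math import inf


def np_allocate(p_mis, p_ok, fpr_budget):
    """Neyman-Pearson style allocation over discrete score bins.

    p_mis[i]: P(score in bin i | misaligned output)
    p_ok[i]:  P(score in bin i | aligned output)
    fpr_budget: fraction of aligned outputs we can afford to intervene on.
    Returns (per-bin intervention probabilities, recall).
    """
    def likelihood_ratio(i):
        return inf if p_ok[i] == 0 else p_mis[i] / p_ok[i]

    # Spend the budget on the bins with the highest likelihood ratio first.
    order = sorted(range(len(p_mis)), key=likelihood_ratio, reverse=True)
    audit = [0.0] * len(p_mis)
    remaining = fpr_budget
    for i in order:
        if p_ok[i] == 0:
            audit[i] = 1.0  # intervening here costs nothing on aligned outputs
            continue
        take = min(p_ok[i], remaining)  # the boundary bin is randomized
        audit[i] = take / p_ok[i]
        remaining -= take
    recall = sum(a * p for a, p in zip(audit, p_mis))
    return audit, recall


def best_single_monitor(monitors, budget, audit_cost):
    """Pick the monitor that maximizes recall once its own cost is paid.

    monitors: name -> (cost_per_call, p_mis, p_ok)
    Whatever budget is left after paying for the monitor buys interventions.
    """
    best = None
    for name, (cost, p_mis, p_ok) in monitors.items():
        left = budget - cost
        if left < 0:
            continue  # this monitor alone exceeds the budget
        fpr_budget = min(1.0, left / audit_cost)
        _, recall = np_allocate(p_mis, p_ok, fpr_budget)
        if best is None or recall > best[1]:
            best = (name, recall)
    return best


# Hypothetical monitors: a cheap, noisy one and a costly, sharper one.
monitors = {
    "cheap": (0.2, [0.4, 0.3, 0.3], [0.8, 0.15, 0.05]),
    "expensive": (0.6, [0.1, 0.2, 0.7], [0.9, 0.08, 0.02]),
}
print(best_single_monitor(monitors, budget=1.0, audit_cost=20.0))
```

With these made-up numbers, the costlier monitor leaves less budget for interventions but still achieves higher recall, because its top score bin has a much higher likelihood ratio. Changing the costs or score distributions can reverse that ranking, which is why the choice of monitor and the allocation of interventions have to be optimized jointly.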
