Reliably Detecting Model Failures in Deployment Without Labels

Viet Nguyen; Changjian Shui; Vijay Giri; Siddharth Arya; Amol Verma; Fahad Razak; Rahul G. Krishnan

arXiv:2506.05047 · cs.LG · November 5, 2025

TL;DR

This paper introduces D3M, a label-free monitoring method that detects model performance deterioration in deployment by analyzing predictive disagreement, providing an alert that signals when retraining is needed in dynamic environments.

Contribution

The paper formalizes the post-deployment deterioration (PDD) detection problem and proposes D3M, a novel, practical algorithm with theoretical guarantees and empirical validation.

Findings

01 Low false positive rates under non-deteriorating shifts

02 High true positive rates under deteriorating shifts, with sample complexity bounds

03 Effective on benchmark and real-world datasets
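Finding 02 refers to sample complexity bounds, which are not reproduced on this page. As a generic illustration only (an assumption, not the authors' bound), Hoeffding's inequality shows how many deployment samples suffice to estimate a statistic bounded in [0, 1], such as a disagreement rate, to within ε of its expectation with probability at least 1 - δ. The function name `hoeffding_sample_size` is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that the mean of n i.i.d. [0, 1]-valued samples lies
    within epsilon of its expectation with probability >= 1 - delta.

    Generic two-sided Hoeffding bound: 2 * exp(-2 * n * epsilon**2) <= delta.
    This is NOT the bound proved in the D3M paper; it only illustrates how
    the required sample size scales with epsilon and delta.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


print(hoeffding_sample_size(0.05, 0.05))  # 738
print(hoeffding_sample_size(0.10, 0.05))  # 185
```

The sample size grows as 1/ε², so halving the tolerated estimation error roughly quadruples the number of unlabeled deployment samples needed.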

Abstract

The distribution of data changes over time; models operating in dynamic environments need retraining. But knowing when to retrain, without access to labels, is an open challenge, since some, but not all, shifts degrade model performance. This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models, which achieves low false positive rates under non-deteriorating shifts and comes with sample complexity bounds for high true positive rates under deteriorating shifts. Empirical results on both standard benchmarks and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework and highlight its viability as an alert mechanism for high-stakes machine learning pipelines.
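The abstract describes D3M only at a high level: it monitors the disagreement of predictive models and aims to keep false alarms rare when a shift does not hurt performance. The sketch below is not the authors' implementation (the official code is in the teivng/d3m repository); it is a generic disagreement monitor built on several assumptions: class predictions from K models (for example ensemble members or posterior samples) are available on the same inputs, row 0 is the deployed model, the statistic is the maximum disagreement rate of any other model against it, and the alert threshold is the (1 - α) empirical quantile of that statistic over in-distribution batches. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def max_disagreement_rate(preds: np.ndarray) -> float:
    """preds: (K, n) integer class predictions from K models on the same n inputs.
    Row 0 is treated as the deployed model (an assumption of this sketch).
    Returns the largest fraction of inputs on which any other model disagrees with it."""
    base, others = preds[0], preds[1:]
    return float((others != base).mean(axis=1).max())


def calibrate_threshold(id_batches, alpha: float = 0.05) -> float:
    """(1 - alpha) empirical quantile of the statistic over in-distribution batches."""
    stats = [max_disagreement_rate(batch) for batch in id_batches]
    return float(np.quantile(stats, 1.0 - alpha))


def should_alert(deploy_preds: np.ndarray, threshold: float) -> bool:
    """Raise an alert when deployment disagreement exceeds the calibrated threshold."""
    return max_disagreement_rate(deploy_preds) > threshold


rng = np.random.default_rng(0)


def simulate_batch(n: int, k: int, flip_prob: float, n_classes: int = 3) -> np.ndarray:
    """Synthetic predictions: models 1..k-1 copy model 0, except that each
    prediction is changed to a different class with probability flip_prob."""
    base = rng.integers(0, n_classes, size=n)
    preds = np.tile(base, (k, 1))
    flips = rng.random((k - 1, n)) < flip_prob
    # Adding an offset in [1, n_classes - 1] modulo n_classes always yields a different class.
    other_class = (base + rng.integers(1, n_classes, size=(k - 1, n))) % n_classes
    preds[1:] = np.where(flips, other_class, base)
    return preds


# Calibrate on in-distribution batches (low disagreement) ...
id_batches = [simulate_batch(500, 5, 0.05) for _ in range(200)]
tau = calibrate_threshold(id_batches, alpha=0.05)

# ... then check a batch whose models disagree far more often.
alert_shift = should_alert(simulate_batch(500, 5, 0.30), tau)
print(f"threshold={tau:.3f}, alert on shifted batch: {alert_shift}")
```

Calibrating against in-distribution batches targets a false alarm rate of roughly α on deployment data that resembles the calibration batches; how the real method chooses its models and statistic is described in the paper, not here.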

Code & Models

Repositories

teivng/d3m
PyTorch · Official

Videos

Reliably detecting model failures in deployment without labels · SlidesLive