Monitoring Risks in Test-Time Adaptation

Mona Schirmer; Metod Jazbec; Christian A. Naesseth; Eric Nalisnick

arXiv:2507.08721 · cs.LG · November 11, 2025

TL;DR

This paper introduces a risk monitoring framework for test-time adaptation that tracks model performance without labels, enabling early detection of model degradation during deployment under data shifts.

Contribution

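The paper extends existing statistical monitoring tools, which rely on sequential testing with confidence sequences, to models that are updated continuously at test time and for which no test labels are available. This makes it possible to raise an alert when a deployed, adapting model violates a predefined performance criterion.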

Findings

01  Effective detection of model failure points across datasets

02  Robust monitoring under various distribution shifts

03  Compatibility with multiple TTA methods

Abstract

Encountering shifted data at test time is a ubiquitous challenge when deploying predictive models. Test-time adaptation (TTA) methods address this issue by continuously adapting a deployed model using only unlabeled test data. While TTA can extend the model's lifespan, it is only a temporary solution. Eventually the model might degrade to the point that it must be taken offline and retrained. To detect such points of ultimate failure, we propose pairing TTA with risk monitoring frameworks that track predictive performance and raise alerts when predefined performance criteria are violated. Specifically, we extend existing monitoring tools based on sequential testing with confidence sequences to accommodate scenarios in which the model is updated at test time and no test labels are available to estimate the performance metrics of interest. Our extensions unlock the application of rigorous…
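
The idea of monitoring via sequential testing with confidence sequences can be illustrated with a minimal sketch. It is not the authors' method: it assumes that a loss in [0, 1] is observed for every test point (that is, labels are available), whereas the paper's contribution is precisely the extension to the unlabeled, adapting case. The class name `RunningRiskMonitor`, its parameters, and the choice of a Hoeffding bound with a union bound over time steps are all assumptions made for illustration.

```python
import math


class RunningRiskMonitor:
    """Illustrative monitor: alarm when a time-uniform lower bound on the
    running risk exceeds a tolerated threshold.

    Assumption: each loss lies in [0, 1] and is observed directly.
    The bound is Hoeffding-style with error budget delta / (t * (t + 1))
    at step t, so the budgets sum to delta over all steps (union bound).
    """

    def __init__(self, threshold: float, delta: float = 0.05):
        if not 0.0 < delta < 1.0:
            raise ValueError("delta must be in (0, 1)")
        self.threshold = threshold
        self.delta = delta
        self.t = 0                      # number of losses seen so far
        self.total = 0.0                # sum of losses seen so far
        self.lower_bound = -math.inf    # current lower bound on running risk
        self.alarm = False              # stays True once raised

    def update(self, loss: float) -> bool:
        """Consume one loss and return whether the alarm has been raised."""
        if not 0.0 <= loss <= 1.0:
            raise ValueError("loss must be in [0, 1]")
        self.t += 1
        self.total += loss
        # Per-step error budget; sums to delta over t = 1, 2, ...
        delta_t = self.delta / (self.t * (self.t + 1))
        radius = math.sqrt(math.log(1.0 / delta_t) / (2.0 * self.t))
        self.lower_bound = self.total / self.t - radius
        if self.lower_bound > self.threshold:
            self.alarm = True
        return self.alarm


if __name__ == "__main__":
    monitor = RunningRiskMonitor(threshold=0.2, delta=0.05)
    # Hypothetical loss stream: the model is wrong on every test point.
    for step in range(1, 51):
        if monitor.update(1.0):
            print(f"alarm raised at step {step}")
            break
```

Under these assumptions, the probability of a false alarm at any point in time is at most `delta`, which is what distinguishes a confidence sequence from a fixed-sample confidence interval that is re-checked at every step. The union-bound construction is deliberately simple and looser than the confidence sequences typically used in practice.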
