Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation
Sudarshan Sreeram, Young D. Kwon, and Cecilia Mascolo

TL;DR
Tempora is a framework for evaluating online test-time adaptation (TTA) methods under real-world temporal constraints. It shows that conventional method rankings often do not hold in latency-sensitive scenarios.
Contribution
The paper presents Tempora, an evaluation framework that combines temporal scenarios, evaluation protocols, and time-contingent utility metrics to assess TTA methods under deployment-like conditions.
Findings
Conventional rankings often do not predict performance under temporal pressure.
ETA, a leading TTA method, underperforms in 41.2% of temporal evaluations.
The best TTA method varies with corruption type and latency constraints.
Abstract
Test-time adaptation (TTA) offers a compelling remedy for machine learning (ML) models that degrade under domain shifts, improving generalisation on-the-fly with only unlabelled samples. This flexibility suits real deployments, yet conventional evaluations unrealistically assume unbounded processing time, overlooking the accuracy-latency trade-off. As ML increasingly underpins latency-sensitive and user-facing use-cases, temporal pressure constrains the viability of adaptable inference; predictions arriving too late to act on are futile. We introduce Tempora, a framework for evaluating TTA under this pressure. It consists of temporal scenarios that model deployment constraints, evaluation protocols that operationalise measurement, and time-contingent utility metrics that quantify the accuracy-latency trade-off. We instantiate the framework with three such metrics: (1) discrete utility…
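To make the accuracy-latency trade-off concrete, here is a minimal sketch of one way a time-contingent score could be computed. It is not the paper's metric definition, which is truncated above. It assumes a hard deadline after which a prediction has no value, and the names `Prediction`, `deadline_gated_accuracy`, and `deadline_ms`, along with all the numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prediction:
    """One prediction from a (hypothetical) TTA method."""
    correct: bool      # whether the predicted label matched the ground truth
    latency_ms: float  # time from sample arrival to prediction


def deadline_gated_accuracy(preds: Sequence[Prediction], deadline_ms: float) -> float:
    """Fraction of predictions that are both correct and on time.

    Assumption: a prediction that misses the deadline is worth nothing,
    regardless of whether it would have been correct.
    """
    if not preds:
        raise ValueError("preds must be non-empty")
    useful = sum(1 for p in preds if p.correct and p.latency_ms <= deadline_ms)
    return useful / len(preds)


# Made-up data: a fast method with 50% accuracy and a slow one with 75%.
fast = [Prediction(c, 5.0) for c in (True, False, True, False)]
slow = [Prediction(c, 20.0) for c in (True, True, True, False)]

tight = (deadline_gated_accuracy(fast, 10.0), deadline_gated_accuracy(slow, 10.0))
loose = (deadline_gated_accuracy(fast, 50.0), deadline_gated_accuracy(slow, 50.0))
```

With a 10 ms deadline the fast method scores higher because every slow prediction arrives late. With a 50 ms deadline the slow method wins on raw accuracy. This is the kind of ranking reversal the findings describe, shown here on invented data only.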
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
