Temporal Test-Time Adaptation with State-Space Models
Mona Schirmer, Dan Zhang, Eric Nalisnick

TL;DR
This paper introduces STAD, a Bayesian filtering approach for test-time adaptation that effectively handles gradually evolving distribution shifts over time without requiring labels.
Contribution
It proposes a novel method that models temporal distribution shifts using state-space models, improving adaptation in real-world scenarios with evolving data.
Findings
Outperforms existing methods on real-world temporal shifts
Handles small batch sizes effectively
Manages label shift without labels
Abstract
Distribution shifts between training and test data are inevitable over the lifecycle of a deployed model, leading to performance decay. Adapting a model on test samples can help mitigate this drop in performance. However, most test-time adaptation methods have focused on synthetic corruption shifts, leaving a variety of distribution shifts underexplored. In this paper, we focus on distribution shifts that evolve gradually over time, which are common in the wild but challenging for existing methods, as we show. To address this, we propose STAD, a Bayesian filtering method that adapts a deployed model to temporal distribution shifts by learning the time-varying dynamics in the last set of hidden features. Without requiring labels, our model infers time-evolving class prototypes that act as a dynamic classification head. Through experiments on real-world temporal distribution shifts, we…
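To make the idea of time-evolving class prototypes concrete, here is a minimal sketch of label-free prototype tracking with a Kalman-style filter. It is not the authors' implementation. It assumes a Gaussian random-walk transition for each prototype, isotropic Gaussian emissions in feature space, a uniform class prior, and a single soft-assignment pass per batch. The class name `PrototypeFilter` and the parameters `q`, `r`, and `p0` are hypothetical names chosen for illustration.

```python
import numpy as np


class PrototypeFilter:
    """Tracks K class prototypes in feature space without labels.

    Assumed model (illustrative only, not taken from the paper):
      transition: w_t[k] = w_{t-1}[k] + N(0, q * I)
      emission:   h | class k ~ N(w_t[k], r * I)
    """

    def __init__(self, init_prototypes, q=0.01, r=1.0, p0=0.1):
        self.mean = np.array(init_prototypes, dtype=float)  # (K, d) posterior means
        self.var = np.full(self.mean.shape[0], float(p0))   # (K,) isotropic variances
        self.q = q  # assumed transition (drift) noise variance
        self.r = r  # assumed emission (feature) noise variance

    def step(self, feats):
        """Consume one unlabeled batch of features, shape (n, d).

        Returns soft class assignments of shape (n, K), which play the
        role of the predictions of a dynamic classification head.
        """
        feats = np.asarray(feats, dtype=float)
        d = feats.shape[1]

        # Predict: a random walk keeps the mean and inflates the variance.
        var_pred = self.var + self.q

        # Soft assignments under the predictive density N(mean, (var_pred + r) I).
        sq_dist = ((feats[:, None, :] - self.mean[None, :, :]) ** 2).sum(-1)
        spread = var_pred + self.r
        logits = -0.5 * sq_dist / spread - 0.5 * d * np.log(spread)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)

        # Update: Kalman correction with responsibility-weighted observations.
        n_k = resp.sum(axis=0)        # (K,) effective counts
        weighted_sum = resp.T @ feats  # (K, d)
        gain = var_pred / (var_pred * n_k + self.r)
        self.mean = self.mean + gain[:, None] * (weighted_sum - n_k[:, None] * self.mean)
        self.var = var_pred * self.r / (var_pred * n_k + self.r)
        return resp
```

In use, the prototypes would be initialized from the source model (for example, from the weights of its final linear layer, which is an assumption here), and `step` would be called once per incoming test batch, with `resp.argmax(axis=1)` serving as the predicted labels. A class that receives no responsibility in a batch has an effective count of zero, so its mean is left unchanged and only its variance grows.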
Peer Reviews
Decision·Submitted to ICLR 2025
The paper is well-written and gives a principled justification for the proposed method. It considers real-world shifts and the proposed method seems to outperform the baselines generally.
The paper does not appear to have any major weaknesses, but I am curious about the importance of the subfield (temporal test-time adaptation). Why is this the right setting to study (as opposed to others for distribution shift) and how does it compare to methods involving periodic retraining and unsupervised domain adaptation?
1. The authors propose a realistic TTA setting and conduct extensive experiments across multiple datasets to validate the method's effectiveness.
2. A new TTA method, STAD, is introduced, combining state-space models with a solid theoretical foundation.
1. The method modifies and updates only the linear classifier, raising concerns about effectively handling covariate shifts. As shown in Table 4, adapting only the classifier on CIFAR-10C performs significantly worse than adapting the feature extractor.
2. Experimentally, while a realistic TTA setting is proposed, comparisons with other similar settings and related methods are lacking. For instance, the following related works are not adequately compared: 1. UniTTA: Unified Benchmark and V
This work conducts experiments on several real-world datasets, enhancing the practical applicability of TTA algorithms in real-world settings. By modeling gradual distribution shifts using linear state-space models, this work provides fresh insights in the field of TTA.
The paper lacks discussion and empirical comparisons with related work, particularly in the field of gradual unsupervised domain adaptation, which addresses very similar problems (e.g., [1, 2]). As a result, the contribution of this paper may be overstated. The gradual shift assumption represents a relatively simple form of distribution shift in dynamic environments, as its total variation is small. This simplicity limits the contribution of the work. The proposed method appears to combine exi
Taxonomy
Topics: Fault Detection and Control Systems · Control Systems and Identification · Advanced Control Systems Optimization
Methods: Sparse Evolutionary Training · Focus
