Temporal Test-Time Adaptation with State-Space Models

Mona Schirmer; Dan Zhang; Eric Nalisnick

arXiv:2407.12492·cs.LG·November 18, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces STAD, a Bayesian filtering approach for test-time adaptation that effectively handles gradually evolving distribution shifts over time without requiring labels.

Contribution

The paper proposes a method that models temporal distribution shifts with state-space models, improving adaptation in real-world scenarios where the data evolves over time.

Findings

01 Outperforms existing methods on real-world temporal shifts

02 Handles small batch sizes effectively

03 Manages label shift without labels

Abstract

Distribution shifts between training and test data are inevitable over the lifecycle of a deployed model, leading to performance decay. Adapting a model on test samples can help mitigate this drop in performance. However, most test-time adaptation methods have focused on synthetic corruption shifts, leaving a variety of distribution shifts underexplored. In this paper, we focus on distribution shifts that evolve gradually over time, which are common in the wild but challenging for existing methods, as we show. To address this, we propose STAD, a Bayesian filtering method that adapts a deployed model to temporal distribution shifts by learning the time-varying dynamics in the last set of hidden features. Without requiring labels, our model infers time-evolving class prototypes that act as a dynamic classification head. Through experiments on real-world temporal distribution shifts, we…
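The idea of filtering time-evolving class prototypes without labels can be illustrated with a minimal sketch. This is not the authors' STAD algorithm: it assumes a random-walk state-space model with one isotropic variance per class, soft assignments from squared distances, and a Kalman-style update. The function name `filter_prototypes` and the parameters `drift_var`, `obs_var`, and `init_var` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def filter_prototypes(feature_batches, init_prototypes,
                      drift_var=0.01, obs_var=1.0, init_var=1.0):
    """Track K class prototypes over unlabeled feature batches.

    Illustrative assumption: each prototype follows a random walk and
    features are noisy observations of their (unknown) class prototype.
    """
    mean = np.array(init_prototypes, dtype=float)   # (K, D) prototype means
    var = np.full(mean.shape[0], float(init_var))   # (K,) isotropic variances
    predictions = []
    for feats in feature_batches:                   # feats: (N, D)
        # Predict step: the random walk inflates uncertainty.
        var = var + drift_var
        # Soft-assign features to prototypes (no labels are used).
        d2 = ((feats[:, None, :] - mean[None, :, :]) ** 2).sum(-1)  # (N, K)
        resp = softmax(-0.5 * d2 / obs_var, axis=1)
        # The current prototypes act as the classification head.
        predictions.append(resp.argmax(axis=1))
        # Update step: responsibility-weighted features act as observations.
        n_k = np.maximum(resp.sum(axis=0), 1e-8)    # effective counts, (K,)
        weighted_mean = (resp.T @ feats) / n_k[:, None]
        gain = var / (var + obs_var / n_k)          # Kalman gain per class
        mean = mean + gain[:, None] * (weighted_mean - mean)
        var = (1.0 - gain) * var
    return mean, predictions
```

In this sketch the prototypes would be initialized from the source model (for example, from the rows of its final linear layer, which is an assumption here), and `feature_batches` would hold the last-layer features of each incoming test batch in temporal order.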

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper is well-written and gives a principled justification for the proposed method. It considers real-world shifts and the proposed method seems to outperform the baselines generally.

Weaknesses

The paper does not appear to have any major weaknesses, but I am curious about the importance of the subfield (temporal test-time adaptation). Why is this the right setting to study (as opposed to others for distribution shift) and how does it compare to methods involving periodic retraining and unsupervised domain adaptation?

Reviewer 02 · Rating 3 · Confidence 4

Strengths

1. The authors propose a realistic TTA setting and conduct extensive experiments across multiple datasets to validate the method's effectiveness.
2. A new TTA method, STAD, is introduced, combining state-space models with a solid theoretical foundation.

Weaknesses

1. The method modifies and updates only the linear classifier, raising concerns about effectively handling covariate shifts. As shown in Table 4, adapting only the classifier on CIFAR-10C performs significantly worse than adapting the feature extractor.
2. Experimentally, while a realistic TTA setting is proposed, comparisons with other similar settings and related methods are lacking. For instance, the following related works are not adequately compared: 1. UniTTA: Unified Benchmark and V

Reviewer 03 · Rating 3 · Confidence 4

Strengths

This work conducts experiments on several real-world datasets, enhancing the practical applicability of TTA algorithms in real-world settings. By modeling gradual distribution shifts using linear state-space models, this work provides fresh insights in the field of TTA.

Weaknesses

The paper lacks discussion and empirical comparisons with related work, particularly in the field of gradual unsupervised domain adaptation, which addresses very similar problems (e.g., [1, 2]). As a result, the contribution of this paper may be overstated. The gradual shift assumption represents a relatively simple form of distribution shift in dynamic environments, as its total variation is small. This simplicity limits the contribution of the work. The proposed method appears to combine exi

Taxonomy

Topics: Fault Detection and Control Systems · Control Systems and Identification · Advanced Control Systems Optimization

Methods: Sparse Evolutionary Training · Focus