StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
Zheng Li, Jerry Cheng, Huanying Helen Gu

TL;DR
StableTTA introduces two training-free test-time adaptation methods that enhance vision model accuracy and consistency by leveraging semantic coherence and aggregation stability, with minimal computational costs.
Contribution
The paper proposes StableTTA, a training-free TTA framework with two variants that addresses the efficiency and stability issues of ensemble-based methods for vision tasks.
Findings
StableTTA-I improves accuracy and consistency in coherent-batch inference.
StableTTA-II offers lightweight, architecture-agnostic accuracy gains.
Experiments on ImageNet-1K with 71 models validate effectiveness.
Abstract
Ensemble methods improve predictive performance but often incur high memory and computational costs. We identify an aggregation instability induced by nonlinear projection and voting operations. To address both efficiency challenges and this inconsistency, we propose StableTTA, a training-free test-time adaptation method with two variants. StableTTA-I targets coherent-batch inference settings, where temporally or semantically adjacent observations are likely to belong to the same class. Examples include burst photography, video streams, robotics perception, and industrial inspection. Under coherent-batch inference, StableTTA-I substantially improves prediction consistency and accuracy through variance-aware logit aggregation. StableTTA-II establishes feature-level cropping, enabling efficient logit aggregation with a single forward pass on a single model backbone. Experiments on…
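
The aggregation instability described in the abstract can be illustrated with a small sketch: hard voting (argmax per view, then majority) and logit averaging can disagree on the same set of predictions. The snippet below is an illustration only, not the authors' algorithm. The function names, the toy logits, and the inverse-deviation weighting in `deviation_weighted_logits` are assumptions chosen as one plausible reading of "variance-aware logit aggregation"; the paper's actual formulation may differ.

```python
import numpy as np

def majority_vote(logits: np.ndarray) -> int:
    """Hard voting: argmax per view, then the most frequent class."""
    votes = logits.argmax(axis=1)
    return int(np.bincount(votes, minlength=logits.shape[1]).argmax())

def mean_logit(logits: np.ndarray) -> int:
    """Soft aggregation: average the logits over views, then argmax."""
    return int(logits.mean(axis=0).argmax())

def deviation_weighted_logits(logits: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical variance-aware aggregation (an assumption, not the paper's method).

    Each view is weighted by the inverse of its mean squared deviation
    from the batch-mean logits, so views far from the consensus count less.
    """
    center = logits.mean(axis=0)
    deviation = ((logits - center) ** 2).mean(axis=1)
    weights = 1.0 / (deviation + eps)
    weights /= weights.sum()
    return weights @ logits

# Toy coherent batch: 3 views of the same object, 3 classes (made-up numbers).
views = np.array([
    [2.0, 1.9, 0.0],   # narrowly prefers class 0
    [2.0, 1.9, 0.0],   # narrowly prefers class 0
    [0.0, 5.0, 0.0],   # strongly prefers class 1
])

vote_label = majority_vote(views)
mean_label = mean_logit(views)
weighted = deviation_weighted_logits(views)
print(vote_label, mean_label, int(weighted.argmax()))
```

In this toy batch the two narrow votes for class 0 win under hard voting, while averaging the logits favors class 1 because the third view is much more confident. That disagreement is the kind of inconsistency the abstract attributes to nonlinear projection and voting operations.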
