TL;DR
This paper introduces IMSE, a spectral-expert fine-tuning method for test-time adaptation that applies singular value decomposition (SVD) to the linear layers of Vision Transformers, addressing feature collapse and domain shift.
Contribution
IMSE uniquely adapts only the singular values of linear layers in Vision Transformers, incorporating diversity maximization and domain-aware retrieval for efficient, state-of-the-art test-time adaptation.
Findings
Achieves state-of-the-art performance on distribution-shift benchmarks.
Improves accuracy by 3.4 and 2.4 percentage points in continual test-time adaptation (CTTA) scenarios.
Requires 385 times fewer trainable parameters than previous methods.
Abstract
Test-time adaptation (TTA) has been widely explored to prevent performance degradation when test data differ from the training distribution. However, fully leveraging the rich representations of large pretrained models with minimal parameter updates remains underexplored. In this paper, we propose Intrinsic Mixture of Spectral Experts (IMSE) that leverages the spectral experts inherently embedded in Vision Transformers. We decompose each linear layer via singular value decomposition (SVD) and adapt only the singular values, while keeping the singular vectors fixed. We further identify a key limitation of entropy minimization in TTA: it often induces feature collapse, causing the model to rely on domain-specific features rather than class-discriminative features. To address this, we propose a diversity maximization loss based on expert-input alignment, which encourages diverse…
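The core mechanism described in the abstract (decompose a linear layer with SVD, freeze the singular vectors, adapt only the singular values) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the random matrix `W` stands in for a pretrained ViT linear layer, the names `forward` and `grad_singular_values` are hypothetical, and the placeholder loss is not the paper's entropy or diversity objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one pretrained linear layer of shape (d_out, d_in).
d_out, d_in = 8, 6
W = rng.standard_normal((d_out, d_in))

# Decompose once. U and Vt hold the singular vectors and stay frozen;
# s holds the singular values, the only quantity that would be adapted.
U, s, Vt = np.linalg.svd(W, full_matrices=False)


def forward(x, singular_values):
    """Compute x @ W_s.T with W_s = U @ diag(singular_values) @ Vt."""
    return ((x @ Vt.T) * singular_values) @ U.T


def grad_singular_values(x, grad_y):
    """Gradient of a scalar loss w.r.t. the singular values, given dL/dy."""
    return ((x @ Vt.T) * (grad_y @ U)).sum(axis=0)


x = rng.standard_normal((4, d_in))  # a toy batch of 4 inputs

# With the original spectrum, the factored layer matches the dense one.
assert np.allclose(forward(x, s), x @ W.T)

# One gradient step on the spectrum under a placeholder loss 0.5 * ||y||^2
# (NOT the paper's objective), for which dL/dy = y.
y = forward(x, s)
s_adapted = s - 0.01 * grad_singular_values(x, y)

print("trainable values:", s.size, "of", W.size, "weights in this layer")
```

In this toy layer the adapted spectrum has min(d_out, d_in) entries, while the dense weight has d_out × d_in, which gives a sense of why tuning only singular values keeps the trainable parameter count small. The 385× figure above refers to the paper's own comparison against previous methods and is not reproduced by this sketch.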
