Progressive Conditioned Scale-Shift Recalibration of Self-Attention for Online Test-time Adaptation
Yushun Tang, Ziqiong Liu, Jiyuan Jia, Yi Zhang, Zhihai He

TL;DR
This paper introduces a progressive self-attention recalibration method for transformer models that improves online test-time domain adaptation, raising classification accuracy on benchmark datasets (by up to 3.9% on ImageNet-C).
Contribution
The paper proposes a progressive scale-shift recalibration of self-attention, driven by domain separation and factor generation networks, for online test-time adaptation.
Findings
Improves classification accuracy by up to 3.9% on ImageNet-C.
Effectively separates domain shift features during inference.
Lightweight online adaptation networks enhance transformer robustness.
Abstract
Online test-time adaptation aims to dynamically adjust a network model in real time based on sequential input samples during the inference stage. In this work, we find that, when a transformer network model is applied to a new target domain, the Query, Key, and Value features of its self-attention module often change significantly from those in the source domain, leading to substantial performance degradation. To address this issue, we propose a new approach that progressively recalibrates the self-attention at each layer using a local linear transform parameterized by conditioned scale and shift factors. We consider the online model adaptation from the source domain to the target domain as a progressive domain shift separation process. At each transformer network layer, we learn a Domain Separation Network to extract the domain shift feature,…
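The abstract is cut off before it describes how the scale and shift factors are produced, so the following is only a minimal sketch of the general idea: a per-channel linear transform applied to Query, Key, and Value features before attention. Everything specific here is an assumption for illustration and not the authors' implementation: the function names (`factor_generator`, `recalibrate`), the weight matrices `W_scale` and `W_shift`, and the choice of a single linear map from a domain-shift feature to the factors.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def factor_generator(domain_feat, W_scale, W_shift):
    """Hypothetical stand-in for a factor generation network.

    Maps a domain-shift feature to per-channel (scale, shift).
    Assumption: scale is parameterized as 1 + W_scale @ d, so a zero
    domain-shift feature yields the identity transform.
    """
    scale = [1.0 + s for s in matvec(W_scale, domain_feat)]
    shift = matvec(W_shift, domain_feat)
    return scale, shift

def recalibrate(tokens, scale, shift):
    """Apply the linear transform x -> scale * x + shift to each token."""
    return [[g * x + b for x, g, b in zip(tok, scale, shift)] for tok in tokens]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of token vectors."""
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K]
              for qi in Q]
    weights = [softmax(r) for r in scores]
    return [[sum(w * v[c] for w, v in zip(wrow, V)) for c in range(len(V[0]))]
            for wrow in weights]

def recalibrated_attention(Q, K, V, domain_feat, params):
    """Recalibrate Q, K, V with their own (scale, shift), then attend.

    `params` maps "q", "k", "v" to hypothetical (W_scale, W_shift) pairs.
    """
    Qr = recalibrate(Q, *factor_generator(domain_feat, *params["q"]))
    Kr = recalibrate(K, *factor_generator(domain_feat, *params["k"]))
    Vr = recalibrate(V, *factor_generator(domain_feat, *params["v"]))
    return attention(Qr, Kr, Vr)
```

With a zero domain-shift feature this sketch reduces to ordinary attention, which is one plausible way to make such a module a no-op on source-domain data. In the progressive scheme the abstract describes, a transform of this kind would be applied at every transformer layer.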
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
