Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment
Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

TL;DR
This paper introduces class-conditional feature alignment (CFA), a novel test-time adaptation method for Vision Transformers that improves robustness to corruptions and domain shifts without retraining from scratch.
Contribution
The paper proposes CFA, a new online test-time adaptation technique for ViT that minimizes the differences between source and target hidden-feature distributions, both overall and per class, and outperforms existing methods across various datasets.
Findings
CFA outperforms baselines on corruption and domain shift benchmarks.
CFA is effective across different model architectures including ResNet and ViT variants.
Achieves 19.8% top-1 error on ImageNet-C, surpassing previous TTA methods.
Abstract
Vision Transformer (ViT) is becoming increasingly popular in image processing. In this work, we investigate the effectiveness of test-time adaptation (TTA) on ViT; TTA is a technique that has emerged for letting a model correct its own predictions at test time. First, we benchmark various test-time adaptation approaches on ViT-B16 and ViT-L16. The results show that TTA is effective on ViT and that the prior convention (sensibly selecting modulation parameters) is not necessary when a proper loss function is used. Based on this observation, we propose a new test-time adaptation method called class-conditional feature alignment (CFA), which, in an online manner, minimizes both the class-conditional distribution differences and the overall distribution differences of the hidden representations between the source and target. Experiments on image classification tasks with common corruptions (CIFAR-10-C, CIFAR-100-C, and ImageNet-C)…
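The two alignment terms described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "distribution difference" is measured by an L1 gap between per-dimension means and standard deviations, that source statistics are precomputed offline, that target classes come from pseudo-labels (the argmax of the model's logits), and that the two terms are combined with a hypothetical weight `lam`. The function names `moment_gap` and `cfa_style_loss` are likewise illustrative.

```python
import numpy as np


def moment_gap(src_mean, src_std, feats):
    """L1 gap between stored source moments and the moments of a target batch.

    Assumption: distributions are compared only through per-dimension
    mean and standard deviation.
    """
    tgt_mean = feats.mean(axis=0)
    tgt_std = feats.std(axis=0)
    return float(np.abs(src_mean - tgt_mean).sum() + np.abs(src_std - tgt_std).sum())


def cfa_style_loss(feats, logits, src_stats, src_class_stats, lam=1.0):
    """Hypothetical CFA-style alignment loss for one hidden layer.

    feats: (batch, dim) target hidden features
    logits: (batch, num_classes) model outputs, used only for pseudo-labels
    src_stats: (mean, std) over all source features
    src_class_stats: dict mapping class id -> (mean, std) of source features
    lam: assumed weight of the class-conditional term
    """
    # Whole-distribution term: compare the batch against all source data.
    overall = moment_gap(*src_stats, feats)

    # Class-conditional term: group target samples by pseudo-label and
    # compare each group with the source statistics of that class.
    pseudo = logits.argmax(axis=1)
    gaps = []
    for c in np.unique(pseudo):
        members = feats[pseudo == c]
        if len(members) < 2:
            continue  # too few samples to estimate a standard deviation
        gaps.append(moment_gap(*src_class_stats[int(c)], members))
    conditional = float(np.mean(gaps)) if gaps else 0.0

    return overall + lam * conditional


# Usage: build source statistics, then score an unshifted and a shifted batch.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 8))
labels = rng.integers(0, 3, size=200)
src_stats = (src.mean(axis=0), src.std(axis=0))
class_stats = {
    c: (src[labels == c].mean(axis=0), src[labels == c].std(axis=0)) for c in range(3)
}
logits = np.eye(3)[labels]  # one-hot logits, so pseudo-labels equal true labels

same_loss = cfa_style_loss(src, logits, src_stats, class_stats)
shifted_loss = cfa_style_loss(src + 1.0, logits, src_stats, class_stats)
print(same_loss, shifted_loss)
```

The loss is zero when the target batch matches the source statistics and grows when the features drift. In an actual online TTA setting, a loss like this would be written in an autodiff framework and minimized with a gradient step on each incoming test batch, summed over the hidden layers being aligned; which layers, which parameters are updated, and the exact distance used in the paper are not reproduced here.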
Taxonomy
Topics: Image Processing Techniques and Applications · Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Softmax · Feedforward Network · Multi-Head Attention · Attention Dropout · Label Smoothing · Dropout · Byte Pair Encoding
