Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment

Takeshi Kojima; Yutaka Matsuo; Yusuke Iwasawa

arXiv:2206.13951 · cs.CV · June 29, 2022 · 1 citation

TL;DR

This paper introduces class-conditional feature alignment (CFA), a novel test-time adaptation method for Vision Transformers that improves robustness to corruptions and domain shifts without retraining from scratch.

Contribution

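The paper proposes CFA, an online test-time adaptation technique for ViT. It minimizes both the class-conditional and the whole-distribution differences of hidden representations between the source and target data, and it outperforms existing baselines on corruption and domain-shift benchmarks.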

Findings

01 CFA outperforms baselines on corruption and domain-shift benchmarks.

02 CFA is effective across different model architectures, including ResNet and ViT variants.

03 CFA achieves 19.8% top-1 error on ImageNet-C, surpassing previous TTA methods.

Abstract

Vision Transformer (ViT) is becoming more popular in image processing. Specifically, we investigate the effectiveness of test-time adaptation (TTA) on ViT, a technique that has emerged to correct its prediction during test-time by itself. First, we benchmark various test-time adaptation approaches on ViT-B16 and ViT-L16. It is shown that the TTA is effective on ViT and the prior-convention (sensibly selecting modulation parameters) is not necessary when using proper loss function. Based on the observation, we propose a new test-time adaptation method called class-conditional feature alignment (CFA), which minimizes both the class-conditional distribution differences and the whole distribution differences of the hidden representation between the source and target in an online manner. Experiments of image classification tasks on common corruption (CIFAR-10-C, CIFAR-100-C, and ImageNet-C)…
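
The two alignment terms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the simplest possible statistic (feature means only), uses predicted labels as pseudo-labels for the class-conditional term, and the names `alignment_loss`, `source_mean`, and `source_class_means` are hypothetical. It computes only the loss value; in an online setting the loss would be minimized with gradient steps on each incoming test batch, which requires an autodiff framework. The official repository listed under Code & Models is the reference for the actual loss.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(target_feats, pseudo_labels, source_mean, source_class_means):
    """Hypothetical mean-matching loss: whole-distribution + class-conditional.

    target_feats:       hidden features of one test batch (list of vectors)
    pseudo_labels:      the model's own predicted class for each feature
    source_mean:        feature mean precomputed on source data (assumed given)
    source_class_means: dict mapping class -> source feature mean (assumed given)
    """
    # Whole-distribution term: batch mean against the overall source mean.
    overall = sq_dist(mean_vector(target_feats), source_mean)

    # Class-conditional term: group the batch by predicted class, then
    # compare each group's mean against that class's source mean.
    by_class = defaultdict(list)
    for feat, label in zip(target_feats, pseudo_labels):
        by_class[label].append(feat)
    conditional = sum(
        sq_dist(mean_vector(rows), source_class_means[label])
        for label, rows in by_class.items()
    ) / len(by_class)

    return overall + conditional


# Toy batch: three 2-D features, two predicted classes.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
loss = alignment_loss(feats, labels, [1.0, 1.0], {0: [2.0, 0.0], 1: [0.0, 1.0]})
```

The class-conditional term is averaged over the classes present in the batch so that its scale does not depend on how many classes a given batch happens to contain; that normalization is a design choice of this sketch, not something stated in the abstract.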

Code & Models

Repositories

kojima-takeshi188/cfa (PyTorch, official)

Taxonomy

Topics: Image Processing Techniques and Applications · Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Softmax · Feedforward Network · Multi-Head Attention · Attention Dropout · Label Smoothing · Dropout · Byte Pair Encoding