Learning Diverse Features in Vision Transformers for Improved Generalization

Armand Mihai Nicolicioiu; Andrei Liviu Nicolicioiu; Bogdan Alexe; Damien Teney

arXiv:2308.16274 · cs.CV · September 1, 2023

Open Access · 1 Repo

TL;DR

This paper investigates how vision transformers learn features, identifies the role of attention heads in capturing robust and spurious signals, and proposes methods to improve generalization by promoting feature diversity and pruning spurious heads.

Contribution

The paper introduces a technique to enhance feature diversity in ViTs by encouraging orthogonality of the attention heads' input gradients, and it reports improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds).
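As a rough illustration of what an orthogonality regularizer on per-head input gradients could look like, the sketch below computes the mean squared cosine similarity over all pairs of heads. This is an assumption about the general shape of such a penalty, not the paper's actual loss, which this page does not specify. The function names and the layout of `head_grads` (one flattened gradient vector per head, already computed by an autograd framework) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonality_penalty(head_grads):
    """Mean squared cosine similarity over all pairs of per-head input gradients.

    head_grads: list of flattened gradient vectors, one per attention head
    (hypothetical layout; assumed to be precomputed elsewhere).
    Returns 0.0 when every pair of gradients is orthogonal.
    """
    n = len(head_grads)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = sum(cosine(head_grads[i], head_grads[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


# Two heads whose gradients point along different input dimensions: no penalty.
print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
# Two heads sensitive to the same input direction: maximal penalty.
print(orthogonality_penalty([[1.0, 0.0], [2.0, 0.0]]))  # 1.0
```

In training, a term like this would be added to the task loss with a weighting coefficient, so that heads are pushed to respond to different parts of the input. Squaring the cosine penalizes aligned and anti-aligned gradients equally, which is one possible design choice among several.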

Findings

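01 Pruning attention heads that correspond to spurious features improves performance under distribution shifts.

02 Encouraging orthogonality of the attention heads' input gradients increases feature diversity.

03 Enhanced feature diversity leads to better out-of-distribution (OOD) generalization.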

Abstract

Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their performance under distribution shifts can be significantly improved at test time by pruning heads corresponding to spurious features, which we demonstrate using an "oracle selection" on validation data. Second, we propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients. We observe improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the…
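The test-time head pruning with "oracle selection" mentioned in the abstract can be pictured as a search over head masks scored on validation data. The sketch below uses a greedy search, which is an assumption: this page does not describe the authors' selection procedure. `oracle_prune`, `evaluate`, and `toy_score` are hypothetical names, and `toy_score` stands in for a real validation metric computed by running the model with the given heads disabled.

```python
def oracle_prune(num_heads, evaluate):
    """Greedily disable heads for as long as the validation score improves.

    evaluate: callable taking a tuple of booleans (True = head kept) and
    returning a validation score where higher is better (hypothetical interface).
    Returns the final mask and its score.
    """
    mask = [True] * num_heads
    best = evaluate(tuple(mask))
    while True:
        best_head = None
        for h in range(num_heads):
            if not mask[h]:
                continue  # head already pruned
            trial = mask.copy()
            trial[h] = False
            score = evaluate(tuple(trial))
            if score > best:
                best, best_head = score, h
        if best_head is None:
            return mask, best  # no single removal helps any further
        mask[best_head] = False


def toy_score(mask):
    """Toy stand-in for validation accuracy: heads 0 and 2 help, head 1 hurts."""
    return 5 + 2 * mask[0] - 3 * mask[1] + 2 * mask[2]


print(oracle_prune(3, toy_score))  # ([True, False, True], 9)
```

Because the score comes from validation data drawn from the shifted distribution, this kind of selection is an upper-bound diagnostic rather than a deployable procedure, which is consistent with the abstract calling it an "oracle".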

Code & Models

Repositories

armandnm/diverse-vit (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications

Methods: Pruning