Teaching Matters: Investigating the Role of Supervision in Vision Transformers

Matthew Walmer; Saksham Suri; Kamal Gupta; Abhinav Shrivastava

arXiv:2212.03862 · cs.CV · April 7, 2023 · 1 cites

TL;DR

This paper investigates how different supervision methods influence Vision Transformers' behaviors, revealing diverse learning patterns, the emergence of Offset Local Attention Heads, and the competitive performance of self-supervised approaches.

Contribution

The paper provides a comprehensive comparison of ViTs trained under various supervision paradigms and uncovers previously undocumented behaviors such as Offset Local Attention Heads.

Findings

01 ViTs learn diverse behaviors depending on training method.

02 Self-supervised methods can match or outperform supervised ones.

03 Offset Local Attention Heads are a consistent phenomenon across models.

Abstract

Vision Transformers (ViTs) have gained significant popularity in recent years and have proliferated into many applications. However, their behavior under different learning paradigms is not well explored. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. We also discover ViT behaviors that are consistent across supervision, including the emergence of Offset Local Attention Heads. These are self-attention heads that attend to a token adjacent to the current token with a fixed directional offset, a phenomenon that to the best of our knowledge has not been highlighted in any prior work. Our analysis shows that ViTs are highly flexible and learn to process local and global information in different orders depending on their training method. We find…
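
The abstract's definition of an Offset Local Attention Head (a head that attends to a neighbouring token at a fixed directional offset) can be illustrated with a toy example. The sketch below is not the authors' analysis code: the function names `offset_attention` and `mean_attention_offset` are hypothetical, the attention map is synthetic, and it assumes a square grid of patch tokens with no CLS token.

```python
def offset_attention(grid, dy, dx):
    """Synthetic attention for one head (assumption, not real model output):
    every patch puts all of its attention on the patch at offset (dy, dx)."""
    n = grid * grid
    attn = [[0.0] * n for _ in range(n)]
    for r in range(grid):
        for c in range(grid):
            # Clamp at the border so every row still sums to 1.
            tr = min(max(r + dy, 0), grid - 1)
            tc = min(max(c + dx, 0), grid - 1)
            attn[r * grid + c][tr * grid + tc] = 1.0
    return attn


def mean_attention_offset(attn, grid):
    """Attention-weighted mean (row, col) displacement from each query patch
    to the patches it attends to. Requires grid >= 3."""
    total_dy = total_dx = 0.0
    count = 0
    for q, row in enumerate(attn):
        qr, qc = divmod(q, grid)
        # Skip border queries, where the clamped toy attention distorts the offset.
        if qr in (0, grid - 1) or qc in (0, grid - 1):
            continue
        for k, weight in enumerate(row):
            kr, kc = divmod(k, grid)
            total_dy += weight * (kr - qr)
            total_dx += weight * (kc - qc)
        count += 1
    return total_dy / count, total_dx / count


# A head that always looks one patch to the right.
head = offset_attention(grid=6, dy=0, dx=1)
print(mean_attention_offset(head, grid=6))  # (0.0, 1.0)
```

On this synthetic head the mean displacement is exactly one patch to the right. A head with no consistent directional offset, such as uniform attention, would average out to roughly (0, 0) under the same measure.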

Code & Models

Repositories

mwalmer-umd/vit_analysis (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning