Spatial Entropy as an Inductive Bias for Vision Transformers

Elia Peruzzo; Enver Sangineto; Yahui Liu; Marco De Nadai; Wei Bi; Bruno Lepri; Nicu Sebe

arXiv:2206.04636·cs.CV·March 15, 2023

TL;DR

This paper introduces a regularization method for Vision Transformers that uses spatial entropy as an auxiliary self-supervised task. The method encourages a semantic segmentation structure to emerge in the attention maps and improves accuracy, especially when training data is limited.

Contribution

The paper proposes a spatial-entropy-based regularization technique that improves Vision Transformers without altering their architecture, using an auxiliary self-supervised task to induce a local spatial bias.

Findings

01

The regularization improves Vision Transformer accuracy on small and medium-sized datasets.

02

The method matches or exceeds the performance of methods that add a local bias through architectural changes.

03

Spatial entropy regularization enhances semantic clustering in attention maps.

Abstract

Recent work on Vision Transformers (VTs) showed that introducing a local inductive bias in the VT architecture helps reduce the number of samples necessary for training. However, the architecture modifications lead to a loss of generality of the Transformer backbone, partially contradicting the push towards the development of uniform architectures, shared, e.g., by both the Computer Vision and the Natural Language Processing areas. In this work, we propose a different and complementary direction, in which a local bias is introduced using an auxiliary self-supervised task, performed jointly with standard supervised training. Specifically, we exploit the observation that the attention maps of VTs, when trained with self-supervision, can contain a semantic segmentation structure which does not spontaneously emerge when training is supervised. Thus, we explicitly encourage the emergence…
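
The abstract shown here is truncated and does not define spatial entropy, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' loss. It assumes a 2D attention map is binarized at its mean value, grouped into 8-connected regions, and scored by the entropy of the region-size distribution. The function names `connected_component_sizes` and `spatial_entropy` are hypothetical, and the sketch is not differentiable, so it could serve as a diagnostic but not as a training objective as written.

```python
import math
from collections import deque


def connected_component_sizes(mask):
    """Return the sizes of 8-connected regions of True cells in a 2D grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the region containing (r, c).
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            sizes.append(size)
    return sizes


def spatial_entropy(attn):
    """Entropy of region sizes after thresholding `attn` at its mean.

    Assumption (not from the abstract): thresholding at the mean and
    weighting each region by its share of the above-threshold area.
    """
    flat = [v for row in attn for v in row]
    threshold = sum(flat) / len(flat)
    mask = [[v > threshold for v in row] for row in attn]
    sizes = connected_component_sizes(mask)
    total = sum(sizes)
    if total == 0:  # uniform map: nothing exceeds the mean
        return 0.0
    return -sum((s / total) * math.log(s / total) for s in sizes)


# One compact 2x2 blob versus four isolated corner cells on a 4x4 grid.
blob = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
scattered = [[1.0, 0.0, 0.0, 1.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [1.0, 0.0, 0.0, 1.0]]
```

Under these assumptions, a map whose high-attention cells form one compact region scores lower than a map with the same number of cells scattered across the grid, which is the kind of spatially clustered structure the regularization is described as encouraging.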

Code & Models

Repositories

helia95/sar
PyTorch · Official

Taxonomy

Topics: Infrared Target Detection Methodologies · Remote-Sensing Image Classification · Image Processing Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Dense Connections · Absolute Position Encodings · Linear Layer · Label Smoothing · Dropout