Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

Chau Pham; Bryan A. Plummer

arXiv:2405.16419 · cs.CV · October 29, 2024

TL;DR

This paper introduces DiChaViT, a method that improves feature diversity in vision transformers for multi-channel imaging (MCI) through a novel channel sampling strategy, regularization, and initialization, yielding gains of 1.5-5.0% over the state of the art.

Contribution

We propose DiChaViT, a novel approach that enhances feature diversity in MCI-ViT models through a unique channel sampling strategy and regularization techniques, applicable across architectures.
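To make the notion of feature redundancy concrete, the sketch below scores a set of per-channel feature vectors by their mean squared pairwise cosine similarity. This is a generic orthogonality-style penalty chosen for illustration, not the regularizer used in DiChaViT; the function names and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def redundancy_penalty(features):
    """Mean squared cosine similarity over all distinct pairs.

    Illustrative only: 0.0 means the vectors are mutually orthogonal,
    values near 1.0 mean they point in nearly the same direction.
    """
    pairs = [
        (i, j)
        for i in range(len(features))
        for j in range(i + 1, len(features))
    ]
    return sum(cosine(features[i], features[j]) ** 2 for i, j in pairs) / len(pairs)


# Toy per-channel features (hypothetical values).
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]

print(redundancy_penalty(orthogonal))
print(redundancy_penalty(redundant))
```

A penalty of this kind is low when channels encode different directions in feature space and high when they duplicate one another, which is the sort of redundancy a diversity-oriented regularizer aims to discourage.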

Findings

01. Achieves 1.5-5.0% performance improvement over state-of-the-art.

02. Effective on satellite and microscopy datasets.

03. Architecture-agnostic improvements.

Abstract

Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can be different for each imaging source. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders for MCI, such as Vision Transformers (ViT), by supplementing pixel information with an encoding representing the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features. For example, if RGB channels are always present, the other channels can focus on extracting information that cannot be captured…
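The idea of supplementing pixel information with a channel encoding can be sketched in a few lines. The toy example below embeds each channel's patches with one shared projection and adds a per-channel vector, so the same code handles any subset of channels. It is a minimal illustration under stated assumptions, not the DiChaViT architecture: the channel names, sizes, and the helpers `tokenize` and `make_image` are all hypothetical, and random numbers stand in for learned weights.

```python
import random

PATCH_DIM = 4   # flattened pixels per patch (toy size)
EMBED_DIM = 3   # token width (toy size)

rng = random.Random(0)

# Hypothetical channel vocabulary; the names are illustrative only.
CHANNELS = ["R", "G", "B", "NIR", "SWIR"]

# One projection shared by all channels, plus one vector per channel type.
shared_proj = [
    [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(PATCH_DIM)
]
channel_embed = {
    c: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for c in CHANNELS
}


def tokenize(image):
    """Map {channel name: list of flattened patches} to a flat token list.

    Each patch is projected with the shared weights, then the embedding of
    its channel is added so the encoder can tell channels apart.
    """
    tokens = []
    for name, patches in image.items():
        for patch in patches:
            projected = [
                sum(patch[i] * shared_proj[i][j] for i in range(PATCH_DIM))
                for j in range(EMBED_DIM)
            ]
            tokens.append([p + e for p, e in zip(projected, channel_embed[name])])
    return tokens


def make_image(names, num_patches=2):
    """Build a random toy image containing only the requested channels."""
    return {
        n: [[rng.random() for _ in range(PATCH_DIM)] for _ in range(num_patches)]
        for n in names
    }


# The same tokenizer accepts different channel configurations.
rgb_tokens = tokenize(make_image(["R", "G", "B"]))
full_tokens = tokenize(make_image(CHANNELS))
print(len(rgb_tokens), len(full_tokens))  # prints "6 10"
```

Because the token sequence simply grows or shrinks with the channels present, a transformer encoder placed on top can consume either configuration without architectural changes.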

Code & Models

Repositories

chaudatascience/diverse_channel_vit (PyTorch, official)

Videos

Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications

Methods: Focus