Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
Chau Pham, Bryan A. Plummer

TL;DR
This paper introduces DiChaViT, a method that improves feature diversity in vision transformers for multi-channel imaging through a novel channel sampling strategy, regularization, and initialization, yielding gains of 1.5-5.0% over the state of the art.
Contribution
We propose DiChaViT, a novel approach that enhances feature diversity in vision transformers for Multi-Channel Imaging (MCI) through a channel sampling strategy and regularization techniques. The approach is applicable across architectures.
Findings
Achieves a 1.5-5.0% performance improvement over the state of the art.
Effective on satellite and microscopy datasets.
Architecture-agnostic improvements.
Abstract
Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can be different for each imaging source. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders for MCI, such as Vision Transformers (ViT), by supplementing pixel information with an encoding representing the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features. For example, if RGB channels are always present, the other channels can focus on extracting information that cannot be captured…
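The channel-encoding idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the DiChaViT implementation: the function name tokenize_channels, the shapes, the patch size, and the random weights are all assumptions made for illustration. It only shows how per-channel patch tokens combined with a per-channel-type embedding let one encoder accept different channel subsets at test time.

```python
import numpy as np

def tokenize_channels(image, channel_ids, patch_proj, channel_emb, patch=4):
    """Hypothetical helper: turn a (C, H, W) image into per-channel patch tokens.

    Each channel is patchified separately, projected with a shared linear map,
    and tagged with the embedding of its channel type.
    """
    c, h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    # Split every channel into non-overlapping patches: (C, N, patch*patch)
    patches = image.reshape(c, h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 1, 3, 2, 4).reshape(c, -1, patch * patch)
    # Shared pixel projection: (C, N, D)
    tokens = patches @ patch_proj
    # Add the embedding of each channel type that is present: (C, 1, D) broadcast
    tokens = tokens + channel_emb[channel_ids][:, None, :]
    # Flatten to one token sequence: (C*N, D)
    return tokens.reshape(-1, tokens.shape[-1])

# Illustrative setup: 5 known channel types, 8-dim tokens, 8x8 images.
rng = np.random.default_rng(0)
patch_proj = rng.normal(size=(16, 8))   # assumed shared projection weights
channel_emb = rng.normal(size=(5, 8))   # assumed one embedding per channel type

image = rng.normal(size=(3, 8, 8))
tokens_a = tokenize_channels(image, [0, 1, 2], patch_proj, channel_emb)
# A different source: only two channels, the second of a different type.
tokens_b = tokenize_channels(image[:2], [0, 4], patch_proj, channel_emb)
print(tokens_a.shape, tokens_b.shape)
```

Because the sequence length simply scales with the number of channels present, the same weights handle both configurations. In this sketch every channel type is treated identically, which is the behavior the abstract identifies as a source of redundant features.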
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus
