OAMixer: Object-aware Mixing Layer for Vision Transformers
Hyunwoo Kang, Sangwoo Mo, Jinwoo Shin

TL;DR
OAMixer introduces an object-aware mixing layer for vision transformers that leverages object labels obtained without extra annotation to improve patch interaction, classification accuracy, and robustness across multiple visual tasks.
Contribution
It proposes a novel object-aware mixing layer (OAMixer) that uses unsupervised or weakly-supervised object labels to enhance patch interactions in vision models.
Findings
Improves classification accuracy across various patch-based models.
Enhances background robustness and object-centric representations.
Benefits multiple downstream visual recognition tasks.
Abstract
Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive results on various visual recognition tasks, emerging as alternatives to classic convolutional networks. While the initial patch-based models (ViTs) treated all patches equally, recent studies reveal that incorporating inductive biases such as spatiality benefits the representations. However, most prior works focused solely on the location of patches, overlooking the scene structure of images. Thus, we aim to further guide the interaction of patches using object information. Specifically, we propose OAMixer (object-aware mixing layer), which calibrates the patch mixing layers of patch-based models based on object labels. Here, we obtain the object labels in an unsupervised or weakly-supervised manner, i.e., no additional human annotation cost is necessary. Using the object labels, OAMixer computes a reweighting mask…
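The abstract as excerpted here stops before it describes how the reweighting mask is built, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes one integer object label per patch, a hypothetical scale parameter `kappa`, and a mask that leaves same-object patch pairs at weight 1 while damping different-object pairs by `exp(-kappa)`. The function names `reweighting_mask` and `apply_mask` are illustrative and do not come from the paper.

```python
import math

def reweighting_mask(patch_labels, kappa=1.0):
    """Build an N x N mask from per-patch object labels (assumed form).

    Pairs of patches with the same label get weight 1.0; pairs with
    different labels are damped to exp(-kappa). kappa = 0 disables the mask.
    """
    n = len(patch_labels)
    damp = math.exp(-kappa)
    return [[1.0 if patch_labels[i] == patch_labels[j] else damp
             for j in range(n)]
            for i in range(n)]

def apply_mask(mixing_weights, mask):
    """Reweight a patch mixing matrix elementwise, then renormalize rows.

    Assumes every row of mixing_weights is non-negative with a positive sum,
    as is the case for softmax attention weights.
    """
    out = []
    for w_row, m_row in zip(mixing_weights, mask):
        scaled = [w * m for w, m in zip(w_row, m_row)]
        total = sum(scaled)
        out.append([s / total for s in scaled])
    return out

# Four patches: the first two belong to object 0, the last two to object 1.
labels = [0, 0, 1, 1]
uniform = [[0.25] * 4 for _ in range(4)]  # stand-in for attention weights
mixed = apply_mask(uniform, reweighting_mask(labels, kappa=2.0))
```

In this toy case each row of `mixed` still sums to 1, but patches now attend more strongly to patches that share their object label than to the rest of the image.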
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
