OAMixer: Object-aware Mixing Layer for Vision Transformers

Hyunwoo Kang; Sangwoo Mo; Jinwoo Shin

arXiv:2212.06595·cs.CV·December 14, 2022·1 cites

TL;DR

OAMixer introduces an object-aware mixing layer for vision transformers that leverages object labels obtained without extra annotation to improve patch interaction, classification accuracy, and robustness across multiple visual tasks.

Contribution

The paper proposes an object-aware mixing layer (OAMixer) that uses object labels, obtained in an unsupervised or weakly-supervised manner, to guide patch interactions in patch-based vision models.

Findings

01. Improves classification accuracy across various patch-based models.

02. Enhances background robustness and object-centric representations.

03. Benefits multiple downstream visual recognition tasks.

Abstract

Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive results on various visual recognition tasks, offering an alternative to classic convolutional networks. While the initial patch-based models (ViTs) treated all patches equally, recent studies reveal that incorporating inductive biases such as spatiality benefits the representations. However, most prior works focused solely on the location of patches, overlooking the scene structure of images. Thus, we aim to further guide the interaction of patches using object information. Specifically, we propose OAMixer (object-aware mixing layer), which calibrates the patch mixing layers of patch-based models based on object labels. Here, we obtain the object labels in an unsupervised or weakly-supervised manner, i.e., no additional human annotation cost is required. Using the object labels, OAMixer computes a reweighting mask…
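The abstract is cut off before it says how the reweighting mask is computed, so the following is only an illustrative sketch of the general idea of calibrating patch mixing with object labels. It assumes one integer object label per patch and a hypothetical rule that scales down mixing weights between patches with different labels. The function names, the `penalty` parameter, and the mask rule itself are assumptions made for this example and should not be read as the authors' formulation.

```python
import math


def label_agreement_mask(labels, penalty=0.5):
    """Hypothetical mask: 1.0 where two patches share an object label,
    `penalty` (< 1) where they do not. Not the paper's actual mask."""
    n = len(labels)
    return [
        [1.0 if labels[i] == labels[j] else penalty for j in range(n)]
        for i in range(n)
    ]


def softmax(row):
    """Numerically stable softmax over one row of mixing scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_mixing(scores, labels, penalty=0.5):
    """Scale softmax mixing weights by the label mask, then renormalize
    each row so the weights still sum to 1."""
    mask = label_agreement_mask(labels, penalty)
    mixed = []
    for score_row, mask_row in zip(scores, mask):
        weights = softmax(score_row)
        masked = [w * m for w, m in zip(weights, mask_row)]
        total = sum(masked)
        mixed.append([w / total for w in masked])
    return mixed


# Four patches: the first two belong to object 0, the last two to object 1.
labels = [0, 0, 1, 1]
# Uniform raw scores, so any difference in the output comes from the mask.
scores = [[0.0] * 4 for _ in range(4)]
mixing = reweighted_mixing(scores, labels)
```

With uniform scores, each patch in this toy setup ends up weighting same-object patches twice as heavily as other-object patches (about 1/3 versus 1/6 per patch), which is the kind of object-guided interaction the abstract describes in words.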

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI