Re-Attention Transformer for Weakly Supervised Object Localization
Hui Su, Yue Ye, Zhiwei Chen, Mingli Song, Lechao Cheng

TL;DR
This paper introduces a re-attention transformer with a token refinement mechanism that improves weakly supervised object localization by better focusing on full objects and suppressing background noise.
Contribution
It proposes a novel token refinement transformer with a token priority scoring module and class activation map integration to enhance object localization accuracy.
Findings
Outperforms existing weakly supervised localization methods on benchmarks.
Effectively suppresses background noise while highlighting target objects.
Demonstrates the effectiveness of re-attention mechanisms in transformer-based localization.
Abstract
Weakly supervised object localization is a challenging task that aims to localize objects using only coarse annotations such as image categories. Existing deep network approaches are mainly based on class activation maps, which highlight discriminative local regions while ignoring the full object. In addition, emerging transformer-based techniques often place too much emphasis on the background, which impedes their ability to identify complete objects. To address these issues, we present a re-attention mechanism termed token refinement transformer (TRT) that captures object-level semantics to guide localization. Specifically, TRT introduces a novel module named token priority scoring module (TPSM) to suppress the effects of background noise while focusing on the target object. Then, we incorporate the class activation map as the semantically aware input to restrain…
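The abstract refers to class activation maps (CAM) as the basis of most existing approaches. As a rough illustration only, the following sketch shows the standard CAM idea (a class-weighted sum of the final feature maps) followed by a simple threshold to obtain a box. It is not the TRT or TPSM method from the paper; the function names, array shapes, and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Generic CAM sketch (assumed shapes, not the paper's code).

    features:   (C, H, W) feature maps from the last layer before pooling
    fc_weights: (num_classes, C) weights of the final linear classifier
    """
    # Weighted sum over channels using the weights of the chosen class -> (H, W).
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Min-max normalize to [0, 1]; leave an all-constant map as zeros.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cam_to_box(cam, threshold=0.5):
    """Tightest box (x_min, y_min, x_max, y_max) around cells >= threshold.

    Simplification: real pipelines usually keep only the largest connected
    region; here every above-threshold cell is included.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


if __name__ == "__main__":
    # Toy input: channel 0 fires on a 2x2 patch, channel 1 is silent.
    features = np.zeros((2, 4, 4))
    features[0, 1:3, 1:3] = 1.0
    fc_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
    cam = class_activation_map(features, fc_weights, class_idx=0)
    print(cam_to_box(cam))
```

Because the map is driven by whichever channels the classifier weights most, the box tends to cover only the most discriminative part of the object, which is the limitation the abstract describes.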
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attentive Walk-Aggregating Graph Neural Network
