LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization
Zhiwei Chen, Changan Wang, Yabiao Wang, Guannan Jiang, Yunhang Shen, Ying Tai, Chengjie Wang, Wei Zhang, Liujuan Cao

TL;DR
This paper introduces LCTR, a transformer-based framework with new modules that improve local feature perception in weakly supervised object localization. It targets two problems: CNN-based methods tend to highlight only the most discriminative object part, and transformers lack the locality inductive bias of CNNs.
Contribution
The paper proposes LCTR, a transformer framework with relational patch-attention and cue digging modules to enhance local perception in WSOL.
Findings
LCTR outperforms existing methods on CUB-200-2011.
LCTR achieves higher localization accuracy on ILSVRC.
The modules effectively improve local feature highlighting.
Abstract
Weakly supervised object localization (WSOL) aims to learn an object localizer using only image-level labels. Techniques based on convolutional neural networks (CNNs) often highlight the most discriminative part of an object while ignoring its full extent. Recently, the transformer architecture has been applied to WSOL to capture long-range feature dependencies with its self-attention mechanism and multilayer perceptron structure. Nevertheless, transformers lack the locality inductive bias inherent to CNNs and may therefore degrade local feature details in WSOL. In this paper, we propose a novel framework built upon the transformer, termed LCTR (Local Continuity TRansformer), which aims to enhance the local perception capability of global features among long-range feature dependencies. To this end, we propose a relational patch-attention module (RPAM), which…
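The abstract's point about highlighting only the most discriminative part can be illustrated with a small sketch. It assumes the common WSOL post-processing step of thresholding a localization map and taking the tight box around the surviving cells; the abstract does not state that step, and `box_from_map`, the 0.5 relative threshold, and both toy maps are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple


def box_from_map(
    score_map: List[List[float]], rel_threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Tight box (row_min, col_min, row_max, col_max) around every cell whose
    score is at least rel_threshold times the maximum score in the map.

    Assumption: this thresholding rule is generic post-processing used for
    illustration, not a procedure taken from the LCTR paper.
    """
    peak = max(max(row) for row in score_map)
    cutoff = rel_threshold * peak
    hits = [
        (r, c)
        for r, row in enumerate(score_map)
        for c, value in enumerate(row)
        if value >= cutoff
    ]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical maps for an object covering rows 1..2 and columns 1..4.
# The "peaky" map responds strongly only at the most discriminative column.
peaky_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.2, 1.0, 0.2, 0.0],
    [0.0, 0.1, 0.2, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
# The "even" map responds across the whole object extent.
even_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.8, 1.0, 0.8, 0.0],
    [0.0, 0.6, 0.8, 0.9, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

peaky_box = box_from_map(peaky_map)  # covers only column 3
even_box = box_from_map(even_map)    # covers columns 1..4
print(peaky_box, even_box)
```

Under these assumptions the peaky map yields a box one column wide, while the evenly activated map yields a box spanning the whole object, which is the gap between part-level and full-extent localization that the abstract describes.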
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Convolution
