Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization

Yiwen Cao; Yukun Su; Wenjun Wang; Yanxia Liu; Qingyao Wu

arXiv:2309.01331 · cs.CV · September 6, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a Semantic-Constraint Matching Network using a transformer and a local patch shuffle strategy to improve weakly supervised object localization, achieving state-of-the-art results by addressing divergent activation issues.

Contribution

The paper proposes a novel transformer-based network with a semantic-constraint matching module and a local patch shuffle strategy to enhance object localization accuracy in weakly supervised settings.
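This page does not describe how the local patch shuffle strategy works in detail. As a rough illustration only, one plausible reading is that patches are permuted inside small neighbourhoods so that local detail is disrupted while the global layout stays intact. The sketch below follows that reading; the function name `local_patch_shuffle`, the `patch`, `window`, and `seed` parameters, and the nested-list image representation are all hypothetical and are not taken from the paper.

```python
import random


def local_patch_shuffle(image, patch, window, seed=None):
    """Permute patches inside each window x window neighbourhood of patches.

    Hypothetical sketch, not the authors' implementation.
    `image` is a 2D grid given as a list of equal-length rows.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    block = patch * window  # side length of one neighbourhood in pixels
    if height % block or width % block:
        raise ValueError("image sides must be multiples of patch * window")

    out = [row[:] for row in image]  # copy so the input is left untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Top-left corners of the patches in this neighbourhood.
            targets = [
                (top + i * patch, left + j * patch)
                for i in range(window)
                for j in range(window)
            ]
            sources = targets[:]
            rng.shuffle(sources)  # random permutation within the neighbourhood
            for (dy, dx), (sy, sx) in zip(targets, sources):
                for r in range(patch):
                    out[dy + r][dx:dx + patch] = image[sy + r][sx:sx + patch]
    return out
```

Because every patch only moves within its own neighbourhood, each neighbourhood of the output contains exactly the same pixels as the corresponding neighbourhood of the input. An original image and its shuffled copy could then serve as a pair for a matching objective, though how the paper builds and uses such pairs is not stated here.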

Findings

01. Achieves new state-of-the-art performance on CUB-200-2011 and ILSVRC datasets.

02. Outperforms previous methods by a large margin.

03. Effectively mitigates divergent activation in transformer-based WSOL.

Abstract

Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision. Due to the local receptive fields generated by convolution operations, previous CNN-based methods suffer from partial activation issues, concentrating on the object's most discriminative part instead of its full extent. Benefiting from the capability of the self-attention mechanism to acquire long-range feature dependencies, the Vision Transformer has recently been applied to alleviate these local activation drawbacks. However, since the transformer lacks the inductive localization bias that is inherent in CNNs, it may cause a divergent activation problem, resulting in an uncertain distinction between foreground and background. In this work, we propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Byte Pair Encoding · Label Smoothing · Dropout · Absolute Position Encodings · Layer Normalization · Adam