TL;DR
This paper introduces a weakly supervised object localization method that uses contrastive attention and foreground consistency losses to exploit background cues, sharpening attention maps and improving localization accuracy.
Contribution
It proposes contrastive attention loss and foreground consistency loss, along with non-local attention blocks, to improve object localization accuracy in weakly supervised settings.
Findings
Achieves state-of-the-art results on CUB-200-2011 and ImageNet datasets.
Effectively utilizes background cues to guide feature activation.
Enhances attention maps with non-local attention blocks.
Abstract
Weakly supervised object localization (WSOL) aims to localize the target object using only image-level supervision. Recent methods encourage the model to activate feature maps over the entire object by dropping the most discriminative parts. However, they are likely to induce excessive extension into the background, which leads to over-estimated localization. In this paper, we consider the background as an important cue that guides the feature activation to cover the sophisticated object region and propose contrastive attention loss. The loss promotes similarity between the foreground and its dropped version, and dissimilarity between the dropped version and the background. Furthermore, we propose foreground consistency loss that penalizes earlier layers for producing noisy attention, treating the later layer as a reference to provide them with a sense of backgroundness. It guides the early…
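The contrastive attention idea in the abstract (pull the dropped foreground toward the full foreground, push it away from the background) can be sketched as a triplet-style loss. This is a minimal illustration and not the authors' implementation: the triplet form, the cosine distance, the thresholds `fg_thresh` and `drop_thresh`, the `margin` value, and all function names are assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + 1e-12)


def masked_average(features, mask):
    """Average per-location feature vectors, weighted by a mask in [0, 1]."""
    total = sum(mask) + 1e-12
    dim = len(features[0])
    return [sum(m * f[d] for f, m in zip(features, mask)) / total
            for d in range(dim)]


def contrastive_attention_loss(features, attention,
                               fg_thresh=0.5, drop_thresh=0.8, margin=0.5):
    """Triplet-style sketch (assumed form, hypothetical parameters).

    features:  list of per-location feature vectors
    attention: list of attention values in [0, 1], one per location
    """
    # Foreground: locations whose attention reaches fg_thresh.
    foreground = [1.0 if a >= fg_thresh else 0.0 for a in attention]
    # Background: the complement of the foreground.
    background = [1.0 - m for m in foreground]
    # Dropped foreground: foreground minus its most discriminative part.
    dropped = [1.0 if fg_thresh <= a < drop_thresh else 0.0 for a in attention]

    f_fg = masked_average(features, foreground)
    f_bg = masked_average(features, background)
    f_drop = masked_average(features, dropped)

    # Anchor = dropped foreground, positive = foreground, negative = background.
    d_pos = cosine_distance(f_drop, f_fg)
    d_neg = cosine_distance(f_drop, f_bg)
    return max(0.0, d_pos - d_neg + margin)


attention = [0.9, 0.6, 0.2, 0.1]

# Dropped region resembles the object: the loss should vanish.
good_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
good_loss = contrastive_attention_loss(good_features, attention)

# Dropped region resembles the background: the loss should be positive.
bad_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
bad_loss = contrastive_attention_loss(bad_features, attention)
```

In this toy setup the loss is zero when the less discriminative part of the attended region still looks like the object, and it grows when that region leaks into background-like features, which is the over-extension the abstract describes.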
