Focusing On Targets For Improving Weakly Supervised Visual Grounding
Viet-Quoc Pham, Nao Mishima

TL;DR
This paper introduces two methods for weakly supervised visual grounding, target-aware cropping and dependency-parsing-based word emphasis, and reports results that surpass previous state-of-the-art methods on RefCOCO, RefCOCO+, and RefCOCOg by a notable margin.
Contribution
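The paper proposes two simple techniques that improve Grad-CAM heatmap-based visual grounding: target-aware cropping, which encourages the model to learn both object-level and scene-level semantic representations, and dependency parsing, which extracts the words related to the target object so they can be emphasized when per-word heatmaps are combined.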
Findings
Surpassed previous state-of-the-art methods on RefCOCO, RefCOCO+, and RefCOCOg by a notable margin.
Enhanced heatmap quality by focusing on target-related regions.
Abstract
Weakly supervised visual grounding aims to predict the region in an image that corresponds to a specific linguistic query, where the mapping between the target object and query is unknown in the training stage. The state-of-the-art method uses a vision language pre-training model to acquire heatmaps from Grad-CAM, which matches every query word with an image region, and uses the combined heatmap to rank the region proposals. In this paper, we propose two simple but efficient methods for improving this approach. First, we propose a target-aware cropping approach to encourage the model to learn both object and scene level semantic representations. Second, we apply dependency parsing to extract words related to the target object, and then put emphasis on these words in the heatmap combination. Our method surpasses the previous SOTA methods on RefCOCO, RefCOCO+, and RefCOCOg by a notable…
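The heatmap combination and proposal ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `emphasis` weight, the mean-activation box score, and the toy 2x4 heatmaps are all assumptions made for illustration. In the paper the per-word heatmaps come from Grad-CAM on a vision-language model and the target-related words come from a dependency parser; here both are simply given as inputs.

```python
from typing import Dict, List, Sequence, Tuple

Heatmap = List[List[float]]       # H x W grid of non-negative activations
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), with x1 and y1 exclusive


def combine_heatmaps(word_maps: Dict[str, Heatmap],
                     target_words: Sequence[str],
                     emphasis: float = 2.0) -> Heatmap:
    """Weighted average of per-word heatmaps.

    Words listed in `target_words` get weight `emphasis` (a hypothetical
    parameter); every other word gets weight 1.0.
    """
    weights = {w: (emphasis if w in target_words else 1.0) for w in word_maps}
    total = sum(weights.values())
    first = next(iter(word_maps.values()))
    height, width = len(first), len(first[0])
    combined = [[0.0] * width for _ in range(height)]
    for word, heatmap in word_maps.items():
        for y in range(height):
            for x in range(width):
                combined[y][x] += weights[word] * heatmap[y][x] / total
    return combined


def box_score(heatmap: Heatmap, box: Box) -> float:
    """Mean activation inside the box (an assumed scoring rule)."""
    x0, y0, x1, y1 = box
    values = [heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def rank_proposals(heatmap: Heatmap, boxes: Sequence[Box]) -> List[Box]:
    """Sort region proposals from highest to lowest heatmap score."""
    return sorted(boxes, key=lambda b: box_score(heatmap, b), reverse=True)


# Toy query "dog near the tree": the target is "dog", but the "tree"
# heatmap is stronger, so an unweighted combination favors the wrong region.
word_maps = {
    "dog":  [[0.6, 0.6, 0.0, 0.0], [0.6, 0.6, 0.0, 0.0]],
    "tree": [[0.0, 0.0, 0.8, 0.8], [0.0, 0.0, 0.8, 0.8]],
}
left_box, right_box = (0, 0, 2, 2), (2, 0, 4, 2)

plain = combine_heatmaps(word_maps, target_words=[], emphasis=1.0)
focused = combine_heatmaps(word_maps, target_words=["dog"], emphasis=2.0)

best_plain = rank_proposals(plain, [left_box, right_box])[0]
best_focused = rank_proposals(focused, [left_box, right_box])[0]
```

In this toy setup the unweighted combination ranks the right-hand (tree) box first, while emphasizing the target word moves the left-hand (dog) box to the top, which is the intuition behind weighting target-related words more heavily.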
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Natural Language Processing Techniques
Methods: Heatmap
