Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
Kan Chen, Jiyang Gao, Ram Nevatia

TL;DR
This paper introduces KAC Net, a novel weakly supervised phrase grounding method that leverages visual-language consistency and external knowledge to improve localization accuracy.
Contribution
The paper proposes the Knowledge Aided Consistency Network (KAC Net), which exploits consistency in both the visual and language modalities and draws on complementary external knowledge to improve weakly supervised grounding.
Findings
KAC Net significantly improves grounding performance on benchmark datasets.
External knowledge is used effectively as a complementary cue for grounding.
A Knowledge Based Pooling (KBP) gate sharpens the focus on query-related proposals.
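To make the gating idea in the findings concrete, here is a minimal sketch of how a knowledge-derived relevance score could reweight attention over region proposals. It is an illustration under stated assumptions, not the paper's implementation: the function names `softmax` and `gated_attention`, the multiplicative form of the gate, the renormalization step, and all numeric values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention(scores, knowledge):
    """Reweight proposal attention with a knowledge-derived gate.

    scores:    raw query-to-proposal relevance, one float per proposal
    knowledge: hypothetical relevance in [0, 1] per proposal, e.g. how well
               a pretrained detector's label for the region matches the query
    Assumption: the gate is applied multiplicatively, then renormalized.
    """
    attn = softmax(scores)
    gated = [a * k for a, k in zip(attn, knowledge)]
    total = sum(gated)
    if total == 0:
        # No proposal passes the gate; fall back to ungated attention.
        return attn
    return [g / total for g in gated]

# Four proposals; the knowledge gate suppresses proposals 1 and 2.
scores = [2.0, 1.0, 0.5, 1.5]
knowledge = [0.9, 0.1, 0.1, 0.8]
weights = gated_attention(scores, knowledge)
best = max(range(len(weights)), key=lambda i: weights[i])
```

With these made-up inputs, proposals that the gate marks as unrelated to the query receive a smaller share of the attention than they would without it, which is the "enhanced focus" effect the finding describes.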
Abstract
Given a natural language query, a phrase grounding system aims to localize mentioned objects in an image. In the weakly supervised scenario, the mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system via learning to reconstruct language information contained in input queries from predicted proposals. However, the optimization is solely guided by the reconstruction loss from the language modality, and ignores rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input…
