Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

Kan Chen; Jiyang Gao; Ram Nevatia

arXiv:1803.03879·cs.CV·March 13, 2018

TL;DR

This paper introduces KAC Net, a novel weakly supervised phrase grounding method that leverages visual-language consistency and external knowledge to improve localization accuracy.

Contribution

The paper proposes the Knowledge Aided Consistency Network (KAC Net), which combines external knowledge with visual-language consistency cues to improve weakly supervised grounding.

Findings

01. Significant performance improvement on benchmark datasets.

02. Effective use of external knowledge for grounding.

03. Enhanced focus on query-related proposals via the KBP gate.

Abstract

Given a natural language query, a phrase grounding system aims to localize the mentioned objects in an image. In the weakly supervised scenario, the mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system to reconstruct the language information contained in input queries from predicted proposals. However, the optimization is guided solely by the reconstruction loss from the language modality, and it ignores the rich visual information contained in proposals as well as useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input…
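
The reconstruction-based training that the abstract attributes to previous methods can be illustrated with a toy sketch. This is not KAC Net and not the authors' code. The names `softmax` and `ground_and_reconstruct`, the dot-product scoring, and the squared-error loss are all assumptions made for illustration; a real system would reconstruct the query with a learned language decoder rather than compare raw vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ground_and_reconstruct(proposals, query):
    """Toy weakly supervised grounding step (hypothetical helper).

    proposals: list of proposal feature vectors, all of the same length
    query:     query embedding of the same length
    Returns (index of the predicted proposal, attention weights, loss).
    """
    # Score each proposal against the query (assumed: plain dot product).
    scores = [sum(p_d * q_d for p_d, q_d in zip(p, query)) for p in proposals]
    attn = softmax(scores)

    # Attention-weighted proposal feature: the only signal used to
    # "reconstruct" the query, so no region-level labels are needed.
    dim = len(query)
    attended = [sum(a * p[d] for a, p in zip(attn, proposals)) for d in range(dim)]

    # Stand-in reconstruction loss (assumed: squared error to the query).
    loss = sum((a_d - q_d) ** 2 for a_d, q_d in zip(attended, query))

    # At test time the highest-weighted proposal is the grounding result.
    best = max(range(len(proposals)), key=lambda i: attn[i])
    return best, attn, loss


# Three made-up 2-D proposal features and one made-up query embedding.
proposals = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.0, 1.0]
best, attn, loss = ground_and_reconstruct(proposals, query)
```

In this sketch the only training signal is the language-side loss, which is the limitation the abstract points out: nothing rewards the model for using visual information in the proposals or cues from external knowledge.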
