Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Xuejing Liu; Liang Li; Shuhui Wang; Zheng-Jun Zha; Zechao Li; Qi Tian; and Qingming Huang

arXiv:2207.08386·cs.CV·July 19, 2022

TL;DR

This paper introduces EARN, a novel weakly supervised REG model that enhances entity understanding and adaptive reconstruction to improve target grounding accuracy in complex scenes.

Contribution

EARN integrates entity enhancement, adaptive grounding, and collaborative reconstruction modules, offering a new approach to address ambiguities and context issues in weakly supervised REG.

Findings

01. EARN outperforms existing methods on five datasets.

02. EARN effectively distinguishes targets in cluttered scenes.

03. Qualitative results show improved handling of multiple same-category objects.

Abstract

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding…
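The entity enhancement step described above (semantic similarity used to select candidate proposals) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which similarity measure or embeddings are used, so cosine similarity, the toy 3-dimensional vectors, and the names `cosine` and `select_candidates` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_candidates(entity_vec, proposal_label_vecs, k):
    """Rank proposals by similarity to the query entity and keep the top k.

    Hypothetical helper: entity_vec stands in for an embedding of the entity
    word in the expression, and proposal_label_vecs for embeddings of each
    proposal's predicted category label.
    """
    scores = [cosine(entity_vec, p) for p in proposal_label_vecs]
    # sorted() is stable, so tied proposals keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Toy embeddings (assumed values, not taken from the paper).
entity = [1.0, 0.0, 0.0]
proposals = [
    [0.9, 0.1, 0.0],  # close to the entity
    [0.0, 1.0, 0.0],  # unrelated
    [0.7, 0.0, 0.7],  # partially related
    [0.0, 0.0, 1.0],  # unrelated
]

top, scores = select_candidates(entity, proposals, k=2)
print(top)  # indices of the two most similar proposals
```

In this sketch the similarity scores act only as a filter over proposals; how EARN turns them into a supervision signal is not specified in the truncated abstract.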

Code & Models

Repositories

gingl/earn
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques