Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, and Qingming Huang

TL;DR
This paper introduces EARN, a novel weakly supervised REG model that enhances entity understanding and adaptive reconstruction to improve target grounding accuracy in complex scenes.
Contribution
EARN integrates entity enhancement, adaptive grounding, and collaborative reconstruction modules, offering a new approach to address ambiguities and context issues in weakly supervised REG.
Findings
EARN outperforms existing methods on five datasets.
EARN effectively distinguishes targets in cluttered scenes.
Qualitative results show improved handling of multiple same-category objects.
Abstract
Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
