Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, and Qingming Huang

arXiv:1908.10568·cs.CV·August 29, 2019·6 cites

TL;DR

This paper introduces an adaptive reconstruction network (ARN) for weakly supervised referring expression grounding, which improves localization accuracy by adaptively matching image proposals with linguistic queries and reconstructing the query to guide learning.

Contribution

The paper proposes a novel end-to-end ARN that adaptively builds correspondence between image proposals and queries, enhancing weakly supervised grounding performance.

Findings

01

ARN outperforms state-of-the-art methods on four datasets.

02

The adaptive mechanism makes the model more robust to variation across referring expressions.

03

ARN handles multiple similar objects more effectively.

Abstract

Weakly supervised referring expression grounding aims at localizing the referential object in an image according to the linguistic query, where the mapping between the referential object and query is unknown in the training stage. To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction. Specifically, we first extract the subject, location and context features to represent the proposals and the query respectively. Then, we design the adaptive grounding module to compute the matching score between each proposal and query by a hierarchical attention model. Finally, based on attention score and proposal features, we reconstruct the input query with a collaborative loss of language reconstruction loss, adaptive…
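
The adaptive grounding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes dot-product similarity, a single softmax over proposals, and hand-set component weights in place of the paper's learned hierarchical attention. The names `ground`, `attended_feature`, and `component_logits` are hypothetical and chosen only for this example.

```python
import math

KEYS = ("subject", "location", "context")

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ground(proposals, query, component_logits):
    """Return one attention weight per proposal.

    proposals: list of dicts mapping each key in KEYS to a feature vector.
    query: dict mapping each key in KEYS to a query embedding.
    component_logits: dict of key -> float (assumption: a real model would
        predict these from the query rather than take them as input).
    """
    weights = softmax([component_logits[k] for k in KEYS])
    scores = [
        sum(w * dot(p[k], query[k]) for w, k in zip(weights, KEYS))
        for p in proposals
    ]
    return softmax(scores)

def attended_feature(proposals, attention, key):
    # Attention-weighted sum of proposal features, the kind of pooled
    # vector a reconstruction branch could decode the query from.
    dim = len(proposals[0][key])
    return [
        sum(a * p[key][i] for a, p in zip(attention, proposals))
        for i in range(dim)
    ]

# Toy data: proposal 0 matches the query on subject and location.
proposals = [
    {"subject": [1.0, 0.0], "location": [0.0, 1.0], "context": [0.0, 0.0]},
    {"subject": [0.0, 1.0], "location": [1.0, 0.0], "context": [0.0, 0.0]},
]
query = {"subject": [1.0, 0.0], "location": [0.0, 1.0], "context": [0.0, 0.0]}
logits = {"subject": 0.0, "location": 0.0, "context": 0.0}

attention = ground(proposals, query, logits)
best = max(range(len(attention)), key=attention.__getitem__)
pooled = attended_feature(proposals, attention, "subject")
```

In this toy setup no box labels are used: the attention weights are the only link between proposals and the query, which is why a reconstruction loss on the pooled feature can serve as the training signal in the weakly supervised setting.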

Code & Models

Repositories

GingL/ARN
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Topic Modeling