Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
Yaxian Wang, Henghui Ding, Shuting He, Xudong Jiang, Bifan Wei, Jun Liu

TL;DR
This paper introduces HieA2G, a hierarchical alignment and adaptive grounding network for Generalized Referring Expression Comprehension (GREC). It handles multi-target and no-target expressions in addition to single-target ones, and reports state-of-the-art performance.
Contribution
The paper proposes a Hierarchical Multi-modal Semantic Alignment module and an Adaptive Grounding Counter, enabling flexible, comprehensive understanding and dynamic target counting in GREC.
Findings
Achieves state-of-the-art results on GREC and related tasks.
Enhances multi-modal understanding through hierarchical alignment.
Effectively handles varying target numbers with adaptive grounding.
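To make the idea of a variable number of outputs concrete, here is a minimal sketch of count-based box selection. It is not the paper's Adaptive Grounding Counter: the summary does not describe how the counter is built, so the function name `select_by_count`, the box format, and the assumption that a target count is predicted separately from the box scores are all hypothetical.

```python
from typing import List, Tuple

# Assumed box format for illustration: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def select_by_count(boxes: List[Box], scores: List[float], predicted_count: int) -> List[Box]:
    """Return the `predicted_count` highest-scoring boxes (hypothetical helper).

    A count of 0 yields an empty list, which models a no-target expression.
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    # Clamp the count so it never goes below 0 or above the number of candidates.
    k = max(0, min(predicted_count, len(boxes)))
    # Rank candidate indices from highest to lowest score.
    ranked = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in ranked[:k]]


candidates = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0), (30.0, 30.0, 40.0, 40.0)]
confidences = [0.9, 0.2, 0.7]

no_target = select_by_count(candidates, confidences, 0)      # empty list
single_target = select_by_count(candidates, confidences, 1)  # first candidate only
multi_target = select_by_count(candidates, confidences, 2)   # first and third candidates
```

Compared with a fixed score threshold, selecting by a predicted count lets the same ranking serve no-target, single-target, and multi-target expressions without retuning a cutoff.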
Abstract
In this work, we address the challenging task of Generalized Referring Expression Comprehension (GREC). Compared to the classic Referring Expression Comprehension (REC) that focuses on single-target expressions, GREC extends the scope to a more practical setting by further encompassing no-target and multi-target expressions. Existing REC methods face challenges in handling the complex cases encountered in GREC, primarily due to their fixed output and limitations in multi-modal representations. To address these issues, we propose a Hierarchical Alignment-enhanced Adaptive Grounding Network (HieA2G) for GREC, which can flexibly deal with various types of referring expressions. First, a Hierarchical Multi-modal Semantic Alignment (HMSA) module is proposed to incorporate three levels of alignments, including word-object, phrase-object, and text-image alignment. It enables hierarchical…
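The three alignment levels named in the abstract (word-object, phrase-object, and text-image) can be illustrated with a small similarity computation. This is only a sketch under stated assumptions, not the HMSA module: cosine similarity, mean pooling, the function names, and the representation of phrases as lists of word indices are all illustrative choices that the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def alignment_scores(
    words: Sequence[Vector],
    phrases: Sequence[Sequence[int]],
    objects: Sequence[Vector],
) -> Dict[str, object]:
    """Compute similarity scores at three granularities (hypothetical helper).

    `phrases` holds non-empty lists of word indices; a phrase embedding is
    assumed to be the mean of its word embeddings.
    """
    phrase_vectors = [mean_pool([words[i] for i in span]) for span in phrases]
    return {
        # One row per word, one column per object.
        "word_object": [[cosine(w, o) for o in objects] for w in words],
        # One row per phrase, one column per object.
        "phrase_object": [[cosine(p, o) for o in objects] for p in phrase_vectors],
        # A single score for the whole expression against the whole image.
        "text_image": cosine(mean_pool(words), mean_pool(objects)),
    }


word_embeddings = [[1.0, 0.0], [0.0, 1.0]]
object_embeddings = [[1.0, 0.0], [0.0, 1.0]]
scores = alignment_scores(word_embeddings, [[0, 1]], object_embeddings)
```

In this toy input each word matches exactly one object, while the two-word phrase is equally similar to both objects, which shows how the levels can carry different information about the same expression.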
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
