ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
Yizhi Hu, Zezhao Tian, Xingqun Qi, Chen Su, Bingkun Yang, Junhui Yin, Muyi Sun, Man Zhang, Zhenan Sun

TL;DR
ReMeREC is a framework for multi-entity referring expression comprehension that explicitly models inter-entity relations. Together with the new ReMeX dataset and two dedicated modules, it improves localization accuracy in complex scenes.
Contribution
It introduces ReMeX, a relation-aware multi-entity REC dataset, and proposes ReMeREC with TMP and EIR modules for enhanced multi-entity and relational understanding.
Findings
Achieves state-of-the-art results on four benchmarks.
Significantly outperforms existing methods on multi-entity grounding.
Effectively models inter-entity relations and language ambiguity.
Abstract
Referring Expression Comprehension (REC) aims to localize specified entities or regions in an image based on natural language descriptions. While existing methods handle single-entity localization, they often ignore complex inter-entity relationships in multi-entity scenes, limiting their accuracy and reliability. Additionally, the lack of high-quality datasets with fine-grained, paired image-text-relation annotations hinders further progress. To address this challenge, we first construct a relation-aware, multi-entity REC dataset called ReMeX, which includes detailed relationship and textual annotations. We then propose ReMeREC, a novel framework that jointly leverages visual and textual cues to localize multiple entities while modeling their inter-relations. To address the semantic ambiguity caused by implicit entity boundaries in language, we introduce the Text-adaptive Multi-entity…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
