LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Xianglong Shi, Silin Cheng, Sirui Zhao, Yunhan Jiang, Enhong Chen, Yang Liu, Sebastien Ourselin

TL;DR
The paper introduces LIHE, a framework for generalized weakly-supervised referring expression comprehension that handles a variable number of referents per expression and addresses semantic collapse with a hybrid hyperbolic-Euclidean similarity.
Contribution
LIHE is the first effective weakly-supervised framework for WGREC. It uses a two-stage process and a hybrid similarity module to improve referent localization and semantic discrimination.
Findings
Establishes the first weakly-supervised WGREC baseline on gRefCOCO and Ref-ZOM.
HEMix, the hybrid hyperbolic-Euclidean similarity module, improves localization accuracy by up to 2.5% on standard REC benchmarks.
Effectively prevents semantic collapse while handling multiple referents.
Abstract
Existing Weakly-Supervised Referring Expression Comprehension (WREC) methods, while effective, are fundamentally limited by a one-to-one mapping assumption, hindering their ability to handle expressions corresponding to zero or multiple targets in realistic scenarios. To bridge this gap, we introduce the Weakly-Supervised Generalized Referring Expression Comprehension task (WGREC), a more practical paradigm that handles expressions with variable numbers of referents. However, extending WREC to WGREC presents two fundamental challenges: supervisory signal ambiguity, where weak image-level supervision is insufficient for training a model to infer the correct number and identity of referents, and semantic representation collapse, where standard Euclidean similarity forces hierarchically-related concepts into non-discriminative clusters, blurring categorical boundaries. To tackle these…
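The abstract attributes semantic collapse to standard Euclidean similarity. This summary does not say how HEMix actually combines the two geometries, so the following is only a minimal sketch under stated assumptions: it blends Euclidean cosine similarity with a negative geodesic distance on the Poincaré ball, using a hypothetical mixing weight `alpha` and curvature parameter `c`. All function names are illustrative and are not taken from the paper's code.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean vectors (rows of v) onto the Poincare ball of curvature -c
    via the exponential map at the origin."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)  # avoid 0/0
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Pairwise geodesic distance between rows of x (n, d) and rows of y (m, d)."""
    sq_diff = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # (n, m)
    x_sq = np.sum(x ** 2, axis=-1)[:, None]
    y_sq = np.sum(y ** 2, axis=-1)[None, :]
    # Clip the conformal factors so points on or near the boundary stay finite.
    denom = np.maximum((1 - c * x_sq) * (1 - c * y_sq), eps)
    arg = 1 + 2 * c * sq_diff / denom
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def cosine_sim(x, y, eps=1e-9):
    """Pairwise cosine similarity between rows of x and rows of y."""
    xn = x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    yn = y / np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)
    return xn @ yn.T

def hybrid_similarity(text_emb, region_emb, alpha=0.5, c=1.0):
    """Hypothetical hybrid score: alpha * cosine - (1 - alpha) * hyperbolic distance."""
    euclid = cosine_sim(text_emb, region_emb)
    hyper = poincare_dist(expmap0(text_emb, c), expmap0(region_emb, c), c)
    return alpha * euclid - (1 - alpha) * hyper

# Usage: score one expression embedding against three candidate region embeddings.
text = np.array([[0.2, 0.1, 0.0]])
regions = np.array([[0.2, 0.1, 0.05], [-0.3, 0.2, 0.1], [0.0, 0.0, 0.4]])
scores = hybrid_similarity(text, regions, alpha=0.5, c=1.0)  # shape (1, 3)
```

Hyperbolic space expands exponentially toward the boundary of the ball, which is why it is often used to embed tree-like hierarchies with low distortion. In this sketch, inputs with a large norm are mapped very close to the boundary, so in practice features may need rescaling before `expmap0`.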
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
