Phrase-Instance Alignment for Generalized Referring Segmentation
E-Ro Nguyen, Hieu Le, Dimitris Samaras, Michael S. Ryoo

TL;DR
This paper reformulates generalized referring segmentation (GRES) around phrase-instance alignment, grounding each phrase of the referring expression in a specific visual instance and improving results on the gRefCOCO and Ref-ZOM benchmarks.
Contribution
It reformulates GRES as an instance-level reasoning task with a phrase-object alignment loss, advancing interpretability and robustness.
Findings
Achieves 3.22% cIoU improvement on gRefCOCO
Attains 12.25% N-acc increase on Ref-ZOM
Enables explicit phrase-instance grounding
Abstract
Generalized referring expressions can describe one object, several related objects, or none at all. Existing generalized referring segmentation (GRES) models treat all cases alike, predicting a single binary mask and ignoring how linguistic phrases correspond to distinct visual instances. To address this, we reformulate GRES as an instance-level reasoning problem, where the model first predicts multiple instance-aware object queries conditioned on the referring expression, then aligns each with its most relevant phrase. This alignment is enforced by a Phrase-Object Alignment (POA) loss that builds fine-grained correspondence between linguistic phrases and visual instances. Given these aligned object instance queries and their learned relevance scores, the final segmentation and the no-target case are both inferred through a unified relevance-weighted aggregation mechanism. This…
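The abstract does not spell out how the relevance-weighted aggregation is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each instance query yields a per-pixel mask probability and a scalar relevance score in [0, 1]; the function name `aggregate`, the max-based merging rule, and both thresholds are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Mask = List[List[float]]  # H x W grid of per-pixel probabilities in [0, 1]


def aggregate(
    instance_masks: List[Mask],
    relevance: List[float],
    mask_thresh: float = 0.5,       # assumed pixel threshold
    no_target_thresh: float = 0.5,  # assumed relevance threshold
) -> Tuple[List[List[int]], bool]:
    """Merge instance masks weighted by relevance; also flag the no-target case.

    Returns (binary_mask, no_target). This is an illustrative rule only.
    """
    if len(instance_masks) != len(relevance):
        raise ValueError("need exactly one relevance score per instance mask")
    if not instance_masks:
        return [], True

    height = len(instance_masks[0])
    width = len(instance_masks[0][0])

    # No-target case: no instance is relevant enough to the expression.
    if max(relevance) < no_target_thresh:
        return [[0] * width for _ in range(height)], True

    merged = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Weight each instance's pixel probability by its relevance
            # and keep the strongest response across instances.
            score = max(r * m[y][x] for m, r in zip(instance_masks, relevance))
            merged[y][x] = 1 if score >= mask_thresh else 0
    return merged, False


# Two instances on a 1 x 2 image: only the first is relevant to the expression.
masks = [[[0.9, 0.1]], [[0.1, 0.9]]]
final_mask, no_target = aggregate(masks, [0.9, 0.2])
empty_mask, empty_flag = aggregate(masks, [0.1, 0.2])
```

Under these assumptions, a single set of relevance scores drives both outputs: it decides which instances contribute to the final mask and whether the expression refers to anything at all, which is the sense in which the abstract calls the mechanism unified.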
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
