Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
Aadarsh Sahoo, Georgia Gkioxari

TL;DR
This paper introduces conversational image segmentation, a task that grounds abstract, intent-driven, and physically reasoned queries in pixel-accurate masks, together with a new benchmark and a scalable data generation method.
Contribution
It presents three components for conversational image segmentation: ConverSeg (a benchmark), ConverSeg-Net (a segmentation model), and an AI-powered data engine for scalable supervision.
Findings
ConverSeg-Net outperforms existing models on the ConverSeg benchmark.
Current language-guided segmentation models are inadequate for complex conversational tasks.
The data engine enables scalable, supervised training without human annotation.
Abstract
Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categorical and spatial queries (e.g., "left-most apple") and overlooks functional and physical reasoning (e.g., "where can I safely store the knife?"). We address this gap and introduce Conversational Image Segmentation (CIS) and ConverSeg, a benchmark spanning entities, spatial relations, intent, affordances, functions, safety, and physical reasoning. We also present ConverSeg-Net, which fuses strong segmentation priors with language understanding, and an AI-powered data engine that generates prompt-mask pairs without human supervision. We show that current language-guided segmentation models are inadequate for CIS, while ConverSeg-Net trained on our data engine achieves significant gains on ConverSeg and maintains strong performance…
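As an illustration of what a prompt-mask pair might look like, here is a minimal sketch. The paper's actual data format is not given in this summary, so the class name `PromptMaskPair`, its fields, and the identifier spellings of the concept families are all hypothetical. Only the concept families themselves and the example prompt come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# Concept families listed in the abstract; the identifier spellings are assumptions.
CONCEPT_FAMILIES = (
    "entities", "spatial_relations", "intent", "affordances",
    "functions", "safety", "physical_reasoning",
)


@dataclass
class PromptMaskPair:
    """Hypothetical record: one conversational prompt and its binary mask."""
    prompt: str
    concept: str
    mask: List[List[int]]  # rows of 0/1 values, one entry per pixel

    def __post_init__(self) -> None:
        # Reject records that do not fit the assumed schema.
        if self.concept not in CONCEPT_FAMILIES:
            raise ValueError(f"unknown concept family: {self.concept}")
        if not self.mask or not self.mask[0]:
            raise ValueError("mask must be non-empty")
        width = len(self.mask[0])
        for row in self.mask:
            if len(row) != width:
                raise ValueError("mask rows must have equal length")
            if any(v not in (0, 1) for v in row):
                raise ValueError("mask values must be 0 or 1")

    def area(self) -> int:
        """Number of pixels marked as belonging to the target region."""
        return sum(sum(row) for row in self.mask)


# Toy 3x4 mask paired with the example prompt quoted in the abstract.
example = PromptMaskPair(
    prompt="where can I safely store the knife?",
    concept="safety",
    mask=[[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 0, 0]],
)
```

The validation in `__post_init__` is a design choice for this sketch only: it catches malformed records early, which matters when pairs are generated automatically rather than annotated by hand.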
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
