TL;DR
This paper introduces IC-Seg, a framework that resolves ambiguous referring segmentation queries through multi-turn clarification, improving accuracy and reducing redundant interactions.
Contribution
The paper proposes a novel agentic framework with hierarchical optimization for proactive clarification in referring segmentation tasks.
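To make "proactive clarification" concrete, here is a toy, rule-based sketch of the clarify-before-segment idea. It is not IC-Seg's method, which is a learned agent. It assumes a hypothetical scene in which candidate objects are plain attribute dictionaries and a simulated user answers attribute questions. When several candidates match the query, the loop asks about the attribute that prunes the most candidates in the worst case, rather than guessing. It stops once exactly one candidate remains. The helpers `matches`, `pick_question`, and `clarify_then_segment` are illustrative names, and "segment" here only returns the surviving object, not a mask.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical representation: each candidate object in the scene is a dict of
# attribute -> value, with "id" reserved as the object identifier.
Candidate = Dict[str, str]


def matches(obj: Candidate, constraints: Dict[str, str]) -> bool:
    """True if the object agrees with every attribute constraint gathered so far."""
    return all(obj.get(k) == v for k, v in constraints.items())


def pick_question(cands: List[Candidate], asked: Set[str]) -> Optional[str]:
    """Pick the unasked attribute whose largest answer group is smallest,
    i.e. the question that prunes the most candidates in the worst case.
    Returns None if no remaining attribute separates the candidates."""
    best_attr, best_worst = None, len(cands)
    attrs = {k for c in cands for k in c if k != "id"} - asked
    for attr in sorted(attrs):  # sorted for deterministic tie-breaking
        groups: Dict[str, int] = {}
        for c in cands:
            value = c.get(attr, "")
            groups[value] = groups.get(value, 0) + 1
        worst = max(groups.values())
        if worst < best_worst:
            best_attr, best_worst = attr, worst
    return best_attr


def clarify_then_segment(
    scene: List[Candidate],
    query: Dict[str, str],
    answer: Callable[[str], str],
    max_turns: int = 5,
) -> Tuple[List[Candidate], int]:
    """Ask clarifying questions until one candidate remains (or the turn budget
    runs out), then return the surviving candidates and the number of turns used."""
    remaining = [c for c in scene if matches(c, query)]
    asked = set(query)
    turns = 0
    while len(remaining) > 1 and turns < max_turns:
        attr = pick_question(remaining, asked)
        if attr is None:  # no question can disambiguate further
            break
        asked.add(attr)
        reply = answer(attr)  # one clarification turn with the (simulated) user
        turns += 1
        remaining = [c for c in remaining if c.get(attr) == reply]
    return remaining, turns


# "Segment the dog" is ambiguous: three dogs match.
scene = [
    {"id": "a", "category": "dog", "color": "brown", "position": "left"},
    {"id": "b", "category": "dog", "color": "black", "position": "left"},
    {"id": "c", "category": "dog", "color": "brown", "position": "right"},
    {"id": "d", "category": "cat", "color": "black", "position": "right"},
]
target = scene[2]  # the object the user actually has in mind
remaining, turns = clarify_then_segment(scene, {"category": "dog"}, lambda attr: target[attr])
print(remaining[0]["id"], turns)  # prints "c 2"
```

Choosing the most discriminative question is one simple way to keep the number of turns low. A model that answered "the dog" without asking would have to pick one of three candidates blindly.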
Findings
IC-Seg significantly outperforms existing methods on ambiguous query resolution.
The hierarchical optimization strategy enhances dialogue efficiency and segmentation accuracy.
IC-Seg maintains state-of-the-art performance on standard benchmarks.
Abstract
Referring segmentation aims to segment target objects in images or videos based on a textual query. Despite remarkable progress in recent years, existing works always assume that user-provided queries are precise and clear. However, this assumption is impractical: in real-world scenarios, it is unrealistic to expect all users to thoroughly review their visual content and carefully ensure that their queries are unique and unambiguous. When encountering such cases, existing segmentation models tend to arbitrarily guess the user's preference, often producing undesired outcomes. To address this limitation, we propose IC-Seg, a novel agentic framework that proactively clarifies user intent through multi-turn conversation before segmentation. To effectively incentivize this capability, we further introduce Hi-GRPO, a new hierarchical optimization strategy…
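The abstract is truncated before it explains what makes Hi-GRPO hierarchical, so the sketch below shows only the standard (non-hierarchical) GRPO ingredient it builds on. For one prompt, the policy samples a group of outputs and scores each one with a scalar reward. Each output's advantage is its reward normalized by the group mean and standard deviation, so no learned value function is needed. The function name and the example rewards are hypothetical. The sketch uses the population standard deviation, while some implementations use the sample estimate instead. It does not reflect IC-Seg's actual reward design.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantages: normalize each sampled output's reward by
    the mean and (population) standard deviation of its group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every output in the group scored the same
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled dialogue rollouts of the same query.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advs])  # prints "[1.414, -1.414, 0.0, 0.0]"
```

Rollouts that beat their group's average get positive advantages and are reinforced. A group in which every rollout earns the same reward yields near-zero advantages and contributes essentially no learning signal.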
