Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search
Tianming Liang, Qirui Du, Jian-Fang Hu, Haichao Jiang, Zicheng Lin, Wei-Shi Zheng

TL;DR
Seg-ReSearch introduces a segmentation approach that combines reasoning with external search to handle dynamic, real-world queries beyond the fixed knowledge of large language models, improving performance on challenging benchmarks.
Contribution
The paper proposes a novel segmentation paradigm that interleaves reasoning with external search, overcoming the frozen-knowledge limitations of existing MLLM-based models.
Findings
Significant performance improvements on OK-VOS and other benchmarks.
Effective hierarchical reward design for training reasoning capabilities.
Demonstrated ability to handle open-world, domain-specific, and up-to-date information.
Abstract
Segmentation based on language has been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning capabilities, these efforts remain confined by the frozen internal knowledge of MLLMs, which limits their potential for real-world scenarios that involve up-to-date information or domain-specific concepts. In this work, we propose Seg-ReSearch, a novel segmentation paradigm that overcomes the knowledge bottleneck of existing approaches. By enabling interleaved reasoning and external search, Seg-ReSearch empowers segmentation systems to handle dynamic, open-world queries that extend beyond the frozen knowledge of MLLMs. To effectively train this capability, we introduce a hierarchical reward design that harmonizes initial guidance with progressive incentives, mitigating the dilemma between…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
