Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting
Kai Ye, Xiaotong You, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao

TL;DR
This paper introduces EVOL-SAM3, a zero-shot reasoning segmentation framework that uses evolutionary prompting and iterative refinement to improve pixel-level localization without any training. It is reported to outperform static baselines on the ReasonSeg benchmark and to surpass fully supervised state-of-the-art methods.
Contribution
The paper proposes an inference-time evolutionary search over prompt hypotheses: a Visual Arena provides the fitness evaluation, and semantic mutation refines the hypotheses, with the aim of improving reasoning depth and segmentation accuracy.
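The summary does not specify the actual operators, so the following is only a minimal sketch of a generic evolutionary search over text prompts, not the authors' implementation. All names (`evolve_prompts`, `toy_fitness`, `toy_mutate`, `TARGET`, `VOCAB`) are hypothetical. The word-overlap score is a stand-in for whatever the Visual Arena computes, and the random word edit is a stand-in for semantic mutation.

```python
import random
from typing import Callable, List, Tuple


def evolve_prompts(
    seeds: List[str],
    mutate: Callable[[str, random.Random], str],
    fitness: Callable[[str], float],
    generations: int = 5,
    population_size: int = 4,
    seed: int = 0,
) -> Tuple[str, float]:
    """Generic elitist evolutionary search over text prompts (illustrative)."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        # Mutation: every current hypothesis proposes one variant.
        children = [mutate(p, rng) for p in population]
        # Selection: pool parents and children, drop duplicates,
        # and keep the highest-scoring candidates.
        pool = list(dict.fromkeys(population + children))
        pool.sort(key=fitness, reverse=True)
        population = pool[:population_size]
    best = population[0]
    return best, fitness(best)


# --- Toy stand-ins (assumptions, not part of the paper) ---
TARGET = {"red", "mug", "left", "table"}
VOCAB = ["red", "blue", "mug", "left", "right", "table", "chair"]


def toy_fitness(prompt: str) -> float:
    """Jaccard overlap between the prompt's words and a fixed target set."""
    words = set(prompt.split())
    return len(words & TARGET) / len(words | TARGET)


def toy_mutate(prompt: str, rng: random.Random) -> str:
    """Randomly drop one word or append one word from a small vocabulary."""
    words = prompt.split()
    if words and rng.random() < 0.3:
        words.pop(rng.randrange(len(words)))
    else:
        words.append(rng.choice(VOCAB))
    return " ".join(words)


best, score = evolve_prompts(["mug", "the object"], toy_mutate, toy_fitness)
print(best, score)
```

Because parents stay in the selection pool, the best score never decreases from one generation to the next. In a real system the fitness call would presumably involve running a segmenter and judging the resulting masks, which this toy does not attempt.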
Findings
Outperforms static baselines on the ReasonSeg benchmark
Surpasses fully supervised state-of-the-art methods in zero-shot setting
Demonstrates robustness and improved reasoning through evolutionary prompt refinement
Abstract
Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). However, SFT suffers from catastrophic forgetting and domain dependency, while RL is often hindered by training instability and rigid reliance on predefined reward functions. Although recent training-free methods circumvent these training burdens, they are fundamentally limited by a static inference paradigm. These methods typically rely on a single-pass "generate-then-segment" chain, which suffers from insufficient reasoning depth and lacks the capability to self-correct linguistic hallucinations or spatial misinterpretations. In this paper, we challenge these limitations and propose EVOL-SAM3, a novel zero-shot framework that reformulates reasoning…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
