Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

Kai Ye; Xiaotong You; Jianghang Lin; Jiayi Ji; Pingyang Dai; Liujuan Cao

arXiv:2512.24702·cs.CV·January 1, 2026

Open Access

TL;DR

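This paper introduces EVOL-SAM3, a zero-shot reasoning segmentation framework that uses evolutionary prompting and iterative refinement to improve pixel-level localization without any training. It is reported to outperform static baselines on the ReasonSeg benchmark and to surpass fully supervised state-of-the-art methods while remaining zero-shot.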

Contribution

The paper proposes an inference-time evolutionary search over prompt hypotheses: a Visual Arena evaluates the fitness of each hypothesis, and semantic mutation produces new variants, with the aim of increasing reasoning depth and accuracy.
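
To make the general shape of such a search concrete, here is a minimal Python sketch of a generic evolutionary loop over text prompts. It is not the authors' implementation: `evolve_prompts`, `toy_fitness`, `toy_mutate`, `TARGET`, and `VOCAB` are hypothetical names introduced for illustration. In particular, the word-overlap score is only a stand-in for whatever the Visual Arena computes, and the random word edit is only a stand-in for semantic mutation, neither of which is specified on this page.

```python
import random
from typing import Callable, List, Tuple


def evolve_prompts(
    seeds: List[str],
    fitness: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    generations: int = 5,
    population_size: int = 4,
    rng_seed: int = 0,
) -> Tuple[str, float]:
    """Generic evolutionary search over prompt strings (illustrative only)."""
    rng = random.Random(rng_seed)
    population = list(seeds)
    for _ in range(generations):
        # Mutation: every current hypothesis spawns one variant.
        children = [mutate(p, rng) for p in population]
        # Selection: parents stay eligible (elitism), so the best score never drops.
        # Ties are broken alphabetically to keep the run deterministic.
        candidates = sorted(
            set(population + children), key=lambda p: (-fitness(p), p)
        )
        population = candidates[:population_size]
    best = population[0]
    return best, fitness(best)


# --- Toy stand-ins (assumptions, not the paper's components) ---
TARGET = {"red", "mug", "left", "table"}
VOCAB = ["red", "mug", "left", "table", "blue", "chair"]


def toy_fitness(prompt: str) -> float:
    """Jaccard overlap between the prompt's words and a fixed target set."""
    words = set(prompt.split())
    return len(words & TARGET) / len(words | TARGET)


def toy_mutate(prompt: str, rng: random.Random) -> str:
    """Replace one word or append one word, chosen at random from VOCAB."""
    words = prompt.split()
    if len(words) > 1 and rng.random() < 0.5:
        words[rng.randrange(len(words))] = rng.choice(VOCAB)
    else:
        words.append(rng.choice(VOCAB))
    return " ".join(words)


if __name__ == "__main__":
    best, score = evolve_prompts(["mug", "object"], toy_fitness, toy_mutate)
    print(best, score)
```

In a real system the fitness callable would score a candidate prompt by running a segmentation model and judging the resulting mask, and the mutation callable would ask a language model to rewrite the prompt. The loop itself would look much the same under those assumptions.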

Findings

1. Outperforms static baselines on the ReasonSeg benchmark
2. Surpasses fully supervised state-of-the-art methods in a zero-shot setting
3. Demonstrates robustness and improved reasoning through evolutionary prompt refinement

Abstract

Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). However, SFT suffers from catastrophic forgetting and domain dependency, while RL is often hindered by training instability and rigid reliance on predefined reward functions. Although recent training-free methods circumvent these training burdens, they are fundamentally limited by a static inference paradigm. These methods typically rely on a single-pass "generate-then-segment" chain, which suffers from insufficient reasoning depth and lacks the capability to self-correct linguistic hallucinations or spatial misinterpretations. In this paper, we challenge these limitations and propose EVOL-SAM3, a novel zero-shot framework that reformulates reasoning…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling