Skill-Evolving Grounded Reasoning for Free-Text Promptable 3D Medical Image Segmentation
Tongrui Zhang, Chenhui Wang, Yongming Li, Zhihao Chen, Xufeng Zhan, Hongming Shan

TL;DR
SEER introduces a reasoning-based framework for 3D medical image segmentation that enhances robustness to linguistic variability by aligning clinical language with anatomical evidence and evolving skills through self-refinement.
Contribution
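The paper presents SEER, a reasoning-driven approach, together with a new dataset (SEER-Trace) that pairs raw clinical requests with image-grounded reasoning traces. Together they align free-text prompts with the intended anatomical structures more reliably and make 3D medical image segmentation more robust to how a request is phrased.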
Findings
SEER reduces performance variance by 81.94% under linguistic perturbations.
SEER improves worst-case Dice score by 18.60%.
SEER outperforms state-of-the-art methods in robustness and accuracy.
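The findings above report robustness as variance and worst-case Dice under linguistic perturbations, but this summary does not define the exact protocol. The sketch below assumes one plausible reading: the same target is segmented once per paraphrased prompt, Dice is computed per prompt, and robustness is summarized by the population variance and the minimum (worst-case) Dice across paraphrases. A relative reduction is taken as (baseline - ours) / baseline. Whether the reported 18.60% worst-case gain is relative or in absolute Dice points is not stated here. All function names are hypothetical and masks are flattened binary voxel lists to keep the example dependency-free.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def dice(pred: Sequence[int], gt: Sequence[int], eps: float = 1e-8) -> float:
    """Dice = 2|P & G| / (|P| + |G|) over flattened binary voxel masks.

    eps keeps the score defined (and equal to 1.0) when both masks are empty.
    """
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    size = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    return (2.0 * inter + eps) / (size + eps)


def robustness_summary(dice_per_prompt: Sequence[float]) -> Dict[str, float]:
    """Summarize Dice scores obtained from paraphrases of one clinical request.

    Hypothetical helper: mean accuracy, population variance across
    paraphrases, and the worst-case (minimum) Dice.
    """
    scores = [float(s) for s in dice_per_prompt]
    return {
        "mean": mean(scores),
        "variance": pvariance(scores),
        "worst_case": min(scores),
    }


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage reduction of a quantity (e.g. variance) relative to a baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative masks and scores only; these are not results from the paper.
    gt = [1, 1, 0, 0]
    print(dice([1, 0, 0, 0], gt))
    print(robustness_summary([0.8, 0.6, 0.7]))
    print(relative_reduction(0.01, 0.0025))
```

Using the minimum rather than a percentile makes the worst-case metric sensitive to a single badly handled paraphrase, which matches the intent of measuring robustness to phrasing.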
Abstract
Free-text promptable 3D medical image segmentation offers an intuitive and clinically flexible interaction paradigm. However, current methods are highly sensitive to linguistic variability: minor changes in phrasing can cause substantial performance degradation despite identical clinical intent. Existing approaches attempt to improve robustness through stronger vision-language fusion or larger vocabularies, yet they lack mechanisms to consistently align ambiguous free-form expressions with anatomically grounded representations. We propose Skill-Evolving grounded Reasoning (SEER), a novel framework for free-text promptable 3D medical image segmentation that explicitly bridges linguistic variability and anatomical precision through a reasoning-driven design. First, we curate the SEER-Trace dataset, which pairs raw clinical requests with image-grounded, skill-tagged reasoning traces,…
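The abstract describes SEER-Trace as pairing raw clinical requests with image-grounded, skill-tagged reasoning traces. As a rough illustration of that pairing, the sketch below models one record as a data structure. Every class, field, and skill name here is hypothetical; the released dataset format is not given in this summary, and "image-grounded" is assumed, for illustration only, to mean a step may cite a 3D bounding box in voxel coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema: illustrates "request + skill-tagged, image-grounded
# reasoning trace", not the actual SEER-Trace file format.
Box3D = Tuple[int, int, int, int, int, int]  # (z0, y0, x0, z1, y1, x1)


@dataclass
class ReasoningStep:
    skill: str                          # hypothetical skill tag for this step
    text: str                           # natural-language reasoning
    evidence_box: Optional[Box3D] = None  # image grounding, if any


@dataclass
class TraceRecord:
    volume_id: str                      # identifier of the 3D scan
    raw_request: str                    # free-text clinical request as written
    steps: List[ReasoningStep] = field(default_factory=list)

    def skills_used(self) -> List[str]:
        # Skill tags in order of first appearance, without duplicates
        seen: List[str] = []
        for step in self.steps:
            if step.skill not in seen:
                seen.append(step.skill)
        return seen

    def is_grounded(self) -> bool:
        # Treat the trace as image-grounded if any step cites image evidence
        return any(step.evidence_box is not None for step in self.steps)


# Illustrative record; the request, skill names, and box are made up.
record = TraceRecord(
    volume_id="case_001",
    raw_request="please outline the left kidney",
    steps=[
        ReasoningStep("term_normalization", "Map the request to the left kidney."),
        ReasoningStep("localization", "Left kidney lies in the left retroperitoneum.",
                      evidence_box=(10, 40, 60, 50, 90, 110)),
        ReasoningStep("term_normalization", "No laterality conflict in the request."),
    ],
)
```

Keeping the raw request verbatim next to the structured steps is what lets a model learn to map many phrasings of the same intent onto the same grounded reasoning.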
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Artificial Intelligence in Healthcare and Education
