Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation
Haoyun Chen, Fenghe Tang, Wenxin Ma, Shaohua Kevin Zhou

TL;DR
Concept-to-Pixel (C2P) introduces a prompt-free, universal medical image segmentation framework that disentangles anatomical knowledge into geometric and semantic tokens, enabling robust, cross-modal, and zero-shot segmentation across diverse datasets.
Contribution
C2P is the first prompt-free universal segmentation model that explicitly separates geometric and semantic information, improving robustness and generalization across multiple medical imaging modalities.
Findings
Outperforms existing approaches on eight diverse datasets.
Achieves strong zero-shot and cross-modal transfer performance.
Demonstrates robustness and accuracy in multi-modal medical image segmentation.
Abstract
Universal medical image segmentation seeks to use a single foundational model to handle diverse tasks across multiple imaging modalities. However, existing approaches often rely heavily on manual visual prompts or retrieved reference images, which limits their automation and robustness. In addition, naive joint training across modalities often fails to address large domain shifts. To address these limitations, we propose Concept-to-Pixel (C2P), a novel prompt-free universal segmentation framework. C2P explicitly separates anatomical knowledge into two components: Geometric and Semantic representations. It leverages Multimodal Large Language Models (MLLMs) to distill abstract, high-level medical concepts into learnable Semantic Tokens and introduces explicitly supervised Geometric Tokens to enforce universal physical and structural constraints. These disentangled tokens interact deeply…
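The abstract is cut off before it says how the disentangled tokens interact with image features, so the following is only a minimal sketch of one common pattern, not the authors' confirmed design. It assumes learnable token vectors act as queries that cross-attend to pixel features, and that per-pixel mask logits come from a dot product between pixel features and the updated semantic tokens. The function name `tokens_to_masks`, all shapes, and the single attention step are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tokens_to_masks(pixel_feats, semantic_tokens, geometric_tokens):
    """Hypothetical token-to-pixel interaction (an assumption, not the paper's method).

    pixel_feats:      (H, W, D) image feature map
    semantic_tokens:  (Ks, D) one token per semantic concept
    geometric_tokens: (Kg, D) tokens meant to carry structural cues
    """
    h, w, d = pixel_feats.shape
    flat = pixel_feats.reshape(h * w, d)

    # Both token groups act as queries over the same pixel features.
    tokens = np.concatenate([semantic_tokens, geometric_tokens], axis=0)

    # One cross-attention step with a residual update of the tokens.
    attn = softmax(tokens @ flat.T / np.sqrt(d), axis=-1)  # (Ks + Kg, H*W)
    updated = tokens + attn @ flat

    n_sem = semantic_tokens.shape[0]
    sem_updated = updated[:n_sem]
    geo_updated = updated[n_sem:]

    # Per-pixel logits: similarity of each pixel to each semantic token.
    mask_logits = (flat @ sem_updated.T).reshape(h, w, n_sem)
    return mask_logits, attn, geo_updated


# Toy usage with random features; every size here is arbitrary.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
sem = rng.normal(size=(3, 16))
geo = rng.normal(size=(2, 16))
mask_logits, attn, geo_updated = tokens_to_masks(feats, sem, geo)
masks = mask_logits.argmax(axis=-1)  # (8, 8) map of semantic token indices
```

In this sketch the updated geometric tokens are returned separately so that a supervised auxiliary head could consume them; how C2P actually supervises its Geometric Tokens is not stated in the visible text.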
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
