PPBoost: Progressive Prompt Boosting for Text-Driven Medical Image Segmentation
Xuchen Li, Hengrui Gu, Mohan Zhang, Qin Liu, Zhen Tan, Xinyuan Zhu, Huixue Zhou, Tianlong Chen, Kaixiong Zhou

TL;DR
PPBoost is a novel framework that transforms weak text prompts into strong, spatially precise visual prompts for medical image segmentation, improving accuracy without requiring labeled data.
Contribution
It introduces a zero-shot method that converts text signals into high-quality visual prompts, enhancing segmentation performance across diverse medical imaging modalities.
Findings
Consistently improves Dice and NSD metrics over baselines
Outperforms few-shot segmentation models without labeled data
Generalizes to multiple segmentation model backbones
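For reference, the Dice similarity coefficient (DSC) behind these findings measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). The sketch below is a generic NumPy version, not PPBoost's evaluation code. The function name and the empty-mask convention (returning 1.0 when both masks are empty) are assumptions. The mDSC figures quoted in the reviews are presumably this score averaged over test cases.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks (generic sketch).

    DSC = 2 * |P & G| / (|P| + |G|)
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)
```

For example, two 2×2 squares in a 4×4 grid that share one row of two pixels give 2·2 / (4 + 4) = 0.5.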
Abstract
Text-prompted foundation models for medical image segmentation offer an intuitive way to delineate anatomical structures from natural language queries, but their predictions often lack spatial precision and degrade under domain shift. In contrast, visual-prompted models achieve strong segmentation performance across diverse modalities by leveraging spatial cues from precise bounding-box (bbox) prompts to guide the segmentation of target lesions. However, obtaining precise visual prompts in clinical practice is costly and challenging. We propose PPBoost (Progressive Prompt-Boosting), a framework that bridges these limitations by transforming weak text-derived signals into strong, spatially grounded visual prompts, operating under a strict zero-shot regime with no image- or pixel-level segmentation labels. PPBoost first uses a vision-language model to produce initial pseudo-bboxes…
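The abstract is cut off before it explains how the initial pseudo-bboxes are formed. As a hypothetical illustration only, one simple way to turn a text-derived confidence map into a box is to threshold it and take the tightest box around the surviving pixels. The function name, the 0.5 threshold, and the "single box around all confident pixels" rule are assumptions, not the paper's procedure.

```python
from typing import Optional, Tuple

import numpy as np

# (x_min, y_min, x_max, y_max), inclusive pixel indices (assumed format)
Box = Tuple[int, int, int, int]


def confidence_map_to_bbox(conf: np.ndarray, threshold: float = 0.5) -> Optional[Box]:
    """Hypothetical: tight box around every pixel whose confidence >= threshold."""
    ys, xs = np.nonzero(conf >= threshold)
    if ys.size == 0:
        # No pixel is confident enough, so no pseudo-box is proposed.
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A single tight box is the simplest choice. It cannot separate multiple lesions in one image, which a real pipeline would likely need to handle.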
Peer Reviews
Decision · Submitted to ICLR 2026
The method is validated on three diverse datasets (brain tumors, liver tumors, kidney lesions), showing significant performance gains over both recent text-prompted and visual-prompted baselines. The paper is mostly well-written and organized. The technical details (e.g., filtering threshold, network architectures, training schedule) are provided, and an anonymous code link is included for reproducibility.
1) The proposed pipeline is quite complex, consisting of multiple stages (VLM-based proposal, pseudo-label filtering, semi-supervised detector training, and segmentation with a foundation model). It relies on several pre-trained models and heavy training procedures, which could make it resource-intensive and tricky to reproduce or deploy in practice. This complexity may also make overall performance heavily dependent on proper tuning of each stage. 2) While the integration of components is novel for this prob
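The pseudo-label filtering stage listed in this review can be illustrated with a minimal sketch, assuming uncertainty is measured as the mean per-pixel binary entropy of the confidence map inside each pseudo-box. This summary does not state PPBoost's actual uncertainty criterion. The entropy measure, the max_entropy=0.3 cutoff, and all names here are hypothetical.

```python
from typing import List, Tuple

import numpy as np

# (x_min, y_min, x_max, y_max), inclusive pixel indices (assumed format)
Box = Tuple[int, int, int, int]


def mean_binary_entropy(conf: np.ndarray, box: Box) -> float:
    """Average per-pixel binary entropy (in nats) of the confidences inside box."""
    x0, y0, x1, y1 = box
    # Clip to avoid log(0) for fully confident pixels.
    p = np.clip(conf[y0:y1 + 1, x0:x1 + 1], 1e-6, 1 - 1e-6)
    h = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return float(h.mean())


def filter_pseudo_boxes(
    samples: List[Tuple[np.ndarray, Box]], max_entropy: float = 0.3
) -> List[Tuple[np.ndarray, Box]]:
    """Hypothetical filter: keep (confidence map, box) pairs with low in-box uncertainty."""
    return [(conf, box) for conf, box in samples if mean_binary_entropy(conf, box) < max_entropy]
```

A box over confidences near 0.99 has a mean entropy of roughly 0.06 nats and is kept, while one over confidences of 0.5 scores ln 2 ≈ 0.69 and is discarded. Another review notes that the paper reports its filtering threshold, so that value would replace the placeholder cutoff.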
1. The paper includes extensive ablation experiments that validate the effectiveness of the proposed PPBoost framework. 2. Beyond bypassing the need for costly manual prompts, the method offers a substantial boost in inference speed by replacing the computationally heavy VLM with a highly efficient trained detector.
1. Clarity on Baseline Comparisons: The exact configuration of the "Direct" baseline—specifically, whether it also incorporates the proposed uncertainty-aware filtering and bounding-box expansion—should be stated more explicitly to ensure a fair comparison. This clarification is critical, as it determines how much of the performance gain is attributable to the core contribution of the detector training stage. A more informative ablation would be to compare against a baseline that uses Filtering
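The bounding-box expansion this reviewer asks about can be sketched as growing each side of a box by a fixed fraction of its size and clipping the result to the image. This is a hedged illustration: the 0.1 ratio, the inclusive (x_min, y_min, x_max, y_max) box format, and the function name are assumptions. The rule PPBoost uses to decide when to expand (another review calls it "selective expansion at inference") is not described here.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel indices (assumed format)
Box = Tuple[int, int, int, int]


def expand_box(box: Box, image_hw: Tuple[int, int], ratio: float = 0.1) -> Box:
    """Hypothetical: grow each side by `ratio` of the box size, clipped to the image."""
    x0, y0, x1, y1 = box
    h, w = image_hw
    dx = int(round((x1 - x0 + 1) * ratio))
    dy = int(round((y1 - y0 + 1) * ratio))
    return (
        max(0, x0 - dx),
        max(0, y0 - dy),
        min(w - 1, x1 + dx),
        min(h - 1, y1 + dy),
    )
```

Scaling the margin by box size keeps the relative slack constant across small and large lesions. A fixed pixel margin would instead over-expand small boxes and barely change large ones.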
1. From text-based confidence maps to uncertainty filtering and semi-supervised detection, then to selective expansion at inference, PPBoost establishes a reproducible “weak → strong” prompt conversion pipeline that reduces reliance on manual bounding boxes or points. 2. On three heterogeneous datasets, PPBoost achieves mean **mDSC +6.69%** and **mNSD +7.32%** improvements over text/visual-prompted baselines. 3. The authors have made the implementation code publicly available.
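The mNSD numbers refer to NSD (normalized surface distance), which is commonly computed as a surface Dice at a tolerance τ: the fraction of boundary points on each mask lying within τ of the other mask's boundary, i.e. (|{p ∈ ∂P : d(p, ∂G) ≤ τ}| + |{g ∈ ∂G : d(g, ∂P) ≤ τ}|) / (|∂P| + |∂G|). Below is a small brute-force 2D sketch in NumPy, assuming unit pixel spacing and 4-connected boundaries. The tolerance used in the paper is not stated in this summary, and standard implementations also account for voxel spacing and 3D surfaces.

```python
import numpy as np


def boundary_points(mask: np.ndarray) -> np.ndarray:
    """(N, 2) array of foreground pixels with at least one 4-neighbour in the background."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    core = m[1:-1, 1:-1]
    # A pixel is interior only if it and all four neighbours are foreground.
    interior = core & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return np.argwhere(core & ~interior)


def normalized_surface_dice(pred: np.ndarray, target: np.ndarray, tolerance: float = 1.0) -> float:
    """Surface Dice at `tolerance` pixels (brute-force sketch for small 2D masks)."""
    bp, bg = boundary_points(pred), boundary_points(target)
    if len(bp) == 0 and len(bg) == 0:
        return 1.0
    if len(bp) == 0 or len(bg) == 0:
        return 0.0
    # Pairwise Euclidean distances between the two boundaries.
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    pred_ok = (d.min(axis=1) <= tolerance).sum()
    gt_ok = (d.min(axis=0) <= tolerance).sum()
    return float((pred_ok + gt_ok) / (len(bp) + len(bg)))
```

Two 4×4 squares offset by one column score 1.0 at τ = 1 but 0.5 at τ = 0, which shows how strongly the tolerance choice affects the metric.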
1. **Limited dataset coverage.** The main experiments are conducted only on three datasets (brain tumor MRI, liver CT, and synthetic kidney ultrasound). This limits robustness across anatomy types, institutions, and imaging protocols. The KidneySeg dataset is synthetic, so real-world generalization remains to be validated. 2. **Lack of comparison with related recent works.** 3. **Missing systematic robustness analysis.** Although the authors claim stability across modalities and anatomies, t
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Advanced Neural Network Applications
