TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal

TL;DR
This paper introduces TuneVLSeg, a benchmarking framework for prompt tuning in vision-language segmentation models (VLSMs). It evaluates various prompt tuning strategies across diverse datasets and highlights that visual prompts are often more effective than textual ones under domain shifts.
Contribution
It presents a comprehensive benchmark for prompt tuning in VLSMs, including multiple strategies and datasets, and provides insights into their performance under domain shifts.
Findings
Visual prompt tuning often outperforms textual prompt tuning.
Multimodal prompt tuning has comparable performance with fewer hyperparameters.
Prompt tuning effectiveness varies significantly across domain shifts.
Abstract
Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and their evaluation under significant domain shifts remain unexplored. This work presents an open-source benchmarking framework, TuneVLSeg, to integrate various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes. TuneVLSeg includes prompt tuning strategies at various prompt depths used in VLSMs, covering a number of different combinations. We test various prompt tuning techniques on diverse medical datasets, including radiology…
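To make the idea of "learnable prompts" and "prompt depth" from the abstract concrete, here is a minimal sketch in plain Python. It is not TuneVLSeg's implementation: the function names `prepend_prompts` and `prompt_param_count`, the toy sizes, and the assumption that deep prompting inserts an independent set of prompt vectors at each of the first `depth` encoder layers are all illustrative choices, not details taken from the paper.

```python
from typing import List

Vector = List[float]


def prepend_prompts(prompts: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Build an encoder input: learnable prompt vectors, then frozen token embeddings."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    dim = len(tokens[0])
    if any(len(v) != dim for v in prompts + tokens):
        raise ValueError("all prompt and token embeddings must share one dimension")
    # Copy the vectors so the caller's lists are not aliased.
    return [list(p) for p in prompts] + [list(t) for t in tokens]


def prompt_param_count(num_prompts: int, dim: int, depth: int = 1) -> int:
    """Trainable parameters, ASSUMING independent prompts at each of the first `depth` layers."""
    if min(num_prompts, dim, depth) < 1:
        raise ValueError("num_prompts, dim, and depth must all be positive")
    return num_prompts * dim * depth


# Hypothetical toy sizes, chosen only for illustration.
tokens = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]  # 5 frozen token embeddings, dim 4
prompts = [[0.0] * 4 for _ in range(2)]            # 2 learnable prompt vectors

sequence = prepend_prompts(prompts, tokens)
print(len(sequence))                      # 7 positions: 2 prompts + 5 tokens
print(prompt_param_count(2, 4, depth=1))  # 8 trainable values (shallow prompting)
print(prompt_param_count(2, 4, depth=3))  # 24 trainable values (deeper prompting)
```

In this sketch only the prompt vectors would be updated during training while the pretrained encoder weights stay frozen, which is why the trainable parameter count depends on the number of prompts, the embedding dimension, and the prompt depth rather than on the size of the model.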
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
