TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models

Rabin Adhikari; Safal Thapaliya; Manish Dhakal; Bishesh Khanal

arXiv:2410.05239 · cs.CV · October 10, 2024

TL;DR

This paper introduces TuneVLSeg, a benchmarking framework for prompt tuning in vision-language segmentation models, evaluating various prompt strategies across diverse datasets, and highlighting the effectiveness of visual prompts over textual ones under domain shifts.

Contribution

It presents a comprehensive benchmark for prompt tuning in VLSMs, including multiple strategies and datasets, and provides insights into their performance under domain shifts.

Findings

1. Visual prompt tuning often outperforms textual prompt tuning.

2. Multimodal prompt tuning has comparable performance with fewer hyperparameters.

3. Prompt tuning effectiveness varies significantly across domain shifts.

Abstract

Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and their evaluation under significant domain shifts remain unexplored. This work presents an open-source benchmarking framework, TuneVLSeg, that integrates various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes. TuneVLSeg includes $6$ prompt tuning strategies at various prompt depths, used in $2$ VLSMs, for a total of $8$ different combinations. We test various prompt tuning strategies on $8$ diverse medical datasets, including $3$ radiology…
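
As a rough illustration of what "learnable prompts" means in the abstract, the sketch below shows the general idea behind textual prompt tuning: a small set of trainable context vectors is prepended to frozen token embeddings, so only those vectors would be updated during adaptation. This is a toy, dependency-free example and not TuneVLSeg's implementation; the class name `PromptedTextInput`, the sizes `EMBED_DIM` and `NUM_PROMPT_TOKENS`, and the vocabulary size are all hypothetical.

```python
import random

EMBED_DIM = 8          # hypothetical embedding width
NUM_PROMPT_TOKENS = 4  # hypothetical number of learnable context vectors


def make_vectors(count, dim, rng):
    """Return `count` small random vectors of length `dim`."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(count)]


class PromptedTextInput:
    """Toy stand-in for a text branch with learnable context vectors."""

    def __init__(self, vocab_size, rng):
        # Frozen: stands in for pretrained token embeddings, never updated.
        self.token_embeddings = make_vectors(vocab_size, EMBED_DIM, rng)
        # Learnable: the only parameters a prompt-tuning optimizer would touch.
        self.prompt_vectors = make_vectors(NUM_PROMPT_TOKENS, EMBED_DIM, rng)

    def build_sequence(self, token_ids):
        """Prepend the learnable prompt vectors to the frozen token embeddings."""
        tokens = [self.token_embeddings[i] for i in token_ids]
        return self.prompt_vectors + tokens

    def trainable_parameter_count(self):
        return len(self.prompt_vectors) * EMBED_DIM

    def frozen_parameter_count(self):
        return len(self.token_embeddings) * EMBED_DIM


rng = random.Random(0)
text_input = PromptedTextInput(vocab_size=100, rng=rng)
sequence = text_input.build_sequence([5, 17, 42])
```

In this toy setup the trainable prompt holds far fewer parameters than the frozen embedding table, which is the efficiency argument the abstract makes for prompt tuning over full fine-tuning. Visual and multimodal variants apply the same idea to the image branch or to both branches.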

Code & Models

Repositories

naamiinepal/tunevlseg
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling