Understanding Task Transfer in Vision-Language Models
Bhuvan Sachdeva, Karan Uppal, Abhinav Java, Vineeth N. Balasubramanian

TL;DR
This paper systematically studies how finetuning vision-language models on one perception task influences their performance on others, introducing metrics and analysis to understand transfer effects.
Contribution
It introduces the Perfection Gap Factor (PGF) and constructs a task transfer graph revealing relationships among perception tasks in VLMs.
Findings
Identifies patterns of positive and negative transfer among perception tasks.
Organizes tasks into personas based on transfer behavior.
Demonstrates PGF's utility in guiding data selection for training.
Abstract
Vision-Language Models (VLMs) perform well on multimodal benchmarks but lag behind humans and specialized models on visual perception tasks like depth estimation or object counting. Finetuning on one task can unpredictably affect performance on others, making task-specific finetuning challenging. In this paper, we address this challenge through a systematic study of task transferability. We examine how finetuning a VLM on one perception task affects its zero-shot performance on others. We introduce Perfection Gap Factor (PGF), a normalized metric that measures change in performance as a result of task transfer. We utilize PGF to compute Task Transferability, which captures both the breadth and the magnitude of transfer induced by a source task. Using three open-weight VLMs evaluated across 13 perception tasks, we construct a task transfer graph that reveals previously unobserved…
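The pipeline described in the abstract (finetune on a source task, measure the normalized change on every other task, then collect the changes into a graph) can be sketched roughly as follows. This summary does not give the PGF formula, so `gap_closed_fraction` below is only one plausible reading of a "perfection gap" normalization and should not be taken as the paper's definition. The function names, the `threshold` value, the breadth and magnitude summaries, and all scores are hypothetical, chosen purely for illustration.

```python
def gap_closed_fraction(before, after, ceiling=100.0):
    """Assumed normalization (not the paper's stated formula): the change in
    score divided by the headroom that remained between `before` and `ceiling`."""
    headroom = ceiling - before
    if headroom <= 0:
        return 0.0  # no headroom left, so no gap to close
    return (after - before) / headroom


def transfer_edges(base, finetuned, threshold=0.05):
    """Build directed (source, target, score) edges for a transfer graph.

    base:      {task: zero-shot score of the base model}
    finetuned: {source_task: {task: score after finetuning on source_task}}
    Edges whose absolute score is below the (hypothetical) threshold are dropped.
    """
    edges = []
    for source, scores in finetuned.items():
        for target, after in scores.items():
            if target == source:
                continue  # only cross-task transfer is of interest
            score = gap_closed_fraction(base[target], after)
            if abs(score) >= threshold:
                edges.append((source, target, score))
    return edges


def source_summary(edges, source):
    """Illustrative breadth (number of affected targets) and magnitude
    (mean absolute score) for one source task."""
    scores = [s for src, _, s in edges if src == source]
    breadth = len(scores)
    magnitude = sum(abs(s) for s in scores) / breadth if breadth else 0.0
    return breadth, magnitude


# Made-up scores for illustration only; these are not results from the paper.
base = {"depth": 60.0, "counting": 50.0, "localization": 80.0}
finetuned = {"depth": {"depth": 90.0, "counting": 60.0, "localization": 78.0}}

edges = transfer_edges(base, finetuned)
breadth, magnitude = source_summary(edges, "depth")
```

With these made-up numbers, finetuning on depth yields a positive edge to counting and a negative edge to localization, which is the kind of mixed transfer the study sets out to map.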
