Understanding Task Transfer in Vision-Language Models

Bhuvan Sachdeva; Karan Uppal; Abhinav Java; Vineeth N. Balasubramanian

arXiv:2511.18787·cs.CV·April 10, 2026

TL;DR

This paper systematically studies how finetuning a vision-language model on one perception task changes its performance on other perception tasks, and introduces a normalized metric and a task transfer graph to characterize those transfer effects.

Contribution

The paper introduces the Perfection Gap Factor (PGF), a normalized metric of performance change under task transfer, and constructs a task transfer graph that reveals relationships among perception tasks in VLMs.

Findings

01. Identifies patterns of positive and negative transfer among perception tasks.

02. Organizes tasks into personas based on transfer behavior.

03. Demonstrates PGF's utility in guiding data selection for training.

Abstract

Vision-Language Models (VLMs) perform well on multimodal benchmarks but lag behind humans and specialized models on visual perception tasks like depth estimation or object counting. Finetuning on one task can unpredictably affect performance on others, making task-specific finetuning challenging. In this paper, we address this challenge through a systematic study of task transferability. We examine how finetuning a VLM on one perception task affects its zero-shot performance on others. We introduce Perfection Gap Factor (PGF), a normalized metric that measures change in performance as a result of task transfer. We utilize PGF to compute Task Transferability, which captures both the breadth and the magnitude of transfer induced by a source task. Using three open-weight VLMs evaluated across 13 perception tasks, we construct a task transfer graph that reveals previously unobserved…
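
The abstract describes PGF only as a normalized measure of performance change, and does not give its formula. The sketch below shows one plausible reading of the name: the fraction of the remaining gap to a perfect score that finetuning closes (positive) or opens (negative). The function names, the `perfect_score` default, the threshold, and all scores are assumptions made up for illustration, not the paper's definitions or results.

```python
def perfection_gap_factor(base_score, finetuned_score, perfect_score=100.0):
    """Hypothetical PGF: change in score divided by the base model's gap to perfect.

    Assumed form, not taken from the paper:
        (finetuned - base) / (perfect - base)
    """
    gap = perfect_score - base_score
    if gap <= 0:
        raise ValueError("base_score must be below perfect_score")
    return (finetuned_score - base_score) / gap


def summarize_transfer(pgf_by_target, threshold=0.05):
    """Illustrative summary of one source task's effect on its target tasks.

    breadth:   share of target tasks whose |PGF| exceeds the threshold
    magnitude: mean PGF across all target tasks
    Both are stand-ins for the paper's Task Transferability, whose exact
    definition is not given in the abstract.
    """
    values = list(pgf_by_target.values())
    breadth = sum(1 for v in values if abs(v) > threshold) / len(values)
    magnitude = sum(values) / len(values)
    return breadth, magnitude


# Made-up scores (percent accuracy) on three target tasks, before and after
# finetuning on a single hypothetical source task.
base = {"counting": 60.0, "depth": 60.0, "relative_position": 80.0}
after = {"counting": 80.0, "depth": 50.0, "relative_position": 80.0}

pgf = {task: perfection_gap_factor(base[task], after[task]) for task in base}
breadth, magnitude = summarize_transfer(pgf)
print(pgf)
print(breadth, magnitude)
```

Normalizing by the remaining gap makes a 20-point gain from a 60-point baseline count for more than the same gain would from a lower baseline, which is one reason a gap-based normalization is attractive when base scores differ widely across tasks.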
