When Better Teachers Don't Make Better Students: Revisiting Knowledge Distillation for CLIP Models in VQA
Pume Tuchinda, Parinthapat Pengpun, Romrawin Chumpu, Sarana Nutanong, Peerat Limkonchotiwat

TL;DR
This paper investigates the effectiveness of knowledge distillation for CLIP-style vision-language models in VQA, revealing that stronger teachers do not always produce better students and highlighting challenges in scaling existing frameworks.
Contribution
It provides the first systematic analysis of KD across various CLIP models for VQA, challenging assumptions about teacher strength and model scaling.
Findings
Stronger teachers do not always improve student performance.
Existing distillation methods often degrade performance when scaled.
Challenges in applying KD to large-scale multimodal models are identified.
Abstract
Vision-language models (VLMs) have achieved remarkable success across multimodal tasks, yet their substantial computational demands hinder efficient deployment. Knowledge distillation (KD) has emerged as a powerful approach for building lightweight but competitive models, with strong evidence from both language and vision domains. However, its application to VLMs, particularly CLIP-style models, remains limited, often constrained to small-scale teachers and narrow evaluation tasks such as classification or retrieval. In this work, we present the first systematic study of distillation across CLIP-style teacher models, ranging from standard baselines to large-scale state-of-the-art models. Contrary to trends observed in NLP and vision, we find that stronger teachers do not consistently yield better students; in fact, existing distillation frameworks often fail to scale, leading…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Neural Network Applications
