Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

Haoyi Sun; Xiaoxiao Wang; Ning Mao; Qian Wang; Lifu Mu; Wen Zheng; Tao Wei; Wei Chen

arXiv:2604.14629·cs.CV·April 17, 2026

TL;DR

Switch-KD is a visual-switch distillation framework that unifies vision-language knowledge transfer within a shared text-probability space, improving a small vision-language model without changing its architecture.

Contribution

The paper proposes Switch-KD, a multimodal knowledge distillation method that explicitly aligns the visual and language modalities in a shared text-probability space.

Findings

01. The distilled TinyLLaVA gains 3.6 points on average across 10 benchmarks.

02. Switch-KD transfers multimodal knowledge from a 3B teacher to a 0.5B student.

03. The method improves model performance without architectural modifications.

Abstract

Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or data requirements, making deployment more efficient. However, applying KD to VLMs is challenged by modality-specific supervision: although multimodal knowledge in VLMs is fused within the language space, current methods supervise each modality separately without explicitly addressing multimodal alignment, leading to inconsistent multimodal knowledge transfer. To address this, we propose Switch-KD, a visual-switch distillation framework that unifies vision-language knowledge transfer within a shared text-probability space. Switch-KD comprises two key components: (1)…
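
As a rough illustration of what distilling in a text-probability space can mean, the sketch below computes a standard temperature-scaled KL divergence between teacher and student next-token distributions. This is generic logit distillation, not the Switch-KD objective: the abstract excerpt above is truncated before its two components are described, so nothing here should be read as the authors' method. The function names, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert one token position's logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean per-token KL(teacher || student), scaled by temperature squared.

    Both arguments are lists of per-token logit vectors (hypothetical toy
    inputs here, standing in for the text-token outputs of two models).
    """
    per_token = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # teacher distribution (target)
        q = softmax(s_row, temperature)  # student distribution
        per_token.append(kl_divergence(p, q))
    return (temperature ** 2) * sum(per_token) / len(per_token)

# Toy example: 2 token positions, vocabulary of 3 tokens (made-up numbers).
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 0.8, -0.5], [0.0, 0.9, 0.6]]

loss = distillation_loss(teacher, student)
same = distillation_loss(teacher, teacher)  # identical outputs give zero loss
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the property any probability-space distillation objective of this kind relies on.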

Code & Models

Repositories

haoyi199815/Switch-KD (GitHub)

Models

HaoyiSun/Switch-KD-Qwen2.5-CLIP-1.8B (Hugging Face model, 85 downloads, 1 like)