MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Jiajun Cao, Yuan Zhang, Tao Huang, Ming Lu, Qizhe Zhang, Ruichuan An, Ningning Ma, Shanghang Zhang

TL;DR
MoVE-KD introduces a novel knowledge distillation framework that efficiently combines multiple visual encoders into a single model, leveraging input-dependent specialization and attention mechanisms to improve vision-language model performance.
Contribution
The paper proposes MoVE-KD, a new method that distills multiple visual encoders into one using low-rank adaptation, mixture-of-experts, and attention-based distillation strategies.
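As an illustration of what an attention-based distillation objective can look like, the sketch below weights each visual token's feature-matching error by a softmax over per-token attention scores, then mixes the losses from several teachers. This is one plausible reading of the phrase, not the paper's implementation: the function names, the mean-squared-error criterion, the per-teacher weights, and the assumption that student and teacher features already share a dimension are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_kd_loss(student_tokens, teacher_tokens, attn_scores):
    """Per-token MSE between student and teacher features,
    averaged with weights softmax(attn_scores).

    Assumption: both token lists have the same length and feature size.
    """
    weights = softmax(attn_scores)
    loss = 0.0
    for w, s, t in zip(weights, student_tokens, teacher_tokens):
        mse = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        loss += w * mse
    return loss


def multi_teacher_kd_loss(student_tokens, teachers):
    """Combine losses from several teachers.

    `teachers` is a list of (teacher_tokens, attn_scores, teacher_weight)
    tuples; teacher weights are normalized to sum to 1 (hypothetical scheme).
    """
    total = sum(weight for _, _, weight in teachers)
    return sum(
        (weight / total) * attention_weighted_kd_loss(student_tokens, tokens, attn)
        for tokens, attn, weight in teachers
    )


# Two tokens with 2-d features; only the second token is mismatched.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[1.0, 0.0], [0.0, 3.0]]
uniform = attention_weighted_kd_loss(student, teacher, [0.0, 0.0])
focused = attention_weighted_kd_loss(student, teacher, [0.0, 2.0])
```

With uniform scores every token counts equally; raising the score of the mismatched token raises the loss, which is the intended effect of emphasizing tokens the teacher attends to.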
Findings
Effective in consolidating multiple encoders into a single model.
Improves the performance of VLMs such as LLaVA and LLaVA-NeXT.
Avoids the computational cost of running multiple visual encoders inside one VLM while maintaining high accuracy.
Abstract
Visual encoders are fundamental components in vision-language models (VLMs), each showcasing unique strengths derived from various pre-trained visual foundation models. To leverage the various capabilities of these encoders, recent studies incorporate multiple encoders within a single VLM, leading to a considerable increase in computational cost. In this paper, we present Mixture-of-Visual-Encoder Knowledge Distillation (MoVE-KD), a novel framework that distills the unique proficiencies of multiple vision encoders into a single, efficient encoder model. Specifically, to mitigate conflicts and retain the unique characteristics of each teacher encoder, we employ low-rank adaptation (LoRA) and mixture-of-experts (MoEs) to selectively activate specialized knowledge based on input features, enhancing both adaptability and efficiency. To regularize the KD process and enhance performance, we…
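The combination of LoRA and mixture-of-experts described in the abstract can be sketched as a frozen linear layer with several low-rank adapters and a router that picks adapters per input. This is a minimal sketch under stated assumptions, not the authors' code: the class name `MoLoRALinear`, top-k softmax gating, the initialization scales, and the pure-Python matrix helpers are all hypothetical. Zero-initializing the up-projection follows common LoRA practice, so the layer starts out identical to the frozen base.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoLoRALinear:
    """Frozen linear layer plus a routed mixture of LoRA experts (hypothetical)."""

    def __init__(self, d_in, d_out, n_experts=3, rank=2, top_k=1, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.weight = rand_matrix(d_out, d_in, rng)      # frozen base weight W
        self.router = rand_matrix(n_experts, d_in, rng)  # one logit per expert
        # Expert i adds up[i] @ down[i] @ x, a rank-`rank` update.
        self.down = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        self.up = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]

    def gates(self, x):
        """Return {expert_index: gate} for the top-k experts; gates sum to 1."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        return dict(zip(chosen, softmax([logits[i] for i in chosen])))

    def forward(self, x):
        out = matvec(self.weight, x)
        for i, gate in self.gates(x).items():
            delta = matvec(self.up[i], matvec(self.down[i], x))
            out = [o + gate * d for o, d in zip(out, delta)]
        return out


layer = MoLoRALinear(d_in=4, d_out=3, n_experts=3, rank=2, top_k=2)
x = [1.0, -0.5, 0.25, 2.0]
y = layer.forward(x)  # equals the frozen base output until the experts are trained
```

Because the router depends on `x`, different inputs can activate different adapters, which is how a single student encoder could keep teacher-specific behavior in separate low-rank updates instead of forcing all teachers into one shared set of weights.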
Taxonomy
Topics: Neural Networks and Reservoir Computing
Methods: Knowledge Distillation
