HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki

TL;DR
HAWAII introduces a hierarchical knowledge transfer framework that distills multiple visual experts into a single efficient vision encoder, improving vision-language task performance at minimal computational cost.
Contribution
The paper presents a novel hierarchical distillation method using teacher-specific LoRA adapters and a router to combine multiple visual experts into one efficient model.
Findings
Outperforms popular open-source VLMs on various tasks.
Reduces computational costs during training and inference compared with using multiple visual experts directly.
Effectively integrates diverse visual knowledge into a single model.
Abstract
Improving the visual understanding ability of vision-language models (VLMs) is crucial for enhancing their performance across various tasks. While using multiple pretrained visual experts has shown great promise, it often incurs significant computational costs during training and inference. To address this challenge, we propose HAWAII, a novel framework that distills knowledge from multiple visual experts into a single vision encoder, enabling it to inherit the complementary strengths of several experts with minimal computational overhead. To mitigate conflicts among different teachers and switch between different teacher-specific knowledge, instead of using a fixed set of adapters for multiple teachers, we propose to use teacher-specific Low-Rank Adaptation (LoRA) adapters with a corresponding router. Each adapter is aligned with a specific teacher, avoiding noisy guidance during…
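The abstract's idea of teacher-specific LoRA adapters combined by a router can be illustrated with a small sketch. This is not the authors' implementation: the class name RoutedLoRALinear, the softmax gate, the dimensions, and the zero initialization of the up-projection (borrowed from standard LoRA) are all assumptions made for illustration. It shows one frozen linear layer with one low-rank adapter per teacher. Passing teacher=k uses only that teacher's adapter, as one might when aligning with a single teacher, while omitting it lets the router mix all adapters.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialized matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RoutedLoRALinear:
    """Hypothetical layer: frozen base weight plus one LoRA pair per teacher."""

    def __init__(self, d_in, d_out, rank, num_teachers, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(d_out, d_in, rng)  # stands in for frozen pretrained weights
        # Per-teacher down-projection (rank x d_in), randomly initialized.
        self.down = [rand_matrix(rank, d_in, rng) for _ in range(num_teachers)]
        # Per-teacher up-projection (d_out x rank), zero-initialized as in standard LoRA.
        self.up = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_teachers)]
        # Assumed router: a linear gate producing one logit per teacher.
        self.gate = rand_matrix(num_teachers, d_in, rng)

    def route(self, x):
        """Return one mixing weight per teacher adapter (weights sum to 1)."""
        return softmax(matvec(self.gate, x))

    def forward(self, x, teacher=None):
        """Base output plus adapter updates.

        teacher=None: the router mixes all adapters.
        teacher=k:    only adapter k is active.
        """
        out = matvec(self.base, x)
        if teacher is None:
            weights = self.route(x)
        else:
            weights = [1.0 if k == teacher else 0.0 for k in range(len(self.down))]
        for w, down, up in zip(weights, self.down, self.up):
            delta = matvec(up, matvec(down, x))  # low-rank update for this teacher
            out = [o + w * d for o, d in zip(out, delta)]
        return out


layer = RoutedLoRALinear(d_in=4, d_out=3, rank=2, num_teachers=3)
token = [1.0, -0.5, 0.25, 2.0]
mixed = layer.forward(token)               # router-weighted mix of all adapters
single = layer.forward(token, teacher=1)   # only the adapter for teacher 1
```

Keeping a separate low-rank pair per teacher means each teacher's signal updates only its own parameters, which is one plausible reading of how the abstract's design avoids conflicting guidance. The router then only has to learn how to weight the adapters.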
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
