LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang

TL;DR
This paper investigates effective knowledge distillation strategies for training small-scale Multimodal Large Language Models (MLLMs), demonstrating that proper alignment techniques enable smaller models to match larger ones' performance.
Contribution
It provides the first comprehensive study on multimodal distillation, highlighting key training strategies and alignment methods that improve small MLLMs' performance.
Findings
Joint alignment of tokens and logits is crucial for effective distillation.
A 2.7B model can match larger models' performance with proper strategies.
The study offers practical guidelines for training small-scale MLLMs.
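To make "joint token and logit alignment" concrete, here is a minimal sketch, not the paper's actual objective. It assumes the logit term is a temperature-scaled KL divergence from the teacher's to the student's next-token distribution, that the token term is a mean-squared error between per-token hidden features already projected to the same dimension, and that the two terms are combined with hypothetical weights alpha and beta. All function names and parameters below are illustrative assumptions.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]

def logit_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position (assumed logit-alignment term)."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def feature_mse(teacher_feat, student_feat):
    """Mean-squared error between two feature vectors of equal length
    (assumed token-alignment term; a projector is assumed to have matched dims)."""
    assert len(teacher_feat) == len(student_feat)
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)) / len(teacher_feat)

def joint_distill_loss(teacher_logits, student_logits,
                       teacher_feats, student_feats,
                       alpha=1.0, beta=1.0, temperature=1.0):
    """Average over token positions of alpha * KL + beta * MSE.
    alpha, beta, and temperature are hypothetical hyperparameters."""
    n = len(teacher_logits)
    kl = sum(logit_kl(t, s, temperature)
             for t, s in zip(teacher_logits, student_logits)) / n
    mse = sum(feature_mse(t, s)
              for t, s in zip(teacher_feats, student_feats)) / n
    return alpha * kl + beta * mse

# Toy example: two token positions, vocabulary of size 3, feature dim 2.
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student_logits = [[1.5, 0.7, -0.5], [0.0, 0.5, 2.0]]
teacher_feats = [[1.0, 2.0], [0.0, -1.0]]
student_feats = [[0.8, 2.1], [0.2, -0.7]]
loss = joint_distill_loss(teacher_logits, student_logits,
                          teacher_feats, student_feats)
```

In this sketch the loss is zero when the student reproduces the teacher's logits and features exactly, and positive otherwise. In practice the weighting between the two terms would need tuning.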
Abstract
The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs leads to substantial memory and computational demands that hinder their widespread deployment. In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch. Instead, we focus on what matters for training small-scale MLLMs through knowledge distillation, which is the first step from the multimodal distillation perspective. Our extensive studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process. These results show that jointly aligning tokens and logits plays a critical role in teacher-student frameworks. In addition, we draw a…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Focus · Knowledge Distillation
