Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models
Jingchen Sun, Shaobo Han, Deep Patel, Wataru Kohno, Can Jin, and Changyou Chen

TL;DR
This paper introduces Beta-KD, an uncertainty-aware knowledge distillation framework for multimodal large language models that adaptively balances teacher guidance and data supervision, leading to improved model performance.
Contribution
The paper proposes a Bayesian formulation for uncertainty-aware distillation, providing a closed-form weighting mechanism that enhances knowledge transfer in multimodal models.
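As a hedged illustration of how a Gibbs prior can produce a closed-form weight, consider the simplest special case. The quadratic energy, the symbols z_s and z_t (student and teacher activations of dimension d), and the weight β are assumptions made here for illustration; the summary above does not state which energy or derivation the paper actually uses.

```latex
% Illustrative special case only: the Gaussian (quadratic) energy is an
% assumption, not the paper's stated choice.
\begin{align}
p(z_s \mid z_t, \beta) &= \frac{1}{Z(\beta)} \exp\!\big(-\beta\, E(z_s, z_t)\big),
  \qquad E(z_s, z_t) = \tfrac{1}{2}\lVert z_s - z_t \rVert^2, \\
Z(\beta) &= \left(\frac{2\pi}{\beta}\right)^{d/2}, \\
-\log p(z_s \mid z_t, \beta) &= \beta\, E(z_s, z_t) - \frac{d}{2}\log\beta + \frac{d}{2}\log 2\pi, \\
\frac{\partial}{\partial\beta}\Big[\beta E - \tfrac{d}{2}\log\beta\Big] = 0
  &\;\Longrightarrow\; \beta^\star = \frac{d}{2\,E(z_s, z_t)} = \frac{d}{\lVert z_s - z_t \rVert^2}.
\end{align}
```

In this toy case the negative log-prior is a distillation loss scaled by β, and the normalizer contributes the -(d/2) log β term that keeps β from collapsing to zero. The resulting weight shrinks as the student-teacher mismatch grows, which is one way a weighting rule can become uncertainty-aware.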
Findings
Beta-KD outperforms existing distillation methods on VQA benchmarks.
The framework adaptively balances teacher guidance against data supervision based on uncertainty.
Experimental results show consistent performance improvements.
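The idea of balancing teacher guidance against data supervision per sample can be sketched in a few lines. This is a minimal sketch, not Beta-KD itself: the entropy-based weight, the convex combination, and the function names `teacher_weight` and `weighted_kd_loss` are hypothetical choices made for illustration, since the summary does not give the paper's weighting formula.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with q clipped away from zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def teacher_weight(teacher_probs):
    """Hypothetical weight in [0, 1]: near 1 for a confident teacher,
    0 for a uniform (maximally uncertain) one. Not the paper's formula."""
    k = len(teacher_probs)
    w = 1.0 - entropy(teacher_probs) / math.log(k)
    return max(0.0, min(1.0, w))

def weighted_kd_loss(student_logits, teacher_logits, label):
    """Convex combination of cross-entropy (data) and KL (teacher) terms
    for one sample. Returns (loss, weight)."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    ce = -math.log(max(s[label], 1e-12))   # data supervision
    kd = kl_divergence(t, s)               # teacher guidance
    w = teacher_weight(t)
    return (1.0 - w) * ce + w * kd, w

# Usage: a uniform teacher gets zero weight, so the loss is pure cross-entropy.
loss, w = weighted_kd_loss([2.0, 0.5, 0.1], [0.0, 0.0, 0.0], label=0)
```

With a sharply peaked teacher such as logits `[10.0, 0.0, 0.0]`, the same call puts almost all of the weight on the KL term instead.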
Abstract
Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. However, determining the optimal balance between learning from data and learning from the teacher is challenging, as some samples may be noisy while others are subject to teacher uncertainty. This motivates the need for adaptively balancing data and teacher supervision. We propose Beta-weighted Knowledge Distillation (Beta-KD), an uncertainty-aware distillation framework that adaptively modulates how much the student relies on teacher guidance. Specifically, we formulate teacher-student learning from a unified Bayesian perspective and interpret teacher supervision as a Gibbs prior over student activations. This yields a closed-form, uncertainty-aware weighting mechanism and supports arbitrary distillation objectives and their combinations. Extensive experiments on…
Taxonomy
Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Domain Adaptation and Few-Shot Learning
