Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

Jingchen Sun, Shaobo Han, Deep Patel, Wataru Kohno, Can Jin, and Changyou Chen

arXiv:2603.21426 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces Beta-KD (Beta-weighted Knowledge Distillation), an uncertainty-aware distillation framework for multimodal large language models. It adaptively balances teacher guidance against data supervision and outperforms existing distillation methods on VQA benchmarks.

Contribution

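The paper formulates teacher-student learning from a unified Bayesian perspective, interpreting teacher supervision as a Gibbs prior over student activations. This yields a closed-form, uncertainty-aware weighting mechanism that supports arbitrary distillation objectives and their combinations.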

Findings

01. Beta-KD outperforms existing distillation methods on VQA benchmarks.

02. The framework adaptively balances teacher guidance based on uncertainty.

03. Experimental results show consistent performance improvements.

Abstract

Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. However, determining the optimal balance between learning from data and learning from the teacher is challenging, as some samples may be noisy while others are subject to teacher uncertainty. This motivates the need for adaptively balancing data and teacher supervision. We propose Beta-weighted Knowledge Distillation (Beta-KD), an uncertainty-aware distillation framework that adaptively modulates how much the student relies on teacher guidance. Specifically, we formulate teacher-student learning from a unified Bayesian perspective and interpret teacher supervision as a Gibbs prior over student activations. This yields a closed-form, uncertainty-aware weighting mechanism and supports arbitrary distillation objectives and their combinations. Extensive experiments on…
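
As a rough illustration of how a Gibbs prior can produce a closed-form weight, here is a minimal sketch under assumptions that are not taken from the paper. Suppose the distillation loss is D = 0.5 * ||z - z_teacher||^2 over d student activations and the prior is exp(-beta * D) / Z(beta). Then Z(beta) = (2 * pi / beta)^(d / 2), and minimizing the negative log prior over beta gives beta = d / (2 * D). The function names, the squared-error loss, and the toy vectors are all hypothetical; Beta-KD's actual formulation may differ.

```python
import math


def half_sq_dist(student, teacher):
    """Assumed distillation loss: D = 0.5 * ||z - z_teacher||^2."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(student, teacher))


def gibbs_penalty(beta, distill_loss, dim):
    """Negative log of exp(-beta * D) / Z(beta), dropping constants.

    For the squared-error D above, Z(beta) = (2 * pi / beta) ** (dim / 2),
    so -log prior = beta * D - (dim / 2) * log(beta) + const.
    """
    return beta * distill_loss - 0.5 * dim * math.log(beta)


def closed_form_beta(distill_loss, dim, eps=1e-8):
    """Beta that minimizes gibbs_penalty for a fixed D: beta = dim / (2 * D)."""
    return dim / (2.0 * (distill_loss + eps))


def total_loss(data_loss, distill_loss, dim):
    """Data loss plus the Gibbs penalty at the closed-form beta."""
    beta = closed_form_beta(distill_loss, dim)
    return data_loss + gibbs_penalty(beta, distill_loss, dim), beta


# Toy activations (hypothetical values, not from the paper).
student = [0.2, -0.1, 0.4, 0.0]
teacher_close = [0.25, -0.05, 0.35, 0.05]  # teacher agrees with the student
teacher_far = [1.2, -1.1, 1.4, 1.0]        # teacher disagrees strongly

d_close = half_sq_dist(student, teacher_close)
d_far = half_sq_dist(student, teacher_far)

# The data loss of 0.7 is an arbitrary placeholder.
loss_close, beta_close = total_loss(0.7, d_close, dim=len(student))
loss_far, beta_far = total_loss(0.7, d_far, dim=len(student))

print(f"close teacher: D={d_close:.4f}, beta={beta_close:.2f}")
print(f"far teacher:   D={d_far:.4f}, beta={beta_far:.2f}")
```

Under these assumptions the weight on the teacher term shrinks as the student-teacher discrepancy grows, which is one way a weighting can be "uncertainty-aware". The sketch leaves open how gradients should flow through beta in an autograd framework.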

Taxonomy

Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Domain Adaptation and Few-Shot Learning