Beyond Single-Sample: Reliable Multi-Sample Distillation for Video Understanding
Songlin Li, Xin Zhu, Zechao Guan, Peipeng Chen, Jian Yao

TL;DR
This paper introduces R-MSD, a multi-sample distillation framework for video understanding that reduces variance and improves knowledge transfer by leveraging a teacher pool and quality-aware filtering, outperforming single-sample methods.
Contribution
The paper presents R-MSD, a novel multi-sample distillation approach that models teacher sampling variance and uses a task-adaptive teacher pool for more reliable supervision in video understanding.
Findings
R-MSD outperforms single-sample distillation across multiple benchmarks.
Significant improvements on the VideoMME, Video-MMMU, and MathVerse benchmarks.
The baseline SFT+RL 4B model yields only marginal gains under the same training budget.
Abstract
Traditional black-box distillation for Large Vision-Language Models (LVLMs) typically relies on a single teacher response per input, which often yields high-variance responses and format inconsistencies in multimodal or temporal scenarios. To mitigate this unreliable supervision, we propose R-MSD (Reliable Multi-Sample Distillation), a framework that explicitly models teacher sampling variance to enhance distillation stability. Rather than relying on a single teacher response, our approach leverages a task-adaptive teacher pool to provide robust supervision tailored to both closed-ended and open-ended reasoning. By integrating quality-aware signal matching with an adversarial distillation objective, our approach effectively filters teacher noise while maximizing knowledge transfer. Extensive evaluations across comprehensive video understanding benchmarks demonstrate that R-MSD…
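To make the idea of a task-adaptive teacher pool concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the pool is built, so everything here is an assumption made for illustration: the `TeacherSample` type, the `quality` score, the `min_quality` and `min_agreement` thresholds, and the use of majority voting for closed-ended tasks. It shows only the general pattern of drawing several teacher responses per input and keeping the reliable ones instead of trusting a single sample.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TeacherSample:
    """One sampled teacher response (hypothetical structure)."""
    text: str               # full teacher response
    answer: Optional[str]   # extracted final answer for closed-ended tasks, else None
    quality: float          # assumed quality score in [0, 1], e.g. from a judge or format check


def build_teacher_pool(
    samples: List[TeacherSample],
    closed_ended: bool,
    min_quality: float = 0.5,     # assumed threshold, not taken from the paper
    min_agreement: float = 0.5,   # assumed threshold, not taken from the paper
) -> List[TeacherSample]:
    """Filter several teacher samples into a pool of supervision targets.

    Closed-ended tasks: keep only samples that agree with the majority answer,
    and drop the input entirely if the teacher is too inconsistent.
    Open-ended tasks: keep every sample that passes the quality threshold.
    The returned pool is sorted from highest to lowest quality.
    """
    # Step 1: discard low-quality responses (malformed or poorly scored).
    kept = [s for s in samples if s.quality >= min_quality]
    if not kept:
        return []

    if closed_ended:
        # Step 2: majority vote over extracted answers.
        votes = Counter(s.answer for s in kept if s.answer is not None)
        if not votes:
            return []
        top_answer, count = votes.most_common(1)[0]
        # Step 3: skip inputs where the teacher disagrees with itself too much.
        if count / len(kept) < min_agreement:
            return []
        kept = [s for s in kept if s.answer == top_answer]

    return sorted(kept, key=lambda s: s.quality, reverse=True)
```

In a training loop, the student would then be supervised against the samples in the returned pool rather than against one arbitrary teacher response. How R-MSD weights those samples, and how its adversarial objective uses them, is not specified in the text above.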
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
