MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models

Yuqi Li; Junhao Dong; Chuanguang Yang; Shiping Wen; Piotr Koniusz; Tingwen Huang; Yingli Tian; Yew-Soon Ong

arXiv:2511.17448·cs.CV·November 24, 2025

TL;DR

This paper introduces MMT-ARD, a multimodal multi-teacher adversarial distillation framework that improves the robustness of vision-language models through collaborative knowledge fusion and adaptive weighting strategies.

Contribution

It proposes a dual-teacher architecture with dynamic and adaptive weighting mechanisms to improve robustness and training efficiency in vision-language models.

Findings

01  Improves robust accuracy by 4.32% on ImageNet

02  Trains 2.3x faster than single-teacher methods

03  Improves zero-shot accuracy by 3.5% on benchmarks

Abstract

Vision-Language Models (VLMs) are increasingly deployed in safety-critical applications, making their adversarial robustness a crucial concern. While adversarial knowledge distillation has shown promise in transferring robustness from teacher to student models, traditional single-teacher approaches suffer from limited knowledge diversity, slow convergence, and difficulty in balancing robustness and accuracy. To address these challenges, we propose MMT-ARD: a Multimodal Multi-Teacher Adversarial Robust Distillation framework. Our key innovation is a dual-teacher knowledge fusion architecture that collaboratively optimizes clean feature preservation and robust feature enhancement. To better handle challenging adversarial examples, we introduce a dynamic weight allocation strategy based on teacher confidence, enabling adaptive focus on harder samples. Moreover, to mitigate bias among…
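
The abstract describes fusing two teachers with weights driven by teacher confidence, but the visible text does not give the formula. The sketch below is one illustrative way such a scheme could look, not the authors' method. It assumes that confidence is the teacher's maximum class probability, that teacher weights are a softmax over those confidences, and that each teacher contributes a KL-divergence term. The names `teacher_weights`, `fused_distillation_loss`, and the `temperature` parameter are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def teacher_weights(teacher_probs, temperature=1.0):
    """Assumed rule: weight each teacher by a softmax over its confidence.

    Confidence is taken to be the teacher's maximum class probability.
    This choice is an illustration, not taken from the paper.
    """
    confidences = [max(probs) for probs in teacher_probs]
    return softmax([c / temperature for c in confidences])


def fused_distillation_loss(student_probs, teacher_probs, temperature=1.0):
    """Confidence-weighted sum of per-teacher KL terms for one sample."""
    weights = teacher_weights(teacher_probs, temperature)
    return sum(
        w * kl_divergence(probs, student_probs)
        for w, probs in zip(weights, teacher_probs)
    )


if __name__ == "__main__":
    # Hypothetical outputs on one adversarial sample with three classes.
    clean_teacher = [0.7, 0.2, 0.1]
    robust_teacher = [0.5, 0.3, 0.2]
    student = [0.6, 0.25, 0.15]

    teachers = [clean_teacher, robust_teacher]
    print("teacher weights:", teacher_weights(teachers))
    print("fused loss:", fused_distillation_loss(student, teachers))
```

Lowering `temperature` in this sketch sharpens the weighting toward the more confident teacher, while a large value approaches a plain average of the two teachers. How the paper shifts focus toward harder samples is not specified in the visible abstract and is not modeled here.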

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications