AME: Aligned Manifold Entropy for Robust Vision-Language Distillation

Guiming Cao; Yuming Ou

arXiv:2508.08644 · cs.CV · August 13, 2025

TL;DR

This paper introduces AME, a novel entropy-based method for robust vision-language knowledge distillation that performs well even with limited data, without modifying model architecture.

Contribution

It proposes a plug-and-play entropy minimization technique over a shared manifold to improve generalization in vision-language distillation under low-data conditions.

Findings

01. Consistently improves distillation robustness across architectures

02. Achieves better generalization on downstream tasks

03. Theoretically tightens generalization error bounds

Abstract

Knowledge distillation is a long-established technique for knowledge transfer, and has regained attention in the context of the recent emergence of large vision-language models (VLMs). However, vision-language knowledge distillation often requires sufficient training data to achieve robust generalization on samples with ambiguous or boundary-adjacent representations, which are associated with high predictive uncertainty. Critically, collecting such large-scale, task-specific data for training is often impractical in real-world scenarios. To address this major challenge arising from the entanglement of uncertainty and cross-modal feature representation, we propose Aligned Manifold Entropy for Robust Vision-Language Distillation (AME), aiming to achieve robust generalization under real-world conditions. AME applies entropy minimization over a reconfigured shared manifold, where multi-modal…
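
As a rough illustration of the quantity that entropy minimization targets, the sketch below computes the Shannon entropy of softmax predictions. It is a generic entropy term and not the AME objective: the visible abstract does not specify how the shared manifold is constructed, so no manifold or cross-modal alignment step is modeled here. The function names (`softmax`, `shannon_entropy`, `mean_prediction_entropy`) and the example logits are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_prediction_entropy(batch_logits):
    """Average predictive entropy over a batch.

    In a generic entropy-minimization setup, a term like this would be
    added to the training loss (assumption: the paper's actual loss differs).
    """
    entropies = [shannon_entropy(softmax(row)) for row in batch_logits]
    return sum(entropies) / len(entropies)


# Hypothetical logits: one ambiguous (boundary-adjacent) sample, one confident sample.
ambiguous = [0.0, 0.0, 0.0]
confident = [10.0, 0.0, 0.0]

print("ambiguous:", shannon_entropy(softmax(ambiguous)))
print("confident:", shannon_entropy(softmax(confident)))
print("batch mean:", mean_prediction_entropy([ambiguous, confident]))
```

The ambiguous sample sits at the maximum entropy for three classes, while the confident sample is close to zero, which is why minimizing this term pushes predictions away from decision boundaries.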
