Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning
Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Zhe Xue

TL;DR
This paper introduces a novel dynamic self-adaptive multiscale distillation method for efficient cross-modal representation learning, reducing computational costs while maintaining high performance in multimodal tasks.
Contribution
It proposes a multiscale distillation strategy with a dynamic loss balancer, enabling effective knowledge transfer from large models using minimal resources.
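The summary does not say how the dynamic loss balancer works, so the following is only a minimal sketch of one common balancing scheme, not the authors' method. It assumes each scale yields one scalar distillation loss and weights each loss inversely to its current magnitude, so that every scale contributes equally to the total. The names `balance_weights` and `combined_loss`, and the example loss values, are hypothetical.

```python
from typing import List, Sequence


def balance_weights(losses: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Return one weight per loss, inversely proportional to its magnitude.

    Assumption (not from the paper): weights are normalised so they sum to
    the number of losses, which keeps the overall loss scale comparable to
    an unweighted sum.
    """
    inverse = [1.0 / (abs(loss) + eps) for loss in losses]
    total = sum(inverse)
    n = len(losses)
    return [n * value / total for value in inverse]


def combined_loss(losses: Sequence[float]) -> float:
    """Weighted sum of the per-scale losses using the balanced weights."""
    weights = balance_weights(losses)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical per-scale distillation losses with very different magnitudes.
scale_losses = [2.0, 0.5, 0.1]
weights = balance_weights(scale_losses)
contributions = [w * loss for w, loss in zip(weights, scale_losses)]
```

With this scheme, the large loss is down-weighted and the small one up-weighted, so the three entries of `contributions` come out approximately equal. In a real training loop the weights would typically be computed from detached loss values so that gradients do not flow through the balancer itself.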
Findings
Achieves state-of-the-art performance on cross-modal retrieval tasks.
Reduces model complexity and training costs significantly.
Operates effectively with only image-level information.
Abstract
In recent years, pre-trained multimodal large models have attracted widespread attention due to their outstanding performance in various multimodal applications. Nonetheless, the extensive computational resources and vast datasets required for their training present significant hurdles for deployment in environments with limited computational resources. To address this challenge, we propose, for the first time, a dynamic self-adaptive multiscale distillation method that transfers knowledge from a pre-trained multimodal large model for efficient cross-modal representation learning. Unlike existing distillation methods, our strategy employs a multiscale perspective, enabling the extraction of structural knowledge from the pre-trained multimodal large model and ensuring that the student model inherits a comprehensive and nuanced understanding of the teacher's knowledge. To optimize each distillation loss in a balanced…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques
