Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning

Zhengyang Liang; Meiyu Liang; Wei Huang; Yawen Li; Zhe Xue

arXiv:2404.10838 · cs.CV · April 18, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel dynamic self-adaptive multiscale distillation method for efficient cross-modal representation learning, reducing computational costs while maintaining high performance in multimodal tasks.

Contribution

It proposes a multiscale distillation strategy with a dynamic loss balancer, enabling effective knowledge transfer from large models using minimal resources.
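The idea of balancing several distillation losses can be illustrated with a minimal sketch. This is not the paper's balancer, whose exact form is not given here; it assumes a simple rule that rescales each loss term so all terms contribute the same magnitude. The loss names (`feature`, `relation`, `contrastive`), the numeric values, and the functions `balance_weights` and `combined_loss` are hypothetical.

```python
from typing import Dict


def balance_weights(losses: Dict[str, float], eps: float = 1e-8) -> Dict[str, float]:
    """Hypothetical balancer (an assumption, not the paper's rule).

    Each weight is mean_loss / loss_i, so every weighted term ends up
    with roughly the same magnitude: the mean of the raw losses.
    """
    if not losses:
        raise ValueError("need at least one loss term")
    mean_loss = sum(losses.values()) / len(losses)
    return {name: mean_loss / (value + eps) for name, value in losses.items()}


def combined_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual distillation losses."""
    return sum(weights[name] * value for name, value in losses.items())


# Illustrative values only; in training these would be the current
# (detached) values of each distillation loss at a given step.
raw = {"feature": 4.0, "relation": 1.0, "contrastive": 0.25}
weights = balance_weights(raw)
total = combined_loss(raw, weights)
```

With this rule a large loss term is scaled down and a small one scaled up, so no single scale dominates the gradient. Recomputing the weights at every step is what would make such a scheme dynamic.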

Findings

1. Achieves state-of-the-art performance on cross-modal retrieval tasks.
2. Reduces model complexity and training costs significantly.
3. Operates effectively with only image-level information.

Abstract

In recent years, pre-trained multimodal large models have attracted widespread attention due to their outstanding performance in various multimodal applications. Nonetheless, the extensive computational resources and vast datasets required for their training present significant hurdles for deployment in environments with limited computational resources. To address this challenge, we propose, for the first time, a dynamic self-adaptive multiscale distillation method that transfers knowledge from a pre-trained multimodal large model for efficient cross-modal representation learning. Unlike existing distillation methods, our strategy employs a multiscale perspective, enabling the extraction of structural knowledge across scales from the pre-trained multimodal large model and ensuring that the student model inherits a comprehensive and nuanced understanding of the teacher's knowledge. To optimize each distillation loss in a balanced…

Code & Models

Repositories

chrisx599/dsmd (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques