Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation

Xin Zhang, Ziruo Zhang, Jiawei Du, Zuozhu Liu, and Joey Tianyi Zhou

arXiv:2505.14705·cs.CV·May 22, 2025

TL;DR

This paper introduces RepBlend, a novel framework for multimodal dataset distillation that alleviates modality collapse by blending representations and balancing supervision, leading to improved cross-modal learning and efficiency.

Contribution

RepBlend is presented as the first method to address modality collapse in multimodal dataset distillation (MDD). It combines representation blending with symmetric projection matching to enhance intra-modal diversity and cross-modal alignment.

Findings

01 Outperforms prior MDD methods on Flickr-30K and MS-COCO

02 Improves IR@10 by up to 9.4 and TR@10 by up to 6.3

03 Speeds up distillation by up to 6.7×
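The IR@10 and TR@10 figures above are retrieval recall metrics: image retrieval from a text query and text retrieval from an image query, each counted as a hit when the correct match appears in the top 10 results. The following is a minimal sketch of how Recall@K could be computed from a similarity matrix. It assumes one matching text per image (same index), which simplifies the Flickr-30K and MS-COCO setups; the function names and the toy scores are illustrative and do not come from the paper.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose matching candidate (same index) ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: query i matches candidate i only (one-to-one pairing).
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(matrix):
    """Swap query and candidate roles (image-to-text becomes text-to-image)."""
    return [list(col) for col in zip(*matrix)]


# Toy similarity scores: rows are images, columns are texts.
sim_image_to_text = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]

tr_at_1 = recall_at_k(sim_image_to_text, 1)             # image query, retrieve text
ir_at_1 = recall_at_k(transpose(sim_image_to_text), 1)  # text query, retrieve image
tr_at_2 = recall_at_k(sim_image_to_text, 2)
ir_at_2 = recall_at_k(transpose(sim_image_to_text), 2)
```

With K = 10 and real embeddings, the same function would yield TR@10 and IR@10; the reported improvements are differences between such scores.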

Abstract

Multimodal Dataset Distillation (MDD) seeks to condense large-scale image-text datasets into compact surrogates while retaining their effectiveness for cross-modal learning. Despite recent progress, existing MDD approaches often suffer from Modality Collapse, characterized by over-concentrated intra-modal representations and an enlarged distributional gap across modalities. In this paper, for the first time, we identify this issue as stemming from a fundamental conflict between the over-compression behavior inherent in dataset distillation and the cross-modal supervision imposed by contrastive objectives. To alleviate modality collapse, we introduce RepBlend, a novel MDD framework that weakens overdominant cross-modal supervision via representation blending, thereby significantly enhancing intra-modal diversity. Additionally, we observe that current MDD methods…
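The two symptoms the abstract attributes to modality collapse (over-concentrated intra-modal representations and a widened gap between modalities) can be probed with simple diagnostics. The sketch below is one possible way to do so and is not taken from the paper: it assumes mean pairwise cosine similarity as a proxy for intra-modal concentration and the distance between normalized-embedding centroids as a proxy for the cross-modal gap. All function names and the toy embeddings are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_pairwise_cosine(embs):
    """Average cosine similarity over all distinct pairs in one modality.

    Values near 1 suggest the embeddings are concentrated in a narrow cone.
    """
    unit = [normalize(v) for v in embs]
    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            count += 1
    return total / count


def centroid_gap(embs_a, embs_b):
    """Euclidean distance between the centroids of two sets of unit embeddings."""
    def centroid(embs):
        unit = [normalize(v) for v in embs]
        return [sum(col) / len(unit) for col in zip(*unit)]

    ca, cb = centroid(embs_a), centroid(embs_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ca, cb)))


# Toy 2-D embeddings: the image set is spread out, the text set is tightly clustered.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[1.0, 0.1], [1.0, -0.1]]

image_concentration = mean_pairwise_cosine(image_embs)
text_concentration = mean_pairwise_cosine(text_embs)
gap = centroid_gap(image_embs, text_embs)
```

Under these assumptions, a distilled set showing high concentration in each modality together with a large centroid gap would match the collapse pattern described above.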

Code & Models

Models

xinxin66/RepBlend (Hugging Face model)

Taxonomy

Topics: Semantic Web and Ontologies