ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation

Yue Min; Shaobo Wang; Jiaze Li; Tianle Niu; Junxin Fan; Yongliang Miao; Lijin Yang; Linfeng Zhang

arXiv:2511.08263 · cs.CV · November 12, 2025

TL;DR

ImageBindDC introduces a novel data condensation method for multimodal data that preserves inter-modal dependencies using a characteristic function loss, enabling efficient training with significantly fewer data points.

Contribution

The paper proposes a new multimodal data condensation framework using a characteristic function loss in the Fourier domain to better preserve inter-modal relationships.
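For readers unfamiliar with the term, the characteristic function of a random vector and a generic discrepancy built on it can be written as below. The definition of \varphi_X is standard; the squared-modulus form of the loss and the weighting measure \omega over frequencies are assumptions made for illustration, not the paper's stated objective.

```latex
% Characteristic function of a random vector X in R^d (standard definition)
\varphi_X(t) = \mathbb{E}\!\left[ e^{\, i\, t^{\top} X} \right], \qquad t \in \mathbb{R}^d

% Illustrative CF discrepancy between real and synthetic feature distributions
% (\omega is an assumed weighting measure over frequencies t)
\mathcal{L}_{\mathrm{CF}}
  = \int_{\mathbb{R}^d}
    \left| \varphi_{\mathrm{real}}(t) - \varphi_{\mathrm{syn}}(t) \right|^{2}
    \, d\omega(t)

% Link to moments (scalar case, when the k-th moment exists)
\mathbb{E}\!\left[ X^{k} \right] = i^{-k}\, \varphi_X^{(k)}(0)
```

Because a characteristic function uniquely determines its distribution, and its derivatives at zero recover every moment that exists, matching characteristic functions is a stronger condition than matching a fixed, finite set of moments.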

Findings

01. Achieves lossless performance with only 5 condensed points per class on NYU-v2.

02. Outperforms previous methods with an 8.2% accuracy improvement.

03. Reduces condensation time by more than 4 times.

Abstract

Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training, yet while successful in unimodal settings, they often fail in multimodal scenarios where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution-matching by employing a powerful Characteristic Function (CF) loss, which operates in the Fourier domain to facilitate a more precise statistical alignment via exact infinite moment matching. We design our objective to enforce three critical levels of distributional consistency: (i) uni-modal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves…
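A minimal sketch of the uni-modal term (i) described in the abstract, assuming features have already been extracted into a shared embedding space. The function names `empirical_cf` and `cf_discrepancy`, the Gaussian sampling of frequencies, and the squared-modulus distance are hypothetical choices for illustration; the paper's actual loss, frequency sampling, and weighting may differ.

```python
import numpy as np


def empirical_cf(x, t):
    """Empirical characteristic function of samples x, shape (n, d),
    evaluated at frequency vectors t, shape (k, d). Returns shape (k,)."""
    proj = x @ t.T                          # (n, k) inner products t^T x
    return np.exp(1j * proj).mean(axis=0)   # average of e^{i t^T x} over samples


def cf_discrepancy(real, syn, num_freqs=256, scale=1.0, seed=0):
    """Mean squared modulus of the difference between the empirical
    characteristic functions of `real` and `syn`.

    Assumption: frequencies are drawn from an isotropic Gaussian with
    standard deviation `scale`; this is an illustrative choice only.
    """
    rng = np.random.default_rng(seed)
    t = rng.normal(scale=scale, size=(num_freqs, real.shape[1]))
    diff = empirical_cf(real, t) - empirical_cf(syn, t)
    return float(np.mean(np.abs(diff) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real_feats = rng.normal(size=(200, 4))   # stand-in for real embeddings
    syn_feats = real_feats[:5] + 0.1         # stand-in for 5 synthetic points
    print(cf_discrepancy(real_feats, syn_feats))
```

In a condensation loop, a quantity like this would be computed once per modality and minimized with respect to the synthetic points. That requires an autodiff framework rather than plain NumPy, which is used here only to keep the arithmetic visible.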

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis