Omnimodal Dataset Distillation via High-order Proxy Alignment
Yuxuan Gao, Xiaohao Liu, Xiaobo Xia, Tongliang Liu

TL;DR
This paper introduces HoPA, a novel method for omnimodal dataset distillation that captures high-order cross-modal alignments, enabling scalable and effective compression across multiple heterogeneous modalities.
Contribution
The paper proposes a unified high-order proxy alignment approach that avoids the combinatorial complexity of pairwise alignment in omnimodal dataset distillation, with theoretical and empirical validation.
Findings
HoPA achieves superior compression-performance trade-offs.
Theoretical analysis identifies the key determinant that bounds the endpoint discrepancy in the omnimodal setting, supporting the method's design.
Extensive experiments demonstrate effectiveness across benchmarks.
Abstract
Dataset distillation compresses large-scale datasets into compact synthetic sets while preserving training performance, but existing methods are largely restricted to single-modal or bimodal settings. Extending dataset distillation to scenarios involving more than two modalities, i.e., Omnimodal Dataset Distillation, remains underexplored and challenging due to increased heterogeneity and complex cross-modal interactions. In this work, we identify the key determinant that bounds the endpoint discrepancy in the omnimodal setting, which is exacerbated with an increasing number of modalities. To this end, we propose HoPA, a unified method that captures high-order cross-modal alignments via a compact proxy, which is compatible with trajectory matching as well. By abstracting omnimodal alignment with a shared similarity structure, our method avoids the combinatorial complexity of pairwise…
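
The complexity argument in the abstract can be illustrated with a small sketch. This is not the HoPA objective, which the summary here does not spell out. It only contrasts the number of alignment terms when every pair of modalities is compared directly with the number when each modality is compared once to a single shared structure. The names cosine_matrix, frob_gap, pairwise_terms, and proxy_terms are hypothetical, and using the mean similarity matrix as the shared proxy is an assumption made purely for illustration.

```python
import math
import random
from itertools import combinations


def cosine_matrix(X):
    """Sample-by-sample cosine similarity matrix for one modality."""
    norms = [math.sqrt(sum(v * v for v in x)) for x in X]
    return [
        [sum(a * b for a, b in zip(X[i], X[j])) / (norms[i] * norms[j])
         for j in range(len(X))]
        for i in range(len(X))
    ]


def frob_gap(A, B):
    """Squared Frobenius distance between two equally sized matrices."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def pairwise_terms(sims):
    """One alignment term per modality pair: M * (M - 1) / 2 terms."""
    return [frob_gap(sims[i], sims[j])
            for i, j in combinations(range(len(sims)), 2)]


def proxy_terms(sims):
    """One alignment term per modality against a shared proxy: M terms.

    Assumption: the proxy is the element-wise mean of the similarity
    matrices. The paper's actual proxy may be defined differently.
    """
    n = len(sims[0])
    proxy = [[sum(S[r][c] for S in sims) / len(sims) for c in range(n)]
             for r in range(n)]
    return [frob_gap(S, proxy) for S in sims]


# Toy data: 5 modalities, 4 paired samples, 8-dimensional embeddings.
random.seed(0)
num_modalities, num_samples, dim = 5, 4, 8
embeddings = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_samples)]
    for _ in range(num_modalities)
]
sims = [cosine_matrix(E) for E in embeddings]

# Prints "10 5": ten pairwise terms versus five proxy terms.
print(len(pairwise_terms(sims)), len(proxy_terms(sims)))
```

With M modalities the pairwise count grows quadratically while the proxy count grows linearly, which is the scaling gap the abstract refers to.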
