Omnimodal Dataset Distillation via High-order Proxy Alignment

Yuxuan Gao; Xiaohao Liu; Xiaobo Xia; Tongliang Liu

arXiv:2604.10666 · cs.CV · April 14, 2026

TL;DR

This paper introduces HoPA, a novel method for omnimodal dataset distillation that captures high-order cross-modal alignments, enabling scalable and effective compression across multiple heterogeneous modalities.

Contribution

It proposes a unified high-order proxy alignment approach that avoids the combinatorial complexity of pairwise cross-modal alignment in omnimodal dataset distillation, and validates it both theoretically and empirically.
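The complexity argument can be made concrete by counting alignment terms. With M modalities, aligning every pair directly needs M(M-1)/2 terms, while aligning each modality to one shared proxy needs only M. The sketch below is a hypothetical toy and not HoPA's actual objective: it assumes each modality provides an embedding per sample, builds a within-modality cosine-similarity matrix, uses the element-wise mean of those matrices as the "shared similarity structure", and penalizes each modality's squared deviation from it. All function names (`num_pairwise_terms`, `proxy_alignment_loss`, and so on) are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def num_pairwise_terms(num_modalities: int) -> int:
    """Alignment terms when every unordered pair of modalities is aligned directly."""
    return math.comb(num_modalities, 2)


def num_proxy_terms(num_modalities: int) -> int:
    """Alignment terms when each modality is aligned to a single shared proxy."""
    return num_modalities


def similarity_matrix(embeddings: Matrix) -> Matrix:
    """Cosine similarity between every pair of samples within one modality."""
    # Zero-norm rows fall back to a norm of 1.0 to avoid division by zero.
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in embeddings]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


def proxy_alignment_loss(modalities: List[Matrix]) -> float:
    """Hypothetical toy loss: pull each modality's similarity matrix toward their mean."""
    sims = [similarity_matrix(emb) for emb in modalities]
    n = len(sims[0])
    # Shared proxy: element-wise mean of the per-modality similarity matrices.
    proxy = [[sum(s[i][j] for s in sims) / len(sims) for j in range(n)] for i in range(n)]
    # One squared-error term per modality, so the term count grows as M, not M(M-1)/2.
    per_modality = [
        sum((s[i][j] - proxy[i][j]) ** 2 for i in range(n) for j in range(n)) for s in sims
    ]
    return sum(per_modality) / len(per_modality)


if __name__ == "__main__":
    for m in (2, 3, 4, 6):
        print(f"M={m}: pairwise={num_pairwise_terms(m)}, proxy={num_proxy_terms(m)}")
```

The counts cross over quickly: for M = 2 the pairwise scheme needs 1 term versus 2, they tie at M = 3, and by M = 6 it needs 15 terms versus 6. A mean similarity matrix is only one simple choice of proxy; the paper's proxy is described as capturing high-order alignment, which this toy does not attempt to model.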

Findings

01. HoPA achieves superior compression-performance trade-offs.

02. Theoretical analysis supports the soundness of the method's design.

03. Extensive experiments demonstrate its effectiveness across benchmarks.

Abstract

Dataset distillation compresses large-scale datasets into compact synthetic sets while preserving training performance, but existing methods are largely restricted to single-modal or bimodal settings. Extending dataset distillation to scenarios involving more than two modalities, i.e., Omnimodal Dataset Distillation, remains underexplored and challenging due to increased heterogeneity and complex cross-modal interactions. In this work, we identify the key determinant that bounds the endpoint discrepancy in the omnimodal setting, which is exacerbated with an increasing number of modalities. To this end, we propose HoPA, a unified method that captures high-order cross-modal alignments via a compact proxy, which is compatible with trajectory matching as well. By abstracting omnimodal alignment with a shared similarity structure, our method avoids the combinatorial complexity of pairwise…
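For context on "trajectory matching" and "endpoint discrepancy": in the standard Matching Training Trajectories (MTT) formulation (Cazenavette et al., 2022), a student network starts from an expert checkpoint θ*_t, takes N steps on the synthetic set, and its endpoint θ̂_{t+N} is compared with the expert checkpoint M steps later. The loss is ||θ̂_{t+N} − θ*_{t+M}||² / ||θ*_t − θ*_{t+M}||². The sketch below implements only that generic objective, under simplifying assumptions: parameters are flattened to a list of floats, the student uses plain SGD with a fixed learning rate (MTT itself also learns the step size), and the synthetic-data gradient is a hypothetical toy. It is not HoPA's objective and does not reproduce the paper's bound on the endpoint discrepancy.

```python
from typing import Callable, List, Sequence

Params = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def unroll_student(
    start: Sequence[float], grad_fn: Callable[[Params], Params], lr: float, steps: int
) -> Params:
    """Run `steps` plain SGD updates on the synthetic set, starting from an expert checkpoint."""
    theta = list(start)
    for _ in range(steps):
        grad = grad_fn(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def trajectory_matching_loss(
    student_end: Sequence[float], expert_start: Sequence[float], expert_end: Sequence[float]
) -> float:
    """MTT-style endpoint discrepancy, normalized by how far the expert itself moved."""
    denom = squared_distance(expert_start, expert_end)
    if denom == 0.0:
        raise ValueError("expert_start and expert_end must differ")
    return squared_distance(student_end, expert_end) / denom


def toy_grad(theta: Params) -> Params:
    """Gradient of a hypothetical synthetic-data loss 0.5 * ||theta - [1, 1]||^2."""
    return [t - 1.0 for t in theta]


if __name__ == "__main__":
    expert_start, expert_end = [0.0, 0.0], [1.0, 1.0]
    student_end = unroll_student(expert_start, toy_grad, lr=0.5, steps=3)
    print(student_end, trajectory_matching_loss(student_end, expert_start, expert_end))
```

With these toy values the student reaches [0.875, 0.875] after three steps, giving a normalized discrepancy of 0.015625. Dividing by the expert's own displacement keeps the loss comparable across trajectory segments where the expert moves by very different amounts.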
