GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
Xuhui Li, Zhengquan Luo, Zihui Cui, Zhiqiang Xu

TL;DR
GeoDM introduces a geometry-aware distribution matching framework for dataset distillation that operates in mixed geometric spaces, capturing intrinsic data structures to improve fidelity and performance.
Contribution
Proposes a unified framework on product manifolds with learnable curvature to improve distribution matching for dataset distillation.
Findings
Outperforms state-of-the-art methods on benchmarks
Effective across various geometric distribution-matching strategies
Theoretically achieves smaller generalization error bounds
Abstract
Dataset distillation aims to synthesize a compact subset of the original data, enabling models trained on it to achieve performance comparable to those trained on the original large dataset. Existing distribution-matching methods are confined to Euclidean spaces, so they capture only linear structures and overlook the intrinsic geometry of real data, e.g., curvature. However, high-dimensional data often lie on low-dimensional manifolds, suggesting that dataset distillation should align the distilled data manifold with the original data manifold. In this work, we propose a geometry-aware distribution-matching framework, called GeoDM, which operates in the Cartesian product of Euclidean, hyperbolic, and spherical manifolds, with flat, hierarchical, and cyclical structures all captured by a unified representation. To adapt to the underlying data geometry, we introduce…
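As a rough illustration of what a distance in such a product space can look like, the sketch below combines a Euclidean, a hyperbolic (Poincaré ball), and a spherical distance into one weighted product distance. This is not the authors' implementation: the function names, the fixed curvature values c and k, the fixed weights, and the weighted sum-of-squares combination are all assumptions made for illustration. In GeoDM the curvatures and geometry weights are described as learnable, which this sketch does not attempt.

```python
import math


def euclidean_dist(x, y):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hyperbolic_dist(x, y, c=1.0):
    """Poincare-ball distance for curvature -c (c > 0).

    Both points must lie strictly inside the ball of radius 1/sqrt(c).
    """
    sq_norm = lambda v: sum(a * a for a in v)
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff2 / denom) / math.sqrt(c)


def spherical_dist(x, y, k=1.0):
    """Great-circle distance on a sphere of curvature k (radius 1/sqrt(k)).

    Both points are assumed to have norm 1/sqrt(k).
    """
    cos_angle = k * sum(a * b for a, b in zip(x, y))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.acos(cos_angle) / math.sqrt(k)


def product_dist(xe, ye, xh, yh, xs, ys, c=1.0, k=1.0, weights=(1.0, 1.0, 1.0)):
    """Weighted product-manifold distance (hypothetical combination rule).

    Squared component distances are summed with per-geometry weights,
    then the square root is taken.
    """
    we, wh, ws = weights
    d2 = (we * euclidean_dist(xe, ye) ** 2
          + wh * hyperbolic_dist(xh, yh, c) ** 2
          + ws * spherical_dist(xs, ys, k) ** 2)
    return math.sqrt(d2)
```

Each point is represented here as a triple of coordinates, one per factor space. Setting a weight to zero drops that geometry from the distance, which is one simple way to think about the single-geometry ablations the reviewers mention.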
Peer Reviews
Decision: Submitted to ICLR 2026
1. The connection between the manifold hypothesis and dataset distillation is intuitive and clearly articulated. Figure 1 effectively demonstrates that data exhibits non-Euclidean geometric structure that Euclidean spaces fail to capture.
2. The experiments cover multiple datasets, baselines, and ablation studies. The robustness across different distribution matching methods (DM, DSDM) and the cross-architecture evaluation demonstrate generalizability.
3. Theorems 4.1 and 4.2 provide mathematical just…

1. The use of product manifolds, Riemannian CNNs, hyperbolic/spherical embeddings, and optimal transport are all well-established techniques. The main contribution is combining them for dataset distillation, which feels somewhat incremental. The paper would benefit from deeper insights into why this particular combination works.
2. While GeoDM consistently outperforms baselines, the gains are often 1-3%, which may not justify the substantial increase in complexity (three geometry branches, lear…

- The motivation of the paper is clear and intuitive: Euclidean latent spaces likely miss curvature.
- The main idea is conceptually easy to follow: a combination of several similar modules with learnt weights.

- The paper compares a single geometry against all three, but omits two-geometry combinations (E+H, E+S, H+S) in the ablation studies. Without this, the claim that all three curvatures matter remains unverified.
- Some of the assumptions might be too unrealistic; for example, uniform algorithmic stability is unlikely to be satisfied by deep non-convex training. An empirical check to support the relevance of the theoretical terms will be necessary.
- The method introduces many complicated components, and whether the…

- It explains why doing dataset distillation only in Euclidean space can miss real data geometry.
- It proposes distribution matching in a product space (Euclidean + hyperbolic + spherical) so each type of structure can be represented.
- The curvatures and the weights of the three geometries are learnable, letting the method adapt to each dataset.
- A geometry-aware OT loss aligns real and synthetic data across the three components and avoids one component dominating.
- Theory sounds, which trie…

- Theory rests on specific assumptions. The analysis relies on “mild regularity” assumptions and constant-curvature product spaces (Euclidean, hyperbolic, spherical); real data may not fit these perfectly.
- The model fixes the dimensionality of each manifold factor.
- The method introduces learnable curvature, geometry weights, and an OT term with its own coefficient/regularization, hence more components and hyperparameters to manage.
- Added complexity. The approach uses a product of three geo…
Taxonomy
Topics: Face recognition and analysis · Time Series Analysis and Forecasting · Machine Learning in Healthcare
