DIVER: Diving Deeper into Distilled Data via Expressive Semantic Recovery

Qianxin Xia; Zhiyong Shu; Wenbo Jiang; Jiawei Du; Jielei Wang; Guoming Lu

arXiv:2605.12649·cs.CV·May 14, 2026

TL;DR

DIVER is a dual-stage dataset distillation framework that uses a pre-trained diffusion model to better preserve semantics and improve cross-architecture generalization, while requiring only 4 GB of GPU memory for large-scale distillation.

Contribution

The paper proposes a novel dual-stage distillation method called DIVER that leverages a diffusion model for semantic inheritance, guidance, and fusion, improving over single-stage approaches.

Findings

01 DIVER significantly improves cross-architecture generalization.

02 Its processing time on ImageNet is comparable to that of raw DiT.

03 DIVER requires only 4 GB of GPU memory for large-scale distillation.

Abstract

Dataset distillation aims to synthesize a compact proxy dataset that is unreadable or non-raw from the original dataset for privacy protection and highly efficient learning. However, previous approaches typically adopt a single-stage distillation paradigm, which suffers from learning specific patterns that overfit on a prior architecture, consequently suppressing the expression of semantics and leading to performance degradation across heterogeneous architectures. To address this issue, we propose a novel dual-stage distillation framework called DIVER, which leverages the pre-trained diffusion model to dive deeper into DIstilled data Via Expressive semantic Recovery, an entire process of semantic inheritance, guidance, and fusion. Semantic inheritance distills high-level semantics of abstract distilled images into the latent space…

Code & Models

Repositories

einsteinxia/DIVER (GitHub)
