Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment

Tongfei Liu; Yufan Liu; Bing Li; Weiming Hu

arXiv:2603.27987 · cs.CV · March 31, 2026

TL;DR

This paper introduces a diffusion-assisted framework for lossless dataset concentration that improves efficiency, scalability, and data-free applicability in dataset distillation, achieving state-of-the-art results at low data volumes.

Contribution

It provides a theoretical justification for using diffusion models in dataset distillation and proposes a novel Dataset Concentration (DsCo) framework with Noise-Optimization and Doping techniques.

Findings

01

Nearly halves dataset size at high data volumes without performance loss.

02

Achieves state-of-the-art results for low data volumes.

03

Applicable in both data-accessible and data-free scenarios.

Abstract

The high cost and accessibility problems associated with large datasets hinder the development of large-scale visual recognition systems. Dataset Distillation addresses these problems by synthesizing compact surrogate datasets for efficient training, storage, transfer, and privacy preservation. Existing state-of-the-art diffusion-based dataset distillation methods face three issues: a lack of theoretical justification, poor efficiency when scaling to high data volumes, and failure in data-free scenarios. To address these issues, we establish a theoretical framework that justifies the use of diffusion models by proving the equivalence between dataset distillation and distribution matching, and that reveals an inherent efficiency limit in the dataset distillation paradigm. We then propose a Dataset Concentration (DsCo) framework that uses a diffusion-based Noise-Optimization (NOpt) method to…
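
The abstract ties dataset distillation to distribution matching. In the distribution-matching literature this objective is often written as a kernel discrepancy such as maximum mean discrepancy (MMD) between real and synthetic feature distributions. The snippet below is a minimal sketch under stated assumptions: a linear kernel on precomputed feature vectors, where biased MMD² reduces to the squared distance between feature means, plus one gradient step that moves the synthetic samples toward the real mean. The function names mean_embedding, linear_mmd2, and match_step are hypothetical, and this is a generic illustration, not the authors' DsCo or NOpt method (which, per the abstract, works through a diffusion model).

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_embedding(samples: List[Vector]) -> List[float]:
    """Average feature vector: the empirical mean embedding under a linear kernel."""
    if not samples:
        raise ValueError("need at least one sample")
    dim = len(samples[0])
    totals = [0.0] * dim
    for s in samples:
        if len(s) != dim:
            raise ValueError("all samples must share one dimension")
        for i, v in enumerate(s):
            totals[i] += v
    return [t / len(samples) for t in totals]


def linear_mmd2(real: List[Vector], synthetic: List[Vector]) -> float:
    """Biased squared MMD with k(x, y) = x . y, i.e. ||mean(real) - mean(synthetic)||^2."""
    mu_r = mean_embedding(real)
    mu_s = mean_embedding(synthetic)
    if len(mu_r) != len(mu_s):
        raise ValueError("real and synthetic features differ in dimension")
    return sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))


def match_step(real: List[Vector], synthetic: List[Vector], lr: float = 0.5) -> List[List[float]]:
    """One gradient-descent step on linear_mmd2 with respect to every synthetic sample.

    The gradient for each synthetic sample is -2 * (mean_real - mean_synth) / m,
    where m is the number of synthetic samples, so all samples shift by the same amount.
    """
    mu_r = mean_embedding(real)
    mu_s = mean_embedding(synthetic)
    m = len(synthetic)
    grad = [-2.0 * (a - b) / m for a, b in zip(mu_r, mu_s)]
    return [[v - lr * g for v, g in zip(s, grad)] for s in synthetic]


if __name__ == "__main__":
    real = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # mean is (1, 1)
    synthetic = [[0.0, 0.0]]                                 # one-sample surrogate set
    print(linear_mmd2(real, synthetic))                      # 2.0
    synthetic = match_step(real, synthetic)                  # lr=0.5, m=1 lands on the mean
    print(synthetic, linear_mmd2(real, synthetic))           # [[1.0, 1.0]] 0.0
```

A linear kernel only aligns first moments; practical distribution-matching methods typically use richer kernels or learned feature extractors so that higher-order statistics are matched as well.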
