SAS: Semantic-aware Sampling for Generative Dataset Distillation

Mingzhuo Li; Guang Li; Linfeng Ye; Jiafeng Mao; Takahiro Ogawa; Konstantinos N. Plataniotis; Miki Haseyama

arXiv:2605.18012·cs.CV·May 19, 2026

TL;DR

This paper introduces SAS, a semantic-aware sampling method for dataset distillation that leverages CLIP to produce compact, semantically rich datasets, improving downstream model performance.

Contribution

The paper proposes a semantic scoring and sampling strategy that uses CLIP to select distilled samples for both semantic relevance and diversity.

Findings

01. Consistent performance improvements across multiple datasets.

02. Effective filtering of semantically relevant samples.

03. Enhanced semantic class discrimination in distilled datasets.

Abstract

Deep neural networks have achieved impressive performance across a wide range of tasks, but this success often comes with substantial computational and storage costs due to large-scale training data. Dataset distillation addresses this challenge by constructing compact yet informative datasets that enable efficient model training while maintaining downstream performance. However, most existing approaches primarily emphasize matching data distributions or downstream training statistics, with limited attention to preserving high-level semantic information in the distilled data. In this work, we introduce a semantic-aware perspective for dataset distillation by leveraging Contrastive Language-Image Pretraining (CLIP) as a semantic prior for post-sampling. Our goal is to obtain distilled datasets that are not only compact but also semantically class-discriminative and diverse. To this end,…
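
The abstract is cut off before it describes the scoring rule, so the following is only an illustrative sketch of what CLIP-based post-sampling could look like, not the authors' algorithm. It assumes image and class-prompt embeddings have already been computed with a CLIP image and text encoder. The function names (`semantic_scores`, `select_samples`), the margin-based score, and the `diversity_weight` redundancy penalty are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def semantic_scores(image_emb, text_emb, label):
    """Hypothetical class-discriminative score (an assumption, not from the paper).

    Returns, per candidate image, the cosine similarity to the target class
    prompt minus the highest similarity to any other class prompt.
    Assumes at least two classes.
    """
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # shape (N, C)
    target = sims[:, label]
    best_other = np.delete(sims, label, axis=1).max(axis=1)
    return target - best_other


def select_samples(image_emb, text_emb, label, k, diversity_weight=0.5):
    """Greedily pick k candidates, trading score against redundancy.

    The redundancy term is the maximum cosine similarity to any image
    already selected; diversity_weight is an illustrative knob.
    """
    img = l2_normalize(image_emb)
    scores = semantic_scores(image_emb, text_emb, label)
    selected = [int(np.argmax(scores))]
    while len(selected) < min(k, len(img)):
        redundancy = (img @ img[selected].T).max(axis=1)
        gain = scores - diversity_weight * redundancy
        gain[selected] = -np.inf  # never pick the same candidate twice
        selected.append(int(np.argmax(gain)))
    return selected
```

In this sketch, a candidate whose embedding sits closer to another class prompt than to its own receives a negative score and is unlikely to be kept, while the redundancy penalty discourages near-duplicate picks. Raising `diversity_weight` too far can let dissimilar but off-class candidates win, so the balance between the two terms would need tuning in practice.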
