Prioritize Alignment in Dataset Distillation

Zekai Li; Ziyao Guo; Wangbo Zhao; Tianle Zhang; Zhi-Qi Cheng; Samir Khaki; Kaipeng Zhang; Ahmad Sajedi; Konstantinos N Plataniotis; Kai Wang; Yang You

arXiv:2408.03360·cs.LG·October 15, 2024

TL;DR

This paper introduces PAD (Prioritize Alignment in Dataset Distillation), which improves dataset distillation by reducing misaligned information in both the information extraction and embedding stages. The authors report state-of-the-art results on benchmarks.

Contribution

PAD aligns information in two ways: it prunes the target dataset according to the compression ratio, which filters the information the agent model can extract, and it uses only the deep layers of the agent model to perform the distillation.

Findings

01 PAD achieves state-of-the-art performance on benchmarks.

02 Filtering information and focusing on the deep layers of the agent model improve distillation quality.

03 Pruning the target dataset reduces misaligned information.

Abstract

Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve this, existing methods use the agent model to extract information from the target dataset and embed it into the distilled dataset. Consequently, the quality of extracted and embedded information determines the quality of the distilled dataset. In this work, we find that existing methods introduce misaligned information in both information extraction and embedding stages. To alleviate this, we propose Prioritize Alignment in Dataset Distillation (PAD), which aligns information from the following two perspectives. 1) We prune the target dataset according to the compressing ratio to filter the information that can be extracted by the agent model. 2) We use only deep layers of the agent model to perform the distillation…
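The two alignment steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; every name and default in it is hypothetical. It assumes that each sample carries a precomputed difficulty score, that small budgets drop the hardest samples while large budgets drop the easiest, that "deep layers" means the layers left after excluding a fixed fraction of parameters counted from the input side, and that the matching objective is a normalized squared parameter distance. None of these details are stated in the abstract.

```python
from typing import List, Sequence


def prune_by_difficulty(scores: Sequence[float], ipc: int,
                        ipc_threshold: int = 50,
                        drop_fraction: float = 0.1) -> List[int]:
    """Return the indices of the samples kept after pruning.

    Assumption: a higher score means a harder sample. Below the
    (hypothetical) ipc_threshold the hardest samples are dropped;
    otherwise the easiest samples are dropped.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # easy -> hard
    n_drop = int(len(order) * drop_fraction)
    if n_drop == 0:
        return sorted(order)
    kept = order[:-n_drop] if ipc < ipc_threshold else order[n_drop:]
    return sorted(kept)


def deep_layer_indices(layer_sizes: Sequence[int],
                       shallow_ratio: float) -> List[int]:
    """Return the indices of layers used for matching.

    Layers are ordered from input to output. A layer is kept only if the
    layers before it already hold at least `shallow_ratio` of all
    parameters, so the shallow part of the network is excluded.
    """
    cutoff = sum(layer_sizes) * shallow_ratio
    kept, seen = [], 0
    for idx, size in enumerate(layer_sizes):
        if seen >= cutoff:
            kept.append(idx)
        seen += size
    return kept


def deep_matching_loss(student: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]],
                       start: Sequence[Sequence[float]],
                       kept: Sequence[int]) -> float:
    """Normalized squared distance computed over the kept layers only.

    Assumption: the objective compares student parameters with target
    parameters and normalizes by the distance between the starting and
    target parameters.
    """
    num = sum((s - t) ** 2 for i in kept for s, t in zip(student[i], target[i]))
    den = sum((a - t) ** 2 for i in kept for a, t in zip(start[i], target[i]))
    if den == 0.0:
        raise ValueError("start and target parameters coincide on kept layers")
    return num / den
```

For example, with per-layer parameter counts `[100, 200, 300, 400]` and `shallow_ratio=0.25`, `deep_layer_indices` keeps only the last two layers, and the loss then ignores any mismatch in the first two.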

Code & Models

Repositories

nus-hpc-ai-lab/pad (PyTorch, official)

Taxonomy

Topics: Process Optimization and Integration