Dataset Quantization with Active Learning based Adaptive Sampling

Zhenghao Zhao; Yuzhang Shang; Junyi Wu; Yan Yan

arXiv:2407.07268 · cs.CV · July 11, 2024

Open Access · 1 repository

TL;DR

This paper introduces DQAS, an active learning-based adaptive sampling method for dataset quantization that reduces training costs while maintaining performance, outperforming existing dataset compression techniques.

Contribution

The paper presents a novel adaptive sampling strategy and a dataset quantization pipeline that leverages feature space for improved dataset compression.

Findings

01. Outperforms state-of-the-art dataset compression methods
02. Maintains performance with uneven class sample distributions
03. Reduces training costs significantly

Abstract

Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, training on such datasets raises costs and computational demands. To address this, various techniques such as coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sample distributions across classes, our research demonstrates that maintaining performance is feasible even with uneven distributions. We find that for certain classes, variation in sample quantity has minimal impact on performance. Inspired by this observation, an intuitive idea is to reduce the number of samples for stable classes and increase the number of samples for sensitive classes, achieving better performance at the same sampling ratio. Then the question arises: how…
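
The idea in the abstract (fewer samples for stable classes, more for sensitive ones, at a fixed overall sampling ratio) can be illustrated with a small budget-allocation sketch. This is not the paper's DQAS procedure: the function name `allocate_class_budgets`, the per-class `sensitivity` scores, and the greedy highest-quotient rule used to split the budget are all assumptions made for illustration. How the sensitivity scores would be estimated (the active learning part) is not shown here.

```python
import numpy as np


def allocate_class_budgets(sensitivity, class_sizes, ratio):
    """Split a fixed sample budget across classes (hypothetical helper).

    sensitivity: positive score per class; higher means the class is
                 assumed to suffer more when it loses samples.
    class_sizes: number of available samples per class.
    ratio:       overall sampling ratio in (0, 1].
    Returns an integer array of per-class budgets summing to
    round(ratio * total number of samples).
    """
    sensitivity = np.asarray(sensitivity, dtype=float)
    class_sizes = np.asarray(class_sizes, dtype=int)
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if np.any(sensitivity <= 0):
        raise ValueError("sensitivity scores must be positive")

    total = int(round(ratio * class_sizes.sum()))
    # Assumed weighting: with equal scores this reduces to the usual
    # uniform per-class ratio; larger scores shift budget to that class.
    weights = sensitivity * class_sizes

    budgets = np.zeros(len(class_sizes), dtype=int)
    for _ in range(total):
        # Greedy highest-quotient rule: give the next sample to the class
        # whose weight is largest relative to what it already holds.
        quotients = weights / (budgets + 1)
        # A class can never receive more samples than it contains.
        quotients[budgets >= class_sizes] = -np.inf
        budgets[np.argmax(quotients)] += 1
    return budgets


# Three equally sized classes; the third is assumed to be more sensitive.
print(allocate_class_budgets([1.0, 1.0, 3.0], [100, 100, 100], ratio=0.1))
```

The total budget matches what uniform sampling would use at the same ratio; only its split across classes changes. The per-sample greedy loop is kept for readability and would be replaced by a vectorized apportionment for large budgets.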

Code & Models

Repositories

ichbill/DQAS (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Advanced Data Compression Techniques · Gaussian Processes and Bayesian Inference