Dataset Quantization with Active Learning based Adaptive Sampling
Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

TL;DR
This paper introduces DQAS, an active learning-based adaptive sampling method for dataset quantization that reduces training costs while maintaining performance, outperforming existing dataset compression techniques.
Contribution
The paper presents a novel adaptive sampling strategy and a dataset quantization pipeline that leverages feature space for improved dataset compression.
Findings
Outperforms state-of-the-art dataset compression methods
Maintains performance with uneven class sample distributions
Reduces training costs significantly
Abstract
Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, training on such datasets raises costs and computational demands. To address this, various techniques such as coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sample distributions across classes, our research demonstrates that performance can be maintained even with uneven distributions. We find that for certain classes, variation in sample quantity has minimal impact on performance. Inspired by this observation, an intuitive idea is to reduce the number of samples for stable classes and increase the number of samples for sensitive classes, achieving better performance at the same sampling ratio. Then the question arises: how…
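The reallocation idea in the abstract (fewer samples for stable classes, more for sensitive ones, at a fixed overall sampling ratio) can be illustrated with a small sketch. This is not the paper's DQAS procedure, which this excerpt does not describe. The function name `allocate_class_budgets`, the `sensitivity` scores, and the proportional weighting rule are all hypothetical choices made for illustration; how sensitivity would actually be measured is left open here.

```python
import numpy as np


def allocate_class_budgets(class_sizes, sensitivity, ratio, min_per_class=1):
    """Split a fixed sample budget across classes, favouring sensitive classes.

    Hypothetical illustration only: `sensitivity` is an assumed positive,
    per-class score where larger means "performance drops more when this
    class loses samples". The weighting rule (sensitivity * class size)
    is an assumption, not the paper's method.
    """
    class_sizes = np.asarray(class_sizes, dtype=np.int64)
    sensitivity = np.asarray(sensitivity, dtype=np.float64)

    # Total budget is fixed by the overall sampling ratio.
    total = int(round(ratio * class_sizes.sum()))

    # Proportional share per class, then round down and clip to valid range.
    weights = sensitivity * class_sizes
    weights = weights / weights.sum()
    budgets = np.floor(weights * total).astype(np.int64)
    budgets = np.clip(budgets, min_per_class, class_sizes)

    # Repair rounding/clipping drift so the budgets sum to `total` when possible:
    # add to the most sensitive classes first, remove from the least sensitive first.
    diff = total - int(budgets.sum())
    order = np.argsort(-sensitivity)
    while diff != 0:
        changed = False
        for i in (order if diff > 0 else order[::-1]):
            if diff > 0 and budgets[i] < class_sizes[i]:
                budgets[i] += 1
                diff -= 1
                changed = True
            elif diff < 0 and budgets[i] > min_per_class:
                budgets[i] -= 1
                diff += 1
                changed = True
            if diff == 0:
                break
        if not changed:  # no class can absorb the remaining difference
            break
    return budgets


# Four equally sized classes; the last two are assumed to be more sensitive.
class_sizes = [100, 100, 100, 100]
sensitivity = [1.0, 1.0, 3.0, 3.0]
budgets = allocate_class_budgets(class_sizes, sensitivity, ratio=0.1)
print(budgets, budgets.sum())
```

With a uniform `sensitivity` vector the sketch falls back to the usual class-proportional split, so the uneven allocation comes entirely from the assumed scores.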
Taxonomy
Topics: Neural Networks and Applications · Advanced Data Compression Techniques · Gaussian Processes and Bayesian Inference
