Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang

TL;DR
This paper introduces COPS, a theoretically optimal sampling method based on uncertainty estimation, designed to improve data selection efficiency in deep learning, reducing costs while maintaining high model performance.
Contribution
The study presents COPS, a novel uncertainty-based sampling technique for coreset selection and active learning, applicable to deep neural networks, with theoretical optimality and practical effectiveness.
Findings
COPS outperforms baseline sampling methods in deep learning tasks.
Empirical results show improved model accuracy with fewer training samples.
The method estimates sampling ratios from model logits instead of explicitly computing an inverse covariance matrix, which makes it practical for deep neural networks.
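Logit-based sampling ratios can be illustrated with a generic sketch. For illustration only, it assumes that a sample's uncertainty is the total variance of softmax outputs across a small ensemble of models. Samples are drawn with probability proportional to that score and given inverse-probability weights, the standard construction in optimal-subsampling estimators. This is not the paper's exact COPS estimator, and `ensemble_uncertainty`, `uncertainty_subsample` and the `floor` parameter are hypothetical names.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last (class) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_uncertainty(ensemble_logits):
    """Hypothetical uncertainty score: variance of softmax outputs across an ensemble.

    ensemble_logits has shape (n_models, n_samples, n_classes);
    returns one non-negative score per sample, shape (n_samples,).
    """
    probs = softmax(ensemble_logits)   # (m, n, k)
    var = probs.var(axis=0)            # per-class variance across models, (n, k)
    return var.sum(axis=-1)            # total variance per sample, (n,)


def uncertainty_subsample(scores, budget, rng, floor=1e-12):
    """Draw `budget` indices with replacement, with probability proportional to `scores`.

    Returns the indices and weights 1 / (budget * pi_i), so the weighted sum of
    subsample losses is an unbiased estimate of the full-data total loss.
    """
    s = np.maximum(scores, floor)      # keep every sample at nonzero probability
    pi = s / s.sum()
    idx = rng.choice(len(pi), size=budget, replace=True, p=pi)
    weights = 1.0 / (budget * pi[idx])
    return idx, weights


# Usage: 5 models, 100 unlabeled samples, 10 classes, select 20 samples.
rng = np.random.default_rng(0)
scores = ensemble_uncertainty(rng.normal(size=(5, 100, 10)))
idx, w = uncertainty_subsample(scores, budget=20, rng=rng)
```

Samples on which the ensemble disagrees receive higher sampling probability. The inverse-probability weights keep the training objective on the subsample unbiased relative to the full dataset.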
Abstract
Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input (x) and output (y), whereas active learning focuses solely on the input data (x). In this study, we present a theoretically optimal solution for addressing both coreset selection and active learning within the context of linear softmax regression. Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data. Unlike existing approaches that rely on explicit calculations of the inverse covariance matrix, which are not…
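The abstract contrasts uncertainty estimation with methods that explicitly compute the inverse covariance matrix. That link can be seen in a simpler analogue, ordinary least squares, chosen here purely for illustration; the paper itself works with linear softmax regression. With fixed inputs X and i.i.d. Gaussian label noise of scale sigma, the variance of a prediction is sigma^2 * x^T (X^T X)^{-1} x. An empirical estimate of prediction uncertainty therefore recovers this inverse-covariance quantity without forming the inverse. The sketch checks this numerically; the function names and parameter values are hypothetical.

```python
import numpy as np


def empirical_prediction_variance(X, x_query, sigma, n_trials, rng):
    """Refit OLS on fresh label noise n_trials times; return the variance of x_query @ theta_hat."""
    n, d = X.shape
    theta_true = np.ones(d)  # arbitrary ground-truth parameters for the simulation
    preds = np.empty(n_trials)
    for t in range(n_trials):
        y = X @ theta_true + sigma * rng.normal(size=n)    # new noisy labels, same inputs
        theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
        preds[t] = x_query @ theta_hat
    return preds.var()


def closed_form_variance(X, x_query, sigma):
    """sigma^2 * x^T (X^T X)^{-1} x, computed with an explicit solve against X^T X."""
    M = X.T @ X
    return sigma**2 * x_query @ np.linalg.solve(M, x_query)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
x_query = rng.normal(size=5)
emp = empirical_prediction_variance(X, x_query, sigma=0.5, n_trials=4000, rng=rng)
exact = closed_form_variance(X, x_query, sigma=0.5)
print(emp, exact)  # the two values should agree up to Monte Carlo error
```

In a deep network, a natural counterpart to repeated refits is an ensemble of independently trained models. Treating it as a stand-in for the inverse-covariance term is an assumption of this sketch, not a detail taken from the source.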
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning in Materials Science · Fault Detection and Control Systems
Methods: Softmax
