Probabilistic Bilevel Coreset Selection
Xiao Zhou, Renjie Pi, Weizhong Zhang, Yong Lin, Tong Zhang

TL;DR
This paper introduces a probabilistic bilevel approach to coreset selection that learns a selection probability for each training sample in order to pick a representative subset. It reports better results than existing methods, especially under label noise and class imbalance.
Contribution
The paper proposes what the authors describe as the first continuous probabilistic bilevel formulation of coreset selection, together with an efficient solver based on an unbiased policy gradient estimator, and reports better performance than greedy selection methods.
Findings
Outperforms existing coreset selection methods in various tasks.
Effective in challenging label-noise and class-imbalance scenarios.
Provides convergence guarantees for the proposed training procedure.
Abstract
The goal of coreset selection in supervised learning is to produce a weighted subset of data, so that training only on the subset achieves performance similar to training on the entire dataset. Existing methods achieved promising results in resource-constrained scenarios such as continual learning and streaming. However, most of the existing algorithms are limited to traditional machine learning models. A few algorithms that can handle large models adopt greedy search approaches due to the difficulty in solving the discrete subset selection problem, which is computationally costly when the coreset becomes larger and often produces suboptimal results. In this work, for the first time we propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample. The overall objective is posed as a bilevel optimization problem, where…
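To make the idea in the abstract concrete, here is a minimal sketch of a probabilistic bilevel selection loop on a toy regression problem. It is an illustration under stated assumptions, not the authors' implementation: each sample gets an independent Bernoulli selection probability, the inner problem is a closed-form ridge regression on the sampled subset, the outer loss is the mean squared error on the full training set, and the probabilities are updated with a score-function (policy gradient) estimate. The function names, the moving-average baseline, and the clip-and-rescale budget step are all hypothetical choices made for this sketch.

```python
import numpy as np


def inner_solve(X, y, mask, ridge=1e-3):
    """Inner problem (assumed): ridge regression fitted on the selected samples only."""
    Xs, ys = X[mask], y[mask]
    d = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + ridge * np.eye(d), Xs.T @ ys)


def outer_loss(X, y, theta):
    """Outer objective (assumed): mean squared error on the full training set."""
    return float(np.mean((X @ theta - y) ** 2))


def score_function(mask, probs):
    """Gradient of log p(mask | probs) for independent Bernoulli selections."""
    m = mask.astype(float)
    return m / probs - (1.0 - m) / (1.0 - probs)


def project(probs, budget, eps=1e-3):
    """Heuristic budget step (not an exact projection): clip, then rescale."""
    probs = np.clip(probs, eps, 1.0 - eps)
    total = probs.sum()
    if total > budget:
        probs = np.clip(probs * budget / total, eps, 1.0 - eps)
    return probs


def select_coreset(X, y, budget, steps=200, lr=0.05, seed=0):
    """Learn per-sample selection probabilities with a policy gradient estimate."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    probs = np.full(n, budget / n)  # start from a uniform selection probability
    baseline = 0.0                  # moving average of past losses (variance reduction)
    for _ in range(steps):
        mask = rng.random(n) < probs           # sample a candidate subset
        theta = inner_solve(X, y, mask)        # train on the subset
        loss = outer_loss(X, y, theta)         # evaluate on all training data
        # The baseline depends only on earlier samples, so the estimate stays unbiased.
        grad = (loss - baseline) * score_function(mask, probs)
        baseline = 0.9 * baseline + 0.1 * loss
        probs = project(probs - lr * grad, budget)
    return probs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
    probs = select_coreset(X, y, budget=10)
    # Indices of the samples with the highest learned selection probability.
    print(np.argsort(-probs)[:10])
```

No gradient has to flow through the inner solver here: the score-function estimate only needs the scalar outer loss of each sampled subset, which is why a policy gradient approach sidesteps implicit differentiation of the inner problem. In practice the inner solve would be a neural network training run rather than a closed-form fit.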
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Coresets
