Data Summarization via Bilevel Optimization
Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause

TL;DR
This paper introduces a generic bilevel optimization framework for constructing coresets, enabling efficient data summarization for various models including neural networks, under resource constraints.
Contribution
It presents a model-agnostic coreset construction method based on bilevel optimization, extending applicability beyond simple models to complex neural networks.
Findings
Effective for training non-convex models online
Applicable to batch active learning scenarios
Outperforms existing model-specific coreset methods
Abstract
The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is to operate on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and k-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice differentiable model, including…
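To make the bilevel structure described in the abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes a toy one-parameter model y ≈ θ·x, unit weights on the selected points, and a naive greedy search over candidates; the names `fit_inner`, `outer_loss`, and `greedy_coreset` are hypothetical and chosen only for illustration. The inner problem fits the model on the selected subset, the outer problem scores that fit on the full data set, and the cardinality constraint caps the subset at `m` points.

```python
import random


def fit_inner(points, lam=1e-3):
    """Inner problem: ridge fit of y ~ theta * x using only the selected points."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + lam)


def outer_loss(data, theta):
    """Outer objective: mean squared error of theta on the FULL data set."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def greedy_coreset(data, m, lam=1e-3):
    """Naive greedy selection (an assumption, not the paper's solver):
    repeatedly add the point whose inclusion gives the lowest outer loss."""
    selected = []
    for _ in range(m):  # cardinality constraint: at most m points
        best_i, best_loss = None, float("inf")
        for i in range(len(data)):
            if i in selected:
                continue
            theta = fit_inner([data[j] for j in selected + [i]], lam)
            loss = outer_loss(data, theta)
            if loss < best_loss:
                best_i, best_loss = i, loss
        selected.append(best_i)
    return selected


# Synthetic data: true slope 2.0 with small Gaussian noise.
random.seed(0)
data = []
for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + random.gauss(0.0, 0.1)))

coreset = greedy_coreset(data, m=5)
theta_core = fit_inner([data[i] for i in coreset])
theta_full = fit_inner(data)
print("coreset indices:", coreset)
print("loss with coreset fit:", outer_loss(data, theta_core))
print("loss with full-data fit:", outer_loss(data, theta_full))
```

This brute-force search re-solves the inner problem for every candidate, which is only feasible because the toy model has a closed-form fit. For larger models, where each inner solve is an expensive training run, that exhaustive re-solving would not scale.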
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression
Methods: Coresets
