Adaptive Second Order Coresets for Data-efficient Machine Learning
Omead Pooladzandi, David Davini, Baharan Mirzasoleiman

TL;DR
AdaCore is a novel data subset selection method that uses the geometry of the data, through an approximation of the Hessian, to improve training efficiency. It comes with convergence guarantees and significantly speeds up training of both convex and non-convex models.
Contribution
We introduce AdaCore, a data-efficient subset selection method with theoretical guarantees, leveraging Hessian-based geometry to enhance training speed and quality.
Findings
AdaCore outperforms baseline methods in subset quality.
Training is over 2.9x faster than training on the full data.
Effective for both convex and non-convex models.
Abstract
Training machine learning models on massive datasets incurs substantial computational costs. To alleviate such costs, there has been a sustained effort to develop data-efficient training methods that can carefully select subsets of the training examples that generalize on par with the full training data. However, existing methods are limited in providing theoretical guarantees for the quality of the models trained on the extracted subsets, and may perform poorly in practice. We propose AdaCore, a method that leverages the geometry of the data to extract subsets of the training examples for efficient machine learning. The key idea behind our method is to dynamically approximate the curvature of the loss function via an exponentially-averaged estimate of the Hessian to select weighted subsets (coresets) that provide a close approximation of the full gradient preconditioned with the…
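The key idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a logistic regression model, a diagonal Hessian approximation, and a greedy facility-location selection step, none of which are specified in the text above. The helper names (`ema_diag_hessian`, `logistic_grads_and_diag_hessian`, `greedy_coreset`), the decay `beta`, and the damping constant are hypothetical choices made for illustration.

```python
import numpy as np

def ema_diag_hessian(h_prev, h_new, beta=0.9):
    """Exponentially averaged diagonal curvature estimate (beta is an assumed decay)."""
    return beta * h_prev + (1.0 - beta) * h_new

def logistic_grads_and_diag_hessian(X, y, w):
    """Per-example gradients and the mean diagonal Hessian of the logistic loss."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grads = (p - y)[:, None] * X                               # shape (n, d)
    diag_h = ((p * (1.0 - p))[:, None] * X ** 2).mean(axis=0)  # shape (d,)
    return grads, diag_h

def greedy_coreset(features, k):
    """Greedily pick k representative rows; weight each by how many rows it covers."""
    n = features.shape[0]
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    sim = dists.max() - dists        # non-negative similarity, larger means closer
    selected = []
    best = np.zeros(n)               # best similarity of each row to the selected set
    for _ in range(k):
        # Marginal gain in total coverage from adding each candidate row.
        gains = np.maximum(sim, best[None, :]).sum(axis=1) - best.sum()
        if selected:
            gains[selected] = -np.inf  # never pick the same row twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[j])
    # Assign every row to its most similar selected row; cluster sizes are the weights.
    assign = np.argmax(sim[selected], axis=0)
    weights = np.bincount(assign, minlength=k).astype(float)
    return np.array(selected), weights

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(float)
w = np.zeros(5)

grads, diag_h = logistic_grads_and_diag_hessian(X, y, w)
h_avg = ema_diag_hessian(np.zeros(5), diag_h)   # first step of the running average
precond = grads / (h_avg + 1e-8)                # diagonally preconditioned gradients
idx, wts = greedy_coreset(precond, k=20)

# Weighted coreset estimate of the summed preconditioned gradient.
approx = (wts[:, None] * precond[idx]).sum(axis=0)
```

In a training loop, the running average and the selected subset would be refreshed periodically as the parameters change, and each selected example's gradient would be scaled by its weight. How often to refresh, and how the Hessian is actually approximated, are left open by the text above.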
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Logistic Regression · Coresets
