Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions
Grégoire Mialon, Alexandre d'Aspremont, Julien Mairal

TL;DR
This paper introduces a method for efficiently screening data points in empirical risk minimization using ellipsoidal regions and safe loss functions, enabling dataset compression and computational savings.
Contribution
It proposes novel screening tests and loss functions that induce dual sparsity, facilitating data reduction without compromising optimization guarantees.
Findings
Effective data screening tests for classification and regression
Reduction of dataset size while maintaining accuracy
Computational gains in empirical risk minimization
Abstract
We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.
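To make the idea of an ellipsoidal screening test concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's actual test. It assumes a hinge-type classification loss that is exactly zero once the margin b_i * a_i^T x exceeds a threshold (`margin`, hypothetical name), and it assumes an ellipsoid {x : (x - z)^T E^{-1} (x - z) <= 1} is already known to contain the optimal solution. The function name `ellipsoid_screen` and its parameters are illustrative. The sketch relies on one standard fact: the minimum of a linear function g^T x over that ellipsoid is g^T z - sqrt(g^T E g). If this lower bound on the margin already exceeds the threshold, the sample is inactive at the optimum and can be discarded.

```python
import numpy as np

def ellipsoid_screen(A, b, z, E, margin=1.0):
    """Return a boolean mask of samples that can be safely discarded.

    Assumptions (illustrative, not taken from the paper):
      - the loss of sample i is zero whenever b[i] * A[i] @ x > margin;
      - the optimum lies in {x : (x - z)^T E^{-1} (x - z) <= 1},
        with E symmetric positive definite.
    """
    # Signed feature vectors g_i = b_i * a_i, one per row.
    G = A * b[:, None]
    # Margin of each sample at the ellipsoid center.
    center_margin = G @ z
    # sqrt(g_i^T E g_i): largest possible drop of the margin inside the ellipsoid.
    half_width = np.sqrt(np.einsum("ij,jk,ik->i", G, E, G))
    # Discard when even the worst-case margin stays above the threshold.
    return center_margin - half_width > margin
```

For example, with center z = (1, 0) and E = 0.25 I (a ball of radius 0.5), the sample a = (3, 0), b = 1 has worst-case margin 3 - 1.5 = 1.5 > 1 and is discarded, while a = (0.1, 0), b = 1 has worst-case margin 0.05 and is kept.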
Taxonomy
Topics: Statistical Methods and Inference · Probabilistic and Robust Engineering Design · Risk and Portfolio Optimization
