Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions

Grégoire Mialon; Alexandre d'Aspremont; Julien Mairal

arXiv:1912.02566·cs.LG·June 15, 2020·1 cites

TL;DR

This paper introduces a method for efficiently screening data points in empirical risk minimization using ellipsoidal regions and safe loss functions, enabling dataset compression and computational savings.

Contribution

It proposes novel screening tests and loss functions that induce dual sparsity, facilitating data reduction without compromising optimization guarantees.

Findings

1. Effective data screening tests for classification and regression
2. Reduction of dataset size while maintaining accuracy
3. Computational gains in empirical risk minimization

Abstract

We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.
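
To make the idea of an ellipsoid-based screening test concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a margin-based classification loss that is exactly zero whenever y_i a_i^T x >= 1 (a hinge-type loss, for example). It also assumes that an ellipsoid {x : (x - z)^T E^{-1} (x - z) <= 1} known to contain the optimal solution is already available; how to obtain that ellipsoid is not shown. The function names, the `margin` parameter, and the toy data are hypothetical. The smallest value of y_i a_i^T x over the ellipsoid has the closed form y_i a_i^T z - sqrt(a_i^T E a_i). A sample whose bound strictly exceeds the margin therefore sits in the flat region of the loss for every candidate solution and can be discarded without changing the optimum.

```python
import numpy as np


def ellipsoid_margin_lower_bound(A, y, z, E):
    """Minimum of y_i * a_i^T x over {x : (x - z)^T E^{-1} (x - z) <= 1}.

    A: (n, d) data matrix, y: (n,) labels in {-1, +1},
    z: (d,) ellipsoid center, E: (d, d) symmetric positive definite shape matrix.
    """
    C = y[:, None] * A                                   # rows c_i = y_i * a_i
    center_term = C @ z                                  # c_i^T z
    radius_term = np.sqrt(np.einsum("ij,jk,ik->i", C, E, C))  # sqrt(c_i^T E c_i)
    return center_term - radius_term


def screen_samples(A, y, z, E, margin=1.0):
    """Boolean mask, True where a sample can be discarded.

    Assumption: the loss is zero for margins >= `margin` and the ellipsoid
    contains the optimal solution.
    """
    return ellipsoid_margin_lower_bound(A, y, z, E) > margin


# Toy data (hypothetical): three points in 2-D, ball of radius 0.5 around z.
A = np.array([[3.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
z = np.array([1.0, 0.0])
E = 0.25 * np.eye(2)

bounds = ellipsoid_margin_lower_bound(A, y, z, E)
mask = screen_samples(A, y, z, E)
A_kept, y_kept = A[~mask], y[~mask]                      # reduced dataset
```

In this toy example only the first sample is screened out: its lower bound is 1.5, while the other two bounds are 0.25 and 1.0 and do not strictly exceed the margin.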

Code & Models

Repositories

GregoireMialon/screening_samples (Official)

Taxonomy

Topics: Statistical Methods and Inference · Probabilistic and Robust Engineering Design · Risk and Portfolio Optimization