Loss-Proportional Subsampling for Subsequent ERM

Paul Mineiro; Nikos Karampatziakis

arXiv:1306.1840 · cs.LG · June 25, 2013 · 5 cites

Open Access

TL;DR

This paper introduces a loss-proportional subsampling method that reduces a data set before empirical risk minimization (ERM) while guaranteeing that the final excess risk compares favorably with using the full data set. The authors demonstrate it on a large dataset that they subsample and then fit with boosted trees.

Contribution

The paper presents a sampling scheme that consults only a subset of the ultimate (unknown) hypothesis set when reducing the data, yet still guarantees that the final excess risk compares favorably with using the entire original data set.

Findings

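01 A data set can be reduced by subsampling before ERM

02 The final excess risk is guaranteed to compare favorably with using the full data set

03 Practical benefit shown on a large dataset that is subsampled and then fit with boosted trees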

Abstract

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
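
The idea in the abstract can be illustrated with a minimal sketch. It assumes that each example is kept with probability proportional to its loss under a cheap pilot hypothesis, and that kept examples carry an importance weight of one over that probability so that weighted averages over the subsample stay unbiased. The abstract does not give the exact sampling rule, so the clipping to [p_min, 1] and the names loss_proportional_subsample, rate, and p_min are hypothetical choices made for this illustration, not the authors' definitions.

```python
import numpy as np

def loss_proportional_subsample(losses, rate, p_min=0.01, rng=None):
    """Keep example i with probability p_i = clip(rate * losses[i], p_min, 1).

    Assumption (not taken from the paper): the clipping rule and the
    parameters `rate` and `p_min` are illustrative choices.
    Returns the indices of the kept examples and their importance
    weights 1 / p_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    # Higher pilot loss -> higher chance of being kept.
    p = np.clip(rate * losses, p_min, 1.0)
    keep = rng.random(losses.shape[0]) < p
    idx = np.flatnonzero(keep)
    # Inverse-probability weights undo the sampling bias.
    weights = 1.0 / p[idx]
    return idx, weights

# Usage with synthetic pilot losses (illustrative data only).
rng = np.random.default_rng(0)
pilot_losses = rng.exponential(scale=1.0, size=200_000)
idx, weights = loss_proportional_subsample(pilot_losses, rate=0.5, rng=rng)

# Importance-weighted estimate of the full-data mean loss.
estimate = np.sum(weights * pilot_losses[idx]) / pilot_losses.shape[0]
full_mean = pilot_losses.mean()
```

In a full pipeline, idx and weights would be passed to a learner that accepts per-example weights, so that the subsequent ERM step minimizes an importance-weighted empirical risk on the smaller data set.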

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning