Finding High-Value Training Data Subset through Differentiable Convex Programming

Soumi Das; Arshdeep Singh; Saptarshi Chatterjee; Suparna Bhattacharya; Sourangshu Bhattacharya

arXiv:2104.13794 · cs.LG · April 29, 2021

TL;DR

This paper introduces a scalable, learnable framework for selecting high-value training data subsets for deep neural networks, leveraging differentiable convex programming to improve subset quality and identify mislabeled data.

Contribution

It proposes a novel end-to-end learnable method for subset selection that accounts for data point interactions, outperforming existing valuation techniques.

Findings

01. Achieves up to 20% higher subset value than state-of-the-art methods.

02. Effectively identifies mislabeled training data.

03. Runs with comparable efficiency to existing valuation functions.

Abstract

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed for explaining trained models. However, the value of a training datapoint also depends on other selected training datapoints - a notion that is not explicitly captured by existing methods. In this paper, we study the problem of selecting high-value subsets of training data. The key idea is to design a learnable framework for online subset selection, which can be learned using mini-batches of training data, thus making our method scalable. This results in a parameterized convex subset selection problem that is amenable to a differentiable convex programming paradigm, thus allowing us to learn the parameters of the selection model in end-to-end training.…
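To make the phrase "parameterized convex subset selection problem" concrete, the following is a minimal sketch of one possible relaxation, not the formulation used in the paper. It assumes each point in a mini-batch has a score (in a learnable setup these would come from a selection model), and picks soft selection weights x by solving: maximize scores·x − (lam/2)·||x||² subject to 0 ≤ x ≤ 1 and sum(x) = k. The function names `select_relaxed` and `selection_jacobian`, the quadratic regularizer, and the parameter `lam` are all illustrative assumptions. The second function shows why such a problem is differentiable: away from the bounds, the solution changes linearly with the scores.

```python
import numpy as np


def select_relaxed(scores, k, lam=1.0, iters=60):
    """Toy relaxed subset selection (an assumption, not the paper's objective).

    Solves  max_x  scores @ x - (lam / 2) * ||x||^2
            s.t.   0 <= x <= 1,  sum(x) = k.
    The optimum has the form x = clip(scores / lam - tau, 0, 1), where the
    scalar tau is chosen by bisection so that the weights sum to k.
    """
    z = np.asarray(scores, dtype=float) / lam
    lo, hi = z.min() - 1.0, z.max()  # sum(x) = n at lo, sum(x) = 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(z - tau, 0.0, 1.0).sum() > k:
            lo = tau  # too much mass selected: raise the threshold
        else:
            hi = tau
    return np.clip(z - 0.5 * (lo + hi), 0.0, 1.0)


def selection_jacobian(x, lam=1.0, tol=1e-9):
    """Jacobian d x / d scores at a solution x of the problem above.

    Only coordinates strictly between the bounds move; on that "free" set
    the Jacobian is (I - 11^T / n_free) / lam, and it is zero elsewhere.
    """
    free = (x > tol) & (x < 1.0 - tol)
    n_free = int(free.sum())
    jac = np.zeros((x.size, x.size))
    if n_free > 0:
        idx = np.flatnonzero(free)
        jac[np.ix_(idx, idx)] = (np.eye(n_free) - 1.0 / n_free) / lam
    return jac


# Hypothetical scores for a mini-batch of six points; select a budget of k = 2.
scores = np.array([2.0, 1.5, 0.3, 0.1, -0.5, 1.8])
x = select_relaxed(scores, k=2)
J = selection_jacobian(x)
```

In an end-to-end setup, a gradient with respect to x (for example from a validation loss) could be multiplied by this Jacobian and passed back to whatever model produced the scores. General-purpose differentiable convex optimization layers automate this step for broader problem classes.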

Code & Models

Repositories

SoumiDas/HOST-CP (PyTorch, official)
