GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

arXiv:2012.10630 · cs.LG · June 15, 2021 · 20 citations

TL;DR

GLISTER is a data subset selection framework that makes training deep models more efficient and more robust by choosing training subsets that optimize performance on a validation set; it applies to various loss functions and learning scenarios.

Contribution

Introduces Glister, a data selection method based on bi-level optimization that improves training efficiency and robustness, together with an online algorithm (Glister-Online) and an extension to batch active learning.
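
The bi-level problem can be written roughly as below. This is a reconstruction from the abstract's description (select a training subset that maximizes log-likelihood on a held-out validation set), and the notation is chosen here for illustration: U is the training set, V the validation set, k a subset-size budget, and LL_T and LL_V the training and validation log-likelihoods. The inner argmax fits parameters theta on the subset S; the outer argmax scores that fit on V.

$$
S^{*} \in \underset{S \subseteq \mathcal{U},\ |S| \le k}{\operatorname{argmax}} \; LL_V\Big(\underset{\theta}{\operatorname{argmax}} \; LL_T(\theta, S),\ \mathcal{V}\Big)
$$

The outer variable S is discrete and the inner variable theta is continuous, which is what makes the problem a mixed discrete-continuous bi-level optimization.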

Findings

01. Reduces training time while maintaining accuracy.
02. Improves robustness under label noise and class imbalance.
03. Enhances batch active learning performance.

Abstract

Large-scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Moreover, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work focuses on either robustness or efficiency, but not both. In this work, we introduce Glister, a GeneraLIzation based data Subset selecTion for Efficient and Robust learning framework. We formulate Glister as a mixed discrete-continuous bi-level optimization problem to select a subset of the training data, which maximizes the log-likelihood on a held-out validation set. Next, we propose an iterative online algorithm Glister-Online, which performs data selection…
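
To make the selection idea concrete, here is a minimal, hypothetical sketch in standard-library Python. It is not the authors' implementation from dssresearch/GLISTER, and the abstract above is truncated before Glister-Online is fully described. The sketch assumes binary logistic regression, replaces the inner argmax with a single gradient-ascent step of size `eta` from the current parameters, and grows the subset greedily by scoring each candidate with the exact validation log-likelihood (the paper's own approximations may differ). All names (`greedy_select`, `ll_gradient`, `eta`, the toy data) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, binary label in {0, 1}); feature 0 is a constant bias term
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(theta: Sequence[float], data: Sequence[Example]) -> float:
    """Sum of Bernoulli log-likelihoods of a logistic-regression model."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def ll_gradient(theta: Sequence[float], example: Example) -> List[float]:
    """Gradient of one example's log-likelihood w.r.t. theta: (y - p) * x."""
    x, y = example
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(y - p) * xi for xi in x]


def greedy_select(theta: Sequence[float], train: Sequence[Example],
                  val: Sequence[Example], k: int, eta: float = 0.5) -> List[int]:
    """Hypothetical simplification: greedily pick up to k training indices
    whose one-step update most increases validation log-likelihood."""
    grads = [ll_gradient(theta, ex) for ex in train]
    grad_sum = [0.0] * len(theta)  # summed gradients of the subset so far
    selected: List[int] = []
    for _ in range(min(k, len(train))):
        best_idx, best_score = -1, -math.inf
        for i, g in enumerate(grads):
            if i in selected:
                continue
            # One gradient-ascent step on S + {i} stands in for the inner argmax.
            cand = [t + eta * (s + gi) for t, s, gi in zip(theta, grad_sum, g)]
            score = log_likelihood(cand, val)  # outer objective on validation data
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        grad_sum = [s + gi for s, gi in zip(grad_sum, grads[best_idx])]
    return selected


# Toy data: example 2 copies example 0's features with a flipped (noisy) label.
train = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 2.0], 0)]
val = [([1.0, 1.5], 1), ([1.0, -1.5], 0)]
chosen = greedy_select([0.0, 0.0], train, val, k=2)
print(sorted(chosen))  # [0, 1]
```

In the toy data, the gradient step for the mislabeled example 2 lowers the clean validation log-likelihood, so the greedy loop prefers examples 0 and 1. This illustrates, in miniature, why scoring candidates on held-out data can help under label noise.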

Code & Models

Repositories

dssresearch/GLISTER (PyTorch, official)

Videos

GLISTER: Generalization Based Data Subset Selection for Efficient and Robust Learning (Underline)

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Imbalanced Data Classification Techniques