DIVA: Dataset Derivative of a Learning Task

Yonatan Dukler; Alessandro Achille; Giovanni Paolini; Avinash Ravichandran; Marzia Polito; Stefano Soatto

arXiv:2111.09785·cs.LG·November 19, 2021

TL;DR

DIVA introduces a method to compute how small changes in the weights of individual training samples affect the validation error of a trained deep neural network, enabling dataset optimization and auto-curation by differentiating that error with respect to the dataset.

Contribution

The paper presents DIVA, a differentiable method for computing dataset derivatives around trained DNNs. Because it relies on a closed-form leave-one-out cross-validation error, it supports dataset refinement without a separate validation set.

Findings

01

Effective in outlier rejection and dataset extension

02

Enables dataset optimization during training

03

Applicable to multi-modal data aggregation
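As a toy illustration of outlier rejection with a dataset derivative, the following is a minimal sketch, not the authors' method. It assumes a hypothetical 1-D weighted ridge regression y ≈ w·x as a stand-in for a trained DNN, because its leave-one-out error has a closed form in the per-sample weights. The function names, the toy data, and the greedy rule of dropping the sample with the largest positive derivative are all assumptions made for illustration, and the derivative is approximated here with central finite differences for brevity.

```python
def loo_error(x, y, a, lam=0.1):
    """Closed-form mean leave-one-out squared error of a hypothetical 1-D
    weighted ridge regression y ~ w * x (illustrative stand-in for a DNN).
    Holding out sample k gives w_{-k} = (Sxy - a_k x_k y_k) / (Sxx - a_k x_k^2 + lam)."""
    sxy = sum(ak * xk * yk for ak, xk, yk in zip(a, x, y))
    sxx = sum(ak * xk * xk for ak, xk in zip(a, x))
    total = 0.0
    for ak, xk, yk in zip(a, x, y):
        w_loo = (sxy - ak * xk * yk) / (sxx - ak * xk * xk + lam)
        total += (yk - w_loo * xk) ** 2
    return total / len(x)


def weight_derivative(x, y, h=1e-6):
    """Derivative of loo_error w.r.t. each sample weight at a = 1,
    approximated (an assumption for brevity) by central finite differences."""
    n = len(x)
    grad = []
    for j in range(n):
        up = [1.0] * n
        down = [1.0] * n
        up[j] += h
        down[j] -= h
        grad.append((loo_error(x, y, up) - loo_error(x, y, down)) / (2 * h))
    return grad


def reject_outliers(x, y):
    """Hypothetical greedy curation: repeatedly drop the sample whose
    up-weighting would most increase the LOO error, until no sample has a
    positive derivative. Returns the original indices that were removed."""
    keep = list(range(len(x)))
    removed = []
    while len(keep) > 2:
        xs = [x[i] for i in keep]
        ys = [y[i] for i in keep]
        grad = weight_derivative(xs, ys)
        worst = max(range(len(keep)), key=lambda k: grad[k])
        if grad[worst] <= 0:
            break  # no sample hurts the leave-one-out error any more
        removed.append(keep.pop(worst))
    return removed


# Toy data: the clean rule is y = 2x, but the last label is corrupted.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, -10.0]
print(reject_outliers(x, y))  # → [4]
```

A positive derivative means that increasing a sample's weight would raise the leave-one-out error, which makes it a natural removal candidate. Once the corrupted point is gone, every remaining derivative is negative and the loop stops.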

Abstract

We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The "dataset derivative" is a linear operator, computed around the trained model, that informs how perturbations of the weight of each training sample affect the validation error, usually computed on a separate validation dataset. Our method, DIVA (Differentiable Validation), hinges on a closed-form differentiable expression of the leave-one-out cross-validation error around a pre-trained DNN. This expression constitutes the dataset derivative. DIVA could be used for dataset auto-curation, for example, removing samples with faulty annotations, augmenting a dataset with additional relevant samples, or rebalancing. More generally, DIVA can be used to optimize the…
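The idea of differentiating a closed-form leave-one-out error with respect to per-sample weights can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: a hypothetical 1-D weighted ridge regression y ≈ w·x stands in for the pre-trained DNN, each training sample i carries a weight a_i, and the names loo_error and dataset_derivative are illustrative. Because the held-out solution w_{-i} is a closed-form function of the weights, the mean leave-one-out error is differentiable in every a_i, and dataset_derivative returns that gradient analytically.

```python
def loo_error(x, y, a, lam=0.1):
    """Mean leave-one-out squared error of 1-D weighted ridge regression.

    Hypothetical stand-in for a trained network: w minimizes
    sum_k a_k * (y_k - w * x_k)**2 + lam * w**2, so w = Sxy / (Sxx + lam).
    Holding out sample i gives the closed form
    w_{-i} = (Sxy - a_i x_i y_i) / (Sxx - a_i x_i**2 + lam).
    """
    sxy = sum(ak * xk * yk for ak, xk, yk in zip(a, x, y))
    sxx = sum(ak * xk * xk for ak, xk in zip(a, x))
    total = 0.0
    for ak, xk, yk in zip(a, x, y):
        w_loo = (sxy - ak * xk * yk) / (sxx - ak * xk * xk + lam)
        total += (yk - w_loo * xk) ** 2
    return total / len(x)


def dataset_derivative(x, y, a, lam=0.1):
    """Analytic gradient of loo_error with respect to every sample weight a_j."""
    n = len(x)
    sxy = sum(ak * xk * yk for ak, xk, yk in zip(a, x, y))
    sxx = sum(ak * xk * xk for ak, xk in zip(a, x))
    grad = [0.0] * n
    for i in range(n):
        den = sxx - a[i] * x[i] * x[i] + lam
        w_loo = (sxy - a[i] * x[i] * y[i]) / den
        err = y[i] - w_loo * x[i]  # held-out residual of sample i
        for j in range(n):
            if j == i:
                continue  # w_{-i} does not depend on a_i
            # d w_{-i} / d a_j = x_j * (y_j - w_{-i} * x_j) / den
            dw = x[j] * (y[j] - w_loo * x[j]) / den
            # chain rule on err**2 / n, using d err / d w_{-i} = -x_i
            grad[j] += -2.0 * err * x[i] * dw / n
    return grad


# Toy data: the clean rule is y = 2x, but the last label is corrupted.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, -10.0]
a = [1.0] * len(x)
print(dataset_derivative(x, y, a))  # only the corrupted sample's entry is positive
```

Each entry estimates how the leave-one-out error would change if that sample's weight were increased infinitesimally. In this toy case the corrupted sample is the only one with a positive entry, which is the kind of signal an auto-curation step could use to down-weight or remove it.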

Videos

DIVA: Dataset Derivative of a Learning Task (SlidesLive)

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Anomaly Detection Techniques and Applications