Distilled Datamodel with Reverse Gradient Matching

Jingwen Ye; Ruonan Yu; Songhua Liu; Xinchao Wang

arXiv:2404.14006 · cs.LG · April 23, 2024

TL;DR

This paper presents an efficient framework for assessing the impact of training data on large models. It uses a distilled synset and reverse gradient matching, which avoids retraining the model for every altered dataset.

Contribution

The authors introduce a new method combining offline data influence approximation with online evaluation to speed up leave-one-out data impact analysis.

Findings

01. Achieves data impact assessment accuracy comparable to retraining methods.

02. Significantly reduces computational time for data attribution tasks.

03. Is effective in evaluating data quality and influence in large-scale models.

Abstract

The proliferation of large-scale AI models trained on extensive datasets has revolutionized machine learning. With these models taking on increasingly central roles in various applications, the need to understand their behavior and enhance interpretability has become paramount. To investigate the impact of changes in training data on a pre-trained model, a common approach is leave-one-out retraining. This entails systematically altering the training dataset by removing specific samples to observe resulting changes within the model. However, retraining the model for each altered dataset presents a significant computational challenge, given the need to perform this operation for every dataset variation. In this paper, we introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages. During the offline training phase, we approximate…
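
To make the cost of the leave-one-out baseline concrete, here is a minimal Python sketch of naive leave-one-out retraining. It is not the paper's method. The one-parameter least-squares model, the toy data, and the helper names `fit_slope` and `loo_loss_changes` are all assumptions chosen for illustration. The point is only that measuring the effect of removing each of n samples takes n full retrainings.

```python
def fit_slope(xs, ys):
    """Least-squares slope for y ~= w * x (no intercept), in closed form."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def loo_loss_changes(xs, ys, x_test, y_test):
    """Retrain once per removed sample; return the change in squared test loss."""
    w_full = fit_slope(xs, ys)
    base_loss = (w_full * x_test - y_test) ** 2
    changes = []
    for i in range(len(xs)):
        # Altered dataset: every training sample except sample i.
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        w_i = fit_slope(xs_i, ys_i)  # one full retraining per removed sample
        changes.append((w_i * x_test - y_test) ** 2 - base_loss)
    return changes


# Hypothetical toy data: the true slope is 2, and the last label is corrupted.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 20.0]
changes = loo_loss_changes(xs, ys, x_test=5.0, y_test=10.0)

# The sample whose removal lowers the test loss the most.
most_helpful_removal = min(range(len(xs)), key=lambda i: changes[i])
```

In this toy case each retraining is a closed-form fit, so the loop is cheap. For a large model each iteration would be a complete training run, which is the computational challenge the abstract describes.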

Taxonomy

Topics: Graph Theory and Algorithms · Data Management and Algorithms · 3D Shape Modeling and Analysis