Data Cleansing for Models Trained with SGD

Satoshi Hara; Atsushi Nitanda; Takanori Maehara

arXiv:1906.08473 · stat.ML · June 21, 2019 · 24 cites

TL;DR

This paper introduces a data cleansing algorithm for models trained with SGD. It suggests influential training instances without requiring domain knowledge, and removing the suggested instances improves model accuracy.

Contribution

The paper presents a new method to infer influential data points in SGD-trained models without requiring convex loss functions or optimal models, facilitating easier data cleansing.

Findings

01. Accurately infers influential instances in SGD-trained models
02. Improves model accuracy by removing influential data points
03. Effective on datasets like MNIST and CIFAR10

Abstract

Data cleansing is a typical approach used to improve the accuracy of machine learning models, which, however, requires extensive domain knowledge to identify the influential instances that affect the models. In this paper, we propose an algorithm that can suggest influential instances without using any domain knowledge. With the proposed method, users only need to inspect the instances suggested by the algorithm, implying that users do not need extensive knowledge for this procedure, which enables even non-experts to conduct data cleansing and improve the model. The existing methods require the loss function to be convex and an optimal model to be obtained, which is not always the case in modern machine learning. To overcome these limitations, we propose a novel approach specifically designed for the models trained with stochastic gradient descent (SGD). The proposed method infers the…
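To make "influential instance" concrete, here is a minimal brute-force sketch. It is not the paper's estimator, whose details are not reproduced here. The sketch retrains a small logistic regression with mini-batch SGD once per training instance, skipping that instance while keeping the batch order fixed, and records the change in validation loss. The function names (`sgd_logreg`, `log_loss`, `loo_influence`), the choice of model, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np


def sgd_logreg(X, y, skip=None, epochs=5, batch_size=8, lr=0.1, seed=0):
    """Train logistic regression with mini-batch SGD, optionally skipping one instance.

    The batch order depends only on `seed`, so runs with and without `skip`
    follow the same sequence of batches apart from the removed instance.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            kept = batch if skip is None else batch[batch != skip]
            p = 1.0 / (1.0 + np.exp(-(X[kept] @ w)))
            # Assumption of this sketch: normalise by the original batch size,
            # so removing an instance only deletes its own gradient term.
            grad = X[kept].T @ (p - y[kept]) / len(batch)
            w -= lr * grad
    return w


def log_loss(w, X, y):
    """Mean binary cross-entropy of weights `w` on (X, y)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def loo_influence(X_tr, y_tr, X_val, y_val, **sgd_kwargs):
    """Validation-loss change caused by leaving each training instance out."""
    base = log_loss(sgd_logreg(X_tr, y_tr, **sgd_kwargs), X_val, y_val)
    scores = [
        log_loss(sgd_logreg(X_tr, y_tr, skip=j, **sgd_kwargs), X_val, y_val) - base
        for j in range(len(y_tr))
    ]
    return np.array(scores)
```

Under this sign convention, a negative score means the validation loss drops when the instance is left out, so the most negative entries are the ones a user would inspect first. Retraining once per instance costs as many full training runs as there are instances, which is only practical on toy problems.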

Code & Models

Repositories

sato9hara/sgd-influence · PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)

Methods: Stochastic Gradient Descent