A Comparative Analysis of Influence Signals for Data Debugging

Nikolaos Myrtakis; Ioannis Tsamardinos; Vassilis Christophides

arXiv:2506.11584·cs.LG·June 16, 2025

TL;DR

This paper evaluates various influence-based signals for data debugging in machine learning, revealing their strengths and limitations in detecting mislabeled and anomalous samples across different data types and models.

Contribution

It provides a comprehensive experimental comparison of influence signals, highlighting their effectiveness and shortcomings in identifying data glitches during training.

Findings

01. Self-Influence effectively detects mislabeled samples.

02. Existing signals fail to detect anomalies.

03. Training dynamics are crucial for influence signal effectiveness.
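The first finding can be illustrated with a minimal sketch of a checkpoint-based Self-Influence score, in the style of the estimator the abstract calls TraceIn (commonly spelled TracIn). Under that estimator, the self-influence of a training sample is the sum, over saved checkpoints, of the learning rate times the squared norm of that sample's loss gradient. Everything concrete below is an assumption made for illustration and is not taken from the paper: the logistic-regression model, the synthetic two-cluster data, the 5% label-flip rate, the learning rate, and the hypothetical helper names `per_sample_grads`, `train_with_checkpoints`, and `self_influence`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def per_sample_grads(w, X, y):
    # Gradient of the logistic loss w.r.t. w for each sample: (p - y) * x.
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X


def train_with_checkpoints(X, y, lr=0.5, epochs=50, every=10):
    # Full-batch gradient descent; keep a copy of the weights every `every` epochs.
    w = np.zeros(X.shape[1])
    checkpoints = []
    for epoch in range(1, epochs + 1):
        w = w - lr * per_sample_grads(w, X, y).mean(axis=0)
        if epoch % every == 0:
            checkpoints.append(w.copy())
    return checkpoints


def self_influence(checkpoints, X, y, lr=0.5):
    # Checkpoint-based self-influence: sum over checkpoints of
    # lr * ||grad of the sample's own loss||^2.
    scores = np.zeros(X.shape[0])
    for w in checkpoints:
        g = per_sample_grads(w, X, y)
        scores += lr * np.sum(g * g, axis=1)
    return scores


# Synthetic data (assumption): two well-separated Gaussian clusters plus a bias column.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
X = np.hstack([X, np.ones((n, 1))])
y_clean = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

# Flip the labels of 10 randomly chosen samples to simulate mislabeling.
flipped = rng.choice(n, size=10, replace=False)
y_noisy = y_clean.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

checkpoints = train_with_checkpoints(X, y_noisy)
scores = self_influence(checkpoints, X, y_noisy)

# Rank samples by self-influence and count how many flipped labels land in the top 10.
top = np.argsort(scores)[::-1][:10]
recovered = len(set(top.tolist()) & set(flipped.tolist()))
print(f"flipped samples among the 10 highest scores: {recovered}")
```

In this toy setting, mislabeled samples tend to receive large gradients at every checkpoint because the model keeps predicting the opposite class for them, which is the intuition behind using Self-Influence as a mislabel signal. The sketch says nothing about anomalous samples or about the deep models evaluated in the paper.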

Abstract

Improving the quality of training samples is crucial for improving the reliability and performance of ML models. In this paper, we conduct a comparative evaluation of influence-based signals for debugging training data. These signals can potentially identify both mislabeled and anomalous samples from a potentially noisy training set as we build the models and hence alleviate the need for dedicated glitch detectors. Although several influence-based signals (e.g., Self-Influence, Average Absolute Influence, Marginal Influence, GD-class) have been recently proposed in the literature, there are no experimental studies for assessing their power in detecting different glitch types (e.g., mislabeled and anomalous samples) under a common influence estimator (e.g., TraceIn) for different data modalities (image and tabular), and deep learning models (trained from scratch or foundation). Through…

Taxonomy

Topics: Data Mining Algorithms and Applications · Big Data and Business Intelligence