Leveraging Unlabeled Data to Track Memorization

Mahsa Forouzesh; Hanie Sedghi; Patrick Thiran

arXiv:2212.04461·cs.LG·December 9, 2022

TL;DR

This paper introduces a new metric called susceptibility that measures neural network memorization of noisy labels using only unlabeled data, aiding in understanding and improving model robustness.

Contribution

It proposes a simple, label-agnostic susceptibility metric for tracking memorization during training, supported by empirical and theoretical analysis.

Findings

01 Susceptibility effectively tracks memorization across architectures and datasets.

02 Models with low susceptibility generalize better to clean data.

03 Susceptibility combined with training accuracy distinguishes well-generalizing models.

Abstract

Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it does not require access to ground-truth labels and it only uses unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain a low memorization…
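
The selection idea in the last sentence of the abstract can be illustrated with a minimal sketch. It assumes that a susceptibility value and a training accuracy have already been measured for each candidate model. The `ModelStats` record, the `select_models` helper, the two thresholds, and all numbers are hypothetical and chosen only for illustration; the abstract does not say how the paper sets its cut-offs or computes the metric.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical record of per-model measurements (not from the paper)."""
    name: str
    train_accuracy: float  # fraction in [0, 1], measured on the noisy training set
    susceptibility: float  # assumed convention: lower means less memorization


def select_models(stats, max_susceptibility, min_train_accuracy):
    """Return names of models with low susceptibility AND high training accuracy.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        s.name
        for s in stats
        if s.susceptibility <= max_susceptibility
        and s.train_accuracy >= min_train_accuracy
    ]


# Made-up measurements for three candidate models.
candidates = [
    ModelStats("model_a", train_accuracy=0.99, susceptibility=0.60),  # fits everything, likely memorizing noise
    ModelStats("model_b", train_accuracy=0.85, susceptibility=0.10),  # fits well with little memorization
    ModelStats("model_c", train_accuracy=0.40, susceptibility=0.05),  # little memorization, but underfits
]

selected = select_models(candidates, max_susceptibility=0.2, min_train_accuracy=0.7)
print(selected)
```

The sketch uses both quantities because low susceptibility alone would also admit a model that has barely fit the training data, which is why the abstract pairs it with overall training accuracy.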

Code & Models

Repositories

mahf93/tracking-memorization
PyTorch · Official

Videos

Leveraging Unlabeled Data to Track Memorization · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning