Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Rudrajit Das; Inderjit S. Dhillon; Alessandro Epasto; Adel Javanmard; Jieming Mao; Vahab Mirrokni; Sujay Sanghavi; Peilin Zhong

arXiv:2406.11206 · cs.LG · May 9, 2025

Open Access

TL;DR

This paper provides the first theoretical analysis showing that retraining a model with its own predicted hard labels can improve accuracy in noisy label settings, supported by empirical results in privacy-preserving training.

Contribution

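The paper proves that, in a linearly separable binary classification setting with randomly corrupted labels, retraining a model on its own predicted hard labels can improve population accuracy. It also demonstrates practical benefits for training with label differential privacy.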

Findings

01. Retraining with predicted labels can improve accuracy in noisy label scenarios.

02. Consensus-based retraining improves privacy-preserving training without extra privacy cost.

03. Accuracy improves by over 6% on CIFAR-100 under privacy constraints.

Abstract

The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP) which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no…
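The two retraining variants described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian data, the least-squares linear classifier, the flip probability of 0.3, the use of -1/+1 labels in place of 1/0, and the helper names fit_linear, predict, and accuracy. The sketch shows the mechanics only and makes no claim about which variant scores higher on a given run.

```python
import numpy as np


def fit_linear(X, y):
    # Least-squares linear classifier (a stand-in chosen for this sketch).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict(X, w):
    # Hard labels in {-1, +1}; the abstract speaks of 1/0 labels.
    return np.where(X @ w >= 0, 1.0, -1.0)


def accuracy(X, y, w):
    return float(np.mean(predict(X, w) == y))


rng = np.random.default_rng(0)
n, d, flip_prob = 2000, 20, 0.3  # hypothetical sizes and noise level

# Linearly separable data: clean labels come from a ground-truth direction.
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y_clean = predict(X, w_true)

# Randomly corrupt the given labels by flipping each with probability flip_prob.
flips = rng.random(n) < flip_prob
y_noisy = np.where(flips, -y_clean, y_clean)

# Step 1: initial training on the given (noisy) labels.
w0 = fit_linear(X, y_noisy)

# Step 2a: full retraining on the model's own predicted hard labels.
y_pred = predict(X, w0)
w_full = fit_linear(X, y_pred)

# Step 2b: consensus-based retraining, keeping only the samples whose
# predicted label matches the given label.
consensus = y_pred == y_noisy
w_cons = fit_linear(X[consensus], y_noisy[consensus])

# Estimate population accuracy on fresh clean samples.
X_test = rng.normal(size=(5000, d))
y_test = predict(X_test, w_true)
acc_initial = accuracy(X_test, y_test, w0)
acc_full = accuracy(X_test, y_test, w_full)
acc_cons = accuracy(X_test, y_test, w_cons)

print(f"initial:   {acc_initial:.3f}")
print(f"full:      {acc_full:.3f}")
print(f"consensus: {acc_cons:.3f}")
```

Both retraining steps reuse only the given labels and the model's own predictions, so no additional labels are queried.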

Videos

Retraining with Predicted Hard Labels Provably Increases Model Accuracy · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Time Series Analysis and Forecasting