DiffProb: Data Pruning for Face Recognition

Eduarda Caldeira; Jan Niklas Kolf; Naser Damer; Fadi Boutros

arXiv:2505.15272·cs.CV·May 22, 2025

TL;DR

DiffProb is a novel data pruning method for face recognition that reduces dataset size and training costs by removing redundant and mislabeled samples, while maintaining or improving accuracy across various benchmarks.

Contribution

This paper introduces DiffProb, the first data pruning approach tailored for face recognition, enhancing data quality and training efficiency with minimal accuracy loss.

Findings

01 Prunes up to 50% of training data without accuracy loss

02 Improves training efficiency and reduces data storage needs

03 Maintains robustness across different architectures and loss functions

Abstract

Face recognition models have made substantial progress due to advances in deep learning and the availability of large-scale datasets. However, reliance on massive annotated datasets introduces challenges related to computational training cost and data storage, as well as potential privacy concerns about managing large face datasets. This paper presents DiffProb, the first data pruning approach for face recognition. DiffProb assesses the prediction probabilities of training samples within each identity and prunes those with identical or close prediction probability values, as they likely reinforce the same decision boundaries and thus contribute little new information. We further enhance this process with an auxiliary cleaning mechanism to eliminate mislabeled and label-flipped samples, boosting data quality with minimal loss. Extensive…
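The pruning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `prune_by_probability`, the `threshold` parameter, and the greedy "keep one sample per cluster of close probabilities" rule are all assumptions made for illustration. The abstract only states that samples of the same identity with identical or close prediction probabilities are pruned. The auxiliary cleaning mechanism is not sketched because the abstract does not describe how it works.

```python
from collections import defaultdict


def prune_by_probability(probs, labels, threshold=0.01):
    """Return the sorted indices of the samples to keep.

    probs[i]  : probability a pretrained model assigns to sample i's own label
                (assumed input; how it is obtained is outside this sketch)
    labels[i] : identity label of sample i
    threshold : hypothetical closeness tolerance, not a value from the paper
    """
    # Group (probability, index) pairs by identity.
    by_identity = defaultdict(list)
    for idx, (p, y) in enumerate(zip(probs, labels)):
        by_identity[y].append((p, idx))

    keep = []
    for members in by_identity.values():
        members.sort()  # ascending by probability, ties broken by index
        last_kept = None
        for p, idx in members:
            # Keep a sample only if its probability differs from the last
            # kept sample of this identity by more than the threshold.
            if last_kept is None or p - last_kept > threshold:
                keep.append(idx)
                last_kept = p
    return sorted(keep)


if __name__ == "__main__":
    probs = [0.90, 0.905, 0.50, 0.70, 0.70, 0.95]
    labels = [0, 0, 0, 1, 1, 1]
    print(prune_by_probability(probs, labels, threshold=0.01))
```

In this toy input, sample 1 is dropped because its probability is within 0.01 of sample 0 (same identity), and sample 4 is dropped as an exact duplicate of sample 3 in probability. A larger `threshold` prunes more aggressively; the right value for a real dataset would have to be tuned against the target pruning ratio.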

Code & Models

Repositories

EduardaCaldeira/DiffProb
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis

Methods: Pruning