DiffProb: Data Pruning for Face Recognition
Eduarda Caldeira, Jan Niklas Kolf, Naser Damer, Fadi Boutros

TL;DR
DiffProb is a novel data pruning method for face recognition that reduces dataset size and training costs by removing redundant and mislabeled samples, while maintaining or improving accuracy across various benchmarks.
Contribution
This paper introduces DiffProb, the first data pruning approach tailored for face recognition, enhancing data quality and training efficiency with minimal accuracy loss.
Findings
Prunes up to 50% of training data without accuracy loss
Improves training efficiency and reduces data storage needs
Maintains robustness across different architectures and loss functions
Abstract
Face recognition models have made substantial progress due to advances in deep learning and the availability of large-scale datasets. However, reliance on massive annotated datasets introduces challenges related to computational training cost and data storage, as well as potential privacy concerns about managing large face datasets. This paper presents DiffProb, the first data pruning approach for face recognition. DiffProb assesses the prediction probabilities of training samples within each identity and prunes those with identical or close prediction probability values, as they likely reinforce the same decision boundaries and thus contribute little new information. We further enhance this process with an auxiliary cleaning mechanism to eliminate mislabeled and label-flipped samples, boosting data quality with minimal loss. Extensive…
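The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_by_probability`, the `threshold` parameter, and the rule of comparing each sample against the last kept sample of the same identity are all assumptions made for illustration, since the excerpt above does not specify the exact criterion. The sketch assumes that a pretrained model has already produced, for every training sample, the predicted probability of its labeled identity.

```python
from collections import defaultdict


def prune_by_probability(probs, labels, threshold=0.01):
    """Return the sorted indices of samples to keep.

    probs[i]  : predicted probability of sample i for its labeled identity
                (assumed to come from a pretrained face recognition model).
    labels[i] : identity label of sample i.
    threshold : hypothetical minimum probability gap between kept samples.
    """
    # Group (probability, index) pairs by identity.
    by_identity = defaultdict(list)
    for idx, (p, y) in enumerate(zip(probs, labels)):
        by_identity[y].append((p, idx))

    keep = []
    for samples in by_identity.values():
        samples.sort()  # ascending probability within the identity
        last_kept = None
        for p, idx in samples:
            # Keep a sample only if its probability is not close to the
            # probability of the last sample kept for this identity.
            if last_kept is None or p - last_kept > threshold:
                keep.append(idx)
                last_kept = p
    return sorted(keep)


# Toy example with two identities, "a" and "b".
probs = [0.90, 0.905, 0.70, 0.40, 0.41, 0.95]
labels = ["a", "a", "a", "b", "b", "b"]
kept = prune_by_probability(probs, labels, threshold=0.02)
print(kept)
```

In this toy run, samples 1 and 4 are dropped because their probabilities lie within 0.02 of a sample already kept for the same identity. The auxiliary cleaning step for mislabeled samples is not sketched here, as the excerpt gives no detail on how it works.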
Taxonomy
Topics: Face recognition and analysis
Methods: Pruning
