PUMA: margin-based data pruning

Javier Maroto; Pascal Frossard

arXiv:2405.06298·cs.LG·May 13, 2024

TL;DR

PUMA is a data pruning method that uses margin-based criteria, with margins computed by DeepFool, to improve adversarial robustness and accuracy. It reduces the amount of training data needed and improves the robustness-accuracy trade-off of adversarially trained models.

Contribution

The paper introduces PUMA, a new margin-based data pruning strategy that effectively improves robustness and accuracy by jointly adjusting training attack norms, outperforming existing pruning methods.

Findings

01. PUMA achieves similar robustness with less data.

02. It significantly improves model accuracy over existing methods.

03. PUMA enhances the robustness-accuracy trade-off in adversarial training.

Abstract

Deep learning has been able to outperform humans in terms of classification accuracy in many tasks. However, to achieve robustness to adversarial perturbations, the best methodologies require adversarial training on a much larger training set, typically one that has been augmented using generative models (e.g., diffusion models). Our main objective in this work is to reduce these data requirements while achieving the same or better accuracy-robustness trade-offs. We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., margin). We find that existing approaches that prune samples with low margin fail to increase robustness when a lot of synthetic data is added, and we explain this situation with a perceptron learning task. Moreover, we find that pruning high margin samples for better accuracy increases the…
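To make the idea of margin-based pruning concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a linear multi-class classifier, for which the DeepFool linearization is exact and the L2 margin has a closed form, whereas a deep network would need iterative DeepFool steps. The function names `linear_margins` and `prune_by_margin`, and the `drop` parameter that selects which end of the margin ranking is removed, are hypothetical and chosen only for illustration.

```python
import math


def linear_margins(X, W, b):
    """L2 distance from each sample to the nearest decision boundary of a
    linear classifier with logits W[k] . x + b[k] (illustrative assumption:
    for a linear model, one DeepFool step gives the exact margin)."""
    margins = []
    for x in X:
        logits = [sum(wi * xi for wi, xi in zip(w, x)) + bk
                  for w, bk in zip(W, b)]
        pred = max(range(len(logits)), key=logits.__getitem__)
        best = math.inf
        for k in range(len(logits)):
            if k == pred:
                continue
            gap = logits[pred] - logits[k]  # non-negative, since pred is the argmax
            w_diff = math.sqrt(sum((wp - wk) ** 2
                                   for wp, wk in zip(W[pred], W[k])))
            if w_diff > 0:
                best = min(best, gap / w_diff)
        margins.append(best)
    return margins


def prune_by_margin(margins, keep_fraction, drop="high"):
    """Return the sorted indices of the samples kept after pruning.

    drop="high" removes the largest-margin samples, drop="low" removes the
    smallest-margin ones. Both options are shown only for comparison.
    """
    n_keep = max(1, int(keep_fraction * len(margins)))
    order = sorted(range(len(margins)), key=margins.__getitem__)  # ascending margin
    kept = order[:n_keep] if drop == "high" else order[len(order) - n_keep:]
    return sorted(kept)


# Toy binary problem whose decision boundary is x[0] = 0,
# so each sample's margin equals |x[0]|.
X = [[3, 1], [0.5, -2], [-2, 0], [-0.1, 4]]
W = [[1, 0], [-1, 0]]
b = [0, 0]
margins = linear_margins(X, W, b)
kept_after_dropping_high = prune_by_margin(margins, 0.5, drop="high")
kept_after_dropping_low = prune_by_margin(margins, 0.5, drop="low")
```

In this toy setup, dropping high-margin samples keeps the two points closest to the boundary, while dropping low-margin samples keeps the two farthest from it. The sketch covers only the ranking and selection step; it does not model the adjustment of training attack norms mentioned in the Contribution summary.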

Taxonomy

Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Data Management and Algorithms

Methods: Sparse Evolutionary Training · Pruning · Diffusion · Focus