EMP: Enhance Memory in Data Pruning
Jinying Xiao, Ping Li, Jie Nie, Bin Ji, Shasha Li, Xiaodong Liu, Jun Ma, Qingbo Wu, Jie Yu

TL;DR
This paper introduces EMP, a novel data pruning method that enhances model memory to maintain high performance even with aggressive dataset reduction, applicable across vision and language tasks.
Contribution
The paper proposes a new memory-augmented scoring function for data pruning, with theoretical analysis and practical approximation, improving model retention under high pruning rates.
Findings
EMP outperforms existing methods by 2.2% on CIFAR100-ResNet50 with 70% pruning.
Theoretical analysis explains the inefficiency of low-frequency learning in data pruning.
Memory enhancement improves performance in image classification and NLP tasks.
Abstract
Recently, large language and vision models have shown strong performance, but due to high pre-training and fine-tuning costs, research has shifted towards faster training via dataset pruning. Previous methods used sample loss as an evaluation criterion, aiming to select the most "difficult" samples for training. However, when the pruning rate increases, the number of times each sample is trained becomes more evenly distributed, which causes many critical or general samples to not be effectively fitted. We refer to this as Low-Frequency Learning (LFL). In other words, LFL prevents the model from remembering most samples. In our work, we decompose the scoring function of LFL, provide a theoretical explanation for the inefficiency of LFL, and propose adding a memory term to the scoring function to enhance the model's memory capability, along with an approximation of this memory term.…
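The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes a pruning step that ranks samples by current loss plus a weighted memory term and keeps the top fraction. The abstract does not define the memory term or its approximation, so the `memory` values here (epochs since a sample was last trained), the `memory_weight` parameter, and the function name `select_samples` are hypothetical stand-ins, not the paper's formulation.

```python
def select_samples(losses, memory, prune_rate, memory_weight=0.5):
    """Return the sorted indices of the samples kept after pruning.

    losses: current per-sample loss, the "difficulty" criterion
        described in the abstract.
    memory: hypothetical per-sample memory term (an assumption for
        illustration; the paper's actual term is not given here).
    prune_rate: fraction of samples to drop, in [0, 1).
    memory_weight: hypothetical weight on the memory term.
    """
    if len(losses) != len(memory):
        raise ValueError("losses and memory must have the same length")
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")

    # Keep at least one sample, even at very high pruning rates.
    n_keep = max(1, round(len(losses) * (1.0 - prune_rate)))

    # Score = loss + weighted memory term (assumed additive form).
    scores = [l + memory_weight * m for l, m in zip(losses, memory)]

    # Rank by descending score and keep the top n_keep indices.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


losses = [0.9, 0.1, 0.5, 0.2]
staleness = [0, 3, 0, 0]  # hypothetical: epochs since each sample was last trained

loss_only = select_samples(losses, staleness, prune_rate=0.5, memory_weight=0.0)
with_memory = select_samples(losses, staleness, prune_rate=0.5, memory_weight=0.5)
```

With the memory weight set to zero the selection reduces to plain loss-based pruning, so a low-loss sample that has not been trained recently is never picked. A nonzero weight lets such a sample re-enter the kept set, which is the kind of effect a memory term is meant to have under this assumed form.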
Taxonomy
Topics: Data Mining Algorithms and Applications
Methods: Pruning
