TL;DR
UPCORE is a method-agnostic data selection framework that prunes high-variance outliers from the forget set before unlearning, reducing collateral damage to a large language model's other abilities while still removing the targeted information.
Contribution
It introduces a variance-based pruning method for coreset selection that improves the trade-off between deleting the forget set and preserving model utility during unlearning.
Findings
UPCORE outperforms baseline methods in balancing deletion and preservation.
The new AUC metric effectively evaluates the trade-off in unlearning tasks.
UPCORE reduces negative transfer and enhances positive transfer in model unlearning.
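A trade-off AUC of this kind can be sketched as the area under a utility-versus-deletion curve traced by checkpoints taken during unlearning. The snippet below is only an illustration under stated assumptions: the function name `tradeoff_auc`, the choice of axes, and the [0, 1] scaling of both scores are hypothetical and are not taken from the paper.

```python
def tradeoff_auc(deletion, utility):
    """Trapezoidal area under the utility-vs-deletion curve.

    deletion[i]: how well the forget set is removed at checkpoint i
                 (assumed scaled to [0, 1], higher = more forgotten).
    utility[i]:  model utility on non-forget data at checkpoint i
                 (assumed scaled to [0, 1], higher = better).
    """
    if len(deletion) != len(utility):
        raise ValueError("deletion and utility must have the same length")
    # Sort checkpoints by deletion score so the curve runs left to right.
    pairs = sorted(zip(deletion, utility))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        # Trapezoid between two consecutive checkpoints.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Two hypothetical unlearning runs measured at the same deletion levels.
deletion_levels = [0.0, 0.5, 1.0]
run_a = tradeoff_auc(deletion_levels, [1.0, 0.8, 0.4])  # keeps more utility
run_b = tradeoff_auc(deletion_levels, [1.0, 0.6, 0.2])  # loses utility faster
```

Under this reading, a run that retains more utility at the same level of deletion gets a larger area, so a single number summarizes the whole unlearning trajectory rather than one checkpoint.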
Abstract
User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. Across three…
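The pruning step described in the abstract can be illustrated with a small, dependency-free sketch. The reviews below state that the paper uses an Isolation Forest; the code here is a simplified stand-in in the same spirit (random axis-aligned splits, shorter isolation path means more anomalous), not the authors' implementation. The function names, the `prune_frac` parameter, and the use of plain lists of floats in place of real model hidden states are all assumptions made for illustration.

```python
import math
import random


def _avg_path(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _path_length(point, data, rng, max_depth, depth=0):
    """Number of random splits needed to isolate `point` within `data`."""
    if depth >= max_depth or len(data) <= 1:
        return depth + _avg_path(len(data))
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth + _avg_path(len(data))
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [r for r in data if (r[dim] < split) == (point[dim] < split)]
    return _path_length(point, same_side, rng, max_depth, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Isolation-style scores in (0, 1]; higher means more outlying.

    Simplification: splits are drawn afresh for every point instead of
    being shared through explicitly built trees.
    """
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    totals = [0.0] * len(points)
    for _ in range(n_trees):
        sample = rng.sample(points, psi)
        for i, p in enumerate(points):
            totals[i] += _path_length(p, sample, rng, max_depth)
    norm = _avg_path(psi) or 1.0
    return [2.0 ** (-(t / n_trees) / norm) for t in totals]


def prune_forget_set(points, prune_frac=0.1, **kwargs):
    """Return (core_indices, pruned_indices) after dropping the top outliers."""
    scores = anomaly_scores(points, **kwargs)
    n_prune = int(len(points) * prune_frac)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(order[n_prune:]), sorted(order[:n_prune])


def total_variance(points):
    """Sum of per-dimension variances of the representations."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return sum((p[j] - means[j]) ** 2 for p in points for j in range(d)) / n
```

In use, `points` would hold one representation vector per forget-set example; unlearning would then be run only on the examples indexed by `core_indices`, with whatever unlearning method is already in place.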
Peer Reviews
Decision: Submitted to ICLR 2026
1. Coreset pruning via Isolation Forest is simple, plug-and-play, and can sit atop existing unlearning procedures. 2. The paper documents a strong correlation between hidden-state variance of the forget set and post-unlearning utility degradation, which is actionable and intuitive. 3. The paper distinguishes beneficial transfer (pruned points still forgotten) from harmful spillover (unrelated capability loss), and shows reductions in the latter. 4. The AUC trade-off metric across training steps
1. While the paper shows that forget-set variance correlates with collateral damage, it does not establish a causal mechanism linking variance to the learning dynamics of unlearning (e.g., how gradient updates induced by high-variance points propagate damage to neighborhoods). Strengthening this with controlled interventions (e.g., synthetic variance manipulations at fixed difficulty/semantic content) or influence-function analyses would make the claim more compelling. (Author text frames the fi
1. The paper, for the first time, attempts to mitigate the impact of model unlearning on general performance by identifying high-variance “bad points” in the forgetting dataset. This finding and its supporting evidence are novel. 2. The experiments are relatively thorough and, across multiple datasets, demonstrate that the method can preserve unlearning performance while maintaining the original performance of the model. 3. The writing is clear and easy to follow, with no obvious typographical o
1. There is a lack of deeper theoretical guarantees about how much variance reduction one should expect to translate to concrete utility improvement, or about optimality of the Isolation Forest-based selection. For example, the equation (in Section 2.1) frames the coreset selection as an optimization, but in practice this is relaxed to a heuristic on variance. There seems to be little analysis of approximation gap or the behavior under adversarial/practically pathological distributions. 2. The
1. The motivation of this paper is both clever and timely. It addresses the problem of knowledge redundancy within unlearning datasets and proposes a sampling-based approach to alleviate the inefficiency and excessive forgetting caused by such redundancy. 2. The proposed UPCORE method is simple yet effective. It employs a basic statistical approach to identify dense unlearning samples while discarding peripheral ones, making the overall idea intuitive and easy to understand. 3. The extensive e
1. UPCORE is a sampling-based unlearning method that optimizes the model using unlearn samples from dense regions of the feature space. However, relying solely on core unlearn samples may result in suboptimal unlearning performance on non-core samples. Since these non-core samples are farther away in the feature space, they may be less influenced by the UPCORE optimization process. It would be helpful for the authors to clarify or empirically verify whether the method maintains consistent unlear
Taxonomy
Methods: Sparse Evolutionary Training
