Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Xingli Fang, Jung-Eun Kim

TL;DR
This paper reveals that only a small subset of neural network weights are both critical for utility and vulnerable to privacy attacks, proposing a targeted rewinding method to enhance privacy without sacrificing performance.
Contribution
It introduces a novel approach focusing on critical weights' locations for privacy preservation, outperforming existing methods in resisting membership inference attacks.
Findings
Selective weight rewinding improves privacy protection.
Method maintains high utility compared to full retraining.
Critical weights are more about location than value.
Abstract
Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can lead to unnecessary utility loss or, worse, greater misalignment in predictions between training data and non-training data. In this work, we report three insights: i) privacy vulnerability resides in a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of weights stems from their locations rather than their values. Based on these insights, to preserve privacy we score critical weights and, instead of discarding those neurons, rewind only those weights for fine-tuning. Extensive experiments show that this mechanism outperforms prior approaches in resilience against Membership Inference Attacks in most cases while maintaining utility.
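The score-then-rewind step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rewind_top_fraction`, the flat-list weight representation, and the choice of an earlier checkpoint as the rewind target are all assumptions made for illustration, and the per-weight scores are taken as given (the paper's own scoring procedure is not reproduced here). The parameter `r` stands in for the rewinding rate mentioned in the reviews. Fine-tuning, which would follow this step, is omitted.

```python
from typing import List, Sequence, Tuple


def rewind_top_fraction(
    final_w: Sequence[float],
    early_w: Sequence[float],
    scores: Sequence[float],
    r: float,
) -> Tuple[List[float], List[bool]]:
    """Reset the top-r fraction of weights (by score) to earlier values.

    Hypothetical helper: `early_w` is assumed to hold the values the
    weights are rewound to (e.g. an earlier checkpoint), and `scores`
    is assumed to rank weights by how critical they are.
    """
    if not (len(final_w) == len(early_w) == len(scores)):
        raise ValueError("all inputs must have the same length")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")

    k = int(r * len(scores))  # number of weights to rewind (rounded down)
    # Indices sorted from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])

    # The mask records *locations*; the positions stay in the network,
    # only their values are reset.
    mask = [i in selected for i in range(len(scores))]
    rewound = [early_w[i] if mask[i] else final_w[i] for i in range(len(scores))]
    return rewound, mask


# Toy example with made-up numbers.
final_w = [0.5, -1.2, 0.3, 2.0]
early_w = [0.1, 0.2, -0.1, 0.0]
scores = [0.9, 0.1, 0.4, 0.8]

rewound, mask = rewind_top_fraction(final_w, early_w, scores, r=0.5)
print(rewound)  # [0.1, -1.2, 0.3, 0.0]
print(mask)     # [True, False, False, True]
```

The sketch keeps every weight position and changes only the values at the selected locations, which is the contrast with pruning that the abstract draws ("instead of discarding those neurons").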
Peer Reviews
Decision·ICLR 2026 Poster
1. The work offers a significant contribution by clearly explaining why standard model pruning fails to mitigate privacy risks, linking it directly to this entanglement. 2. The paper proposes CWRF that cleverly combines machine unlearning for vulnerability estimation with weight rewinding, which boosts existing privacy-preserving methods to achieve.
1. The paper provides no sensitivity analysis for the hyperparameters $r$ (the rewinding rate) and $\lambda$, making it unclear how to set them efficiently. 2. The empirical validation is limited to small-scale models (ResNet18, small ViT) and datasets (CIFAR, CINIC). It is not demonstrated whether the vulnerability estimation step is computationally feasible or whether the core insight scales to LLMs. 3. It is recommended to include full privacy–utility curves (similar to those reported for RelaxLoss and CC
S1. The paper is well written and easy to follow. S2. The insights are interesting and helpful.
I found the claims in the paper interesting but unconvincing, for several reasons: W1. Lack of theoretical motivation. The paper's central hypothesis, that a weight's importance stems from its location rather than its value, is a strong one, but it is presented without theoretical proof and is supported only by a specific set of ablation studies. The paper argues that because A3 (CWRF) successfully recovers accuracy while A1 (Remove) fails, the hypothesis is validated. While the result is compelli
**[S1]** Provides valuable insights on which weights correspond to MIA vulnerability and how they are also largely the same as the ones that are most important to generalizability. The methods used to identify such weights are sound and convincing. **[S2]** Insights provided in Fig. 5 to motivate the design of CWRF are interesting; it clearly demonstrates why it is key to both rewind privacy vulnerable weights and fine-tune privacy-invulnerable (so to speak) weights. It is also interesting how
**[W1]** The method could be described more clearly. While motivating the methodology by discussing prior approaches to rewinding, weight freezing and fine-tuning, and ablation studies is a great choice, it would be highly beneficial to have a concise, self-contained discussion of the steps involved in CWRF, along with pseudocode, to avoid ambiguity and for easy reference. **[W2]** The text in Tables 1 and 2 is very small and not very appropriate for a potential camera-read
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI
