Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Xingli Fang; Jung-Eun Kim

arXiv:2603.13186 · cs.LG · March 16, 2026

Open Access · 3 Reviews

TL;DR

This paper shows that a small subset of neural network weights is both critical for utility and vulnerable to privacy attacks, and it proposes a targeted rewinding method that improves privacy while maintaining utility.

Contribution

It introduces an approach to privacy preservation that focuses on the locations of critical weights, and in most reported cases it resists membership inference attacks better than existing methods.

Findings

01. Selective weight rewinding improves privacy protection.

02. The method maintains high utility compared to full retraining.

03. The importance of critical weights stems from their location rather than their value.

Abstract

Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we report three insights: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their locations rather than their values. Based on these insights, to preserve privacy, we score critical weights, and instead of discarding those neurons, we rewind only those weights for fine-tuning. We show, through extensive experiments, that this mechanism outperforms prior approaches in resilience against Membership Inference Attacks in most cases while maintaining utility.
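The score-then-rewind idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the per-weight `scores` are placeholder numbers (the paper's actual scoring procedure is not reproduced here), the function names `critical_mask`, `rewind`, and `remove` are hypothetical, the earlier checkpoint is assumed to be an initialization-like state, and the subsequent fine-tuning step is omitted. The sketch only contrasts rewinding the selected positions with discarding them.

```python
from typing import List, Sequence


def critical_mask(scores: Sequence[float], rate: float) -> List[bool]:
    """Mark the top `rate` fraction of weights (by score) as critical."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    k = int(round(rate * len(scores)))
    # Indices ordered by descending score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [i in top for i in range(len(scores))]


def rewind(trained: Sequence[float], earlier: Sequence[float],
           mask: Sequence[bool]) -> List[float]:
    """Reset masked positions to their earlier values; keep the rest as trained."""
    return [w0 if m else w for w, w0, m in zip(trained, earlier, mask)]


def remove(trained: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Baseline for comparison: zero out masked positions instead of rewinding."""
    return [0.0 if m else w for w, m in zip(trained, mask)]


# Toy flattened weight vector (all numbers are made up for illustration).
trained = [0.9, -0.2, 1.5, 0.1, -0.7]
earlier = [0.05, -0.01, 0.02, 0.03, -0.04]   # assumed earlier checkpoint
scores = [0.8, 0.1, 0.95, 0.05, 0.3]         # hypothetical criticality scores

mask = critical_mask(scores, rate=0.4)       # selects positions 0 and 2
rewound = rewind(trained, earlier, mask)     # positions stay trainable afterwards
removed = remove(trained, mask)              # positions are discarded
```

In this toy setup the rewound vector keeps a small nonzero value at every selected position, so those locations remain available to a later fine-tuning pass, whereas the removal baseline zeroes them out.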

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

1. The work offers a significant contribution by clearly explaining why standard model pruning fails to mitigate privacy risks, linking it directly to this entanglement.
2. The paper proposes CWRF, which cleverly combines machine unlearning for vulnerability estimation with weight rewinding, which boosts existing privacy-preserving methods to achieve.

Weaknesses

1. The paper provides no sensitivity analysis for the hyperparameters, the rewinding rate $r$ and $\lambda$, making it unclear how to set them efficiently.
2. The empirical validation is limited to small-scale models (ResNet18, small ViT) and datasets (CIFAR, CINIC). It is not demonstrated whether the vulnerability estimation step is computationally feasible or whether the core insight scales to LLMs.
3. It is recommended to include full privacy–utility curves (similar to those reported for RelaxLoss and CC

Reviewer 02 · Rating 4 · Confidence 5

Strengths

S1. The paper is well written and easy to follow.
S2. The insights are interesting and helpful.

Weaknesses

I found the claims in the paper interesting but unconvincing for several reasons:
W1. Lack of theoretical motivation. The paper's central hypothesis, that a weight's importance stems from its location rather than its value, is a strong one, but it is presented without theoretical proof and is supported only by a specific set of ablation studies. The paper argues that because A3 (CWRF) successfully recovers accuracy while A1 (Remove) fails, the hypothesis is validated. While the result is compelli

Reviewer 03 · Rating 8 · Confidence 4

Strengths

**[S1]** Provides valuable insights on which weights correspond to MIA vulnerability and how they are also largely the same as the ones that are most important to generalizability. The methods used to identify such weights are sound and convincing.
**[S2]** Insights provided in Fig. 5 to motivate the design of CWRF are interesting; it clearly demonstrates why it is key to both rewind privacy vulnerable weights and fine-tune privacy-invulnerable (so to speak) weights. It is also interesting how

Weaknesses

**[W1]** The description of the method could be clearer. While motivating the methodology by discussing prior approaches to rewinding, weight freezing and fine-tuning, and ablation studies is a great choice, it would be highly beneficial to have a concise, self-contained discussion of the steps involved in CWRF, along with pseudocode, to avoid ambiguity and for easy reference.
**[W2]** The text in Tables 1 and 2 is very small and not very appropriate for a potential camera-read

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI