Understanding Private Learning From Feature Perspective

Meng Ding; Mingxi Lei; Shaopeng Fu; Shaowei Wang; Di Wang; Jinhui Xu

arXiv:2511.18006·cs.LG·November 25, 2025

Open Access · 3 Reviews

TL;DR

This paper develops a theoretical framework to analyze how features are learned in private training, revealing the importance of signal-to-noise ratio and the impact of noise memorization on generalization in differentially private learning.

Contribution

It introduces the first theoretical analysis of feature dynamics in private learning, distinguishing label-dependent signals from label-independent noise, and highlights the challenges and benefits of feature enhancement.

Findings

01. Private signal learning requires higher SNR than non-private training.

02. Data noise memorization occurs in both private and non-private learning, affecting generalization.

03. Feature enhancement can improve SNR and aid private learning.

Abstract

Differentially private Stochastic Gradient Descent (DP-SGD) has become integral to privacy-preserving machine learning, ensuring robust privacy guarantees in sensitive domains. Despite notable empirical advances leveraging features from non-private, pre-trained models to enhance DP-SGD training, a theoretical understanding of feature dynamics in private learning remains underexplored. This paper presents the first theoretical framework to analyze private training through a feature learning perspective. Building on the multi-patch data structure from prior work, our analysis distinguishes between label-dependent feature signals and label-independent noise, a critical aspect overlooked by existing analyses in the DP community. Employing a two-layer CNN with polynomial ReLU activation, we theoretically characterize both feature signal learning and data noise memorization in private…
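The setup named in the abstract can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the two-patch layout (one signal patch `y * mu`, one Gaussian noise patch), the activation power `q = 3`, the filter-averaging form of the network, the SNR definition `||mu|| / (sigma_p * sqrt(d))`, and all function names.

```python
import numpy as np

def make_multipatch_data(n, d, mu, sigma_p, rng):
    """Draw n examples, each with a signal patch y*mu and a noise patch xi."""
    y = rng.choice([-1.0, 1.0], size=n)             # labels in {-1, +1}
    signal = y[:, None] * mu[None, :]               # label-dependent patch
    noise = sigma_p * rng.standard_normal((n, d))   # label-independent patch
    return np.stack([signal, noise], axis=1), y     # X has shape (n, 2, d)

def poly_relu(z, q=3):
    """Polynomial ReLU: max(0, z) raised to the power q (q is assumed)."""
    return np.maximum(z, 0.0) ** q

def cnn_forward(W_pos, W_neg, X, q=3):
    """Two-layer CNN output: positive-class filters minus negative-class filters.

    W_pos and W_neg have shape (m, d); each filter is applied to both patches.
    """
    act_pos = poly_relu(np.einsum("md,npd->nmp", W_pos, X), q)  # (n, m, 2)
    act_neg = poly_relu(np.einsum("md,npd->nmp", W_neg, X), q)
    m = W_pos.shape[0]
    return (act_pos.sum(axis=(1, 2)) - act_neg.sum(axis=(1, 2))) / m

def snr(mu, sigma_p, d):
    """Assumed signal-to-noise ratio: signal norm over typical noise-patch norm."""
    return np.linalg.norm(mu) / (sigma_p * np.sqrt(d))
```

Under these assumptions, raising `np.linalg.norm(mu)` or lowering `sigma_p` increases `snr`, which is the quantity the findings tie to whether private training can learn the signal.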

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

* **Interesting novel perspective on private machine learning**. You discuss an interesting shift in perspective from utility measures such as risk/accuracy to feature-learning-based analyses. The analysis of the learning dynamics in terms of SNR is relevant because it captures the structure of what is learnt more closely than simply evaluating output accuracy. Your analysis gives insights on _why_ DP training may degrade feature quality and how this can be counteracted. The change of

Weaknesses

* **Formality of the DP guarantee**. I have some concerns with respect to the privacy guarantee you provide. In your analysis, you assume no gradient clipping which, paired with ReLU activation, leads to unbounded $L_2$ sensitivity. Lemma 4.4 provides a high probability bound on the internal coefficients and, from this, you mention that the gradient sensitivity can therefore be bounded by a term proportional to the noise term (line 285). However, it remains unclear to me whether the bound on the

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The paper investigates the poor quality of differentially private learning in terms of feature learning, which has not been actively investigated.
- The authors provide a theoretical analysis of why DP training is largely affected by noise addition relative to the signal.
- Based on their observations, the authors argue that a stronger feature signal is needed for private learning and that data noise may cause data noise memorization.

Weaknesses

Please refer to the Questions section.

Reviewer 03 · Rating 4 · Confidence 2

Strengths

There is limited exploration of private learning beyond convex settings, and this work investigates one non-convex setting, showing a privacy cost: training privately requires a higher SNR to generalize than non-private learning. The paper provides explicit bounds on training and test error for this setting, which might be an interesting result.

Weaknesses

* The setting and analysis are very similar to prior work (Cao et al., 2022).
* The sensitivity of the gradient step appears to depend on the data noise $\xi$, which is data-dependent, so the privacy guarantee is unclear.
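For context on the sensitivity concern, the usual way DP-SGD makes sensitivity data-independent is per-example gradient clipping. The sketch below shows that standard technique only; it is not the paper's method (Reviewer 01 notes the analysis assumes no clipping), and the names `clip_rows`, `noisy_step`, `C`, and `noise_mult` are hypothetical.

```python
import numpy as np

def clip_rows(G, C):
    """Rescale each per-example gradient (a row of G) to L2 norm at most C."""
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return G * np.minimum(1.0, C / np.maximum(norms, 1e-12))

def noisy_step(w, G, lr, C, noise_mult, rng):
    """One gradient step on clipped per-example gradients plus Gaussian noise.

    After clipping, changing one example moves the gradient sum by at most a
    constant multiple of C, whatever the data noise looks like, so the noise
    scale noise_mult * C does not depend on the data.
    """
    n = G.shape[0]
    clipped = clip_rows(G, C)
    noise = noise_mult * C * rng.standard_normal(w.shape)
    return w - lr * (clipped.sum(axis=0) + noise) / n
```

Without the `clip_rows` call, the per-example gradient norm can grow with the norm of the noise patch, which is the data dependence the reviewers point to.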

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security