Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

Radu Lecoiu; Debarghya Mukherjee; Pragya Sur

arXiv:2605.17778·math.ST·May 19, 2026

TL;DR

This paper establishes that, for spiked covariance models with s spikes, s-step self-distillation is optimal among spectral shrinkage estimators, giving a statistical explanation for why self-distillation can improve model performance.

Contribution

It provides the first rigorous statistical analysis of self-distillation, showing its optimality among spectral shrinkage estimators and connecting it with classical shrinkage methods.

Findings

01

s-step self-distillation achieves optimal performance among spectral shrinkage estimators

02

Fewer than s steps is not enough: any (s - k)-step distilled estimator with 1 ≤ k ≤ s is strictly suboptimal

03

For isotropic covariances, optimally tuned Ridge regression performs best among spectral shrinkage estimators

Abstract

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s - k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers…
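To illustrate why distilled estimators can be viewed as spectral shrinkage estimators, here is a minimal Python sketch. It rests on assumptions that do not come from the abstract: a linear model, a ridge teacher, students refit on the teacher's fitted values only, and one fixed penalty `lam` reused at every step. The paper's exact construction (for example per-step penalties or mixing in the original labels) may differ. The function names, the spike strengths, and all numeric settings are hypothetical. The sketch only checks that iterated refitting coincides with a filter applied to the eigenvalues of $X^\top X$; it does not demonstrate the optimality result.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge solution (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def self_distill(X, y, lam, steps):
    """Assumed scheme: refit ridge on the previous model's fitted values."""
    beta = ridge(X, y, lam)              # teacher (step 0)
    for _ in range(steps):
        beta = ridge(X, X @ beta, lam)   # student trained on teacher predictions
    return beta

def spectral_form(X, y, lam, steps):
    """The same estimator written as a filter on the eigenvalues of X^T X."""
    d, V = np.linalg.eigh(X.T @ X)
    # Each distillation step multiplies the ridge filter 1/(d+lam) by d/(d+lam).
    filt = (d / (d + lam)) ** steps / (d + lam)
    return V @ (filt * (V.T @ (X.T @ y)))

rng = np.random.default_rng(0)
n, p, s = 200, 20, 2

# Spiked covariance: identity plus s rank-one spikes (illustrative strengths).
U, _ = np.linalg.qr(rng.standard_normal((p, s)))
Sigma = np.eye(p) + U @ np.diag([10.0, 5.0]) @ U.T

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.5 * rng.standard_normal(n)

iterated = self_distill(X, y, lam=5.0, steps=s)
filtered = spectral_form(X, y, lam=5.0, steps=s)
print(np.allclose(iterated, filtered))
```

Under these assumptions, each extra step reshapes the shrinkage applied along every eigen-direction, which is the sense in which the number of steps adds flexibility to the estimator.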
