Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models
Radu Lecoiu, Debarghya Mukherjee, Pragya Sur

TL;DR
This paper establishes that s-step self-distillation is statistically optimal among spectral shrinkage estimators for spiked covariance models, explaining its effectiveness in improving model performance.
Contribution
It provides the first rigorous statistical analysis of self-distillation, showing its optimality among spectral shrinkage estimators and connecting it with classical shrinkage methods.
Findings
s-step self-distillation achieves optimal performance among spectral shrinkage estimators
Using fewer than s distillation steps yields a strictly suboptimal estimator
Optimal Ridge regression outperforms other spectral shrinkage estimators in isotropic cases
Abstract
Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with s spikes, s-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that s steps are necessary for optimality: any k-step distilled estimator is strictly suboptimal for k < s. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers…
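To make the link between self-distillation and spectral shrinkage concrete, here is a minimal NumPy sketch. It is not taken from the paper: it assumes a common formulation in which the student is a Ridge fit to a mix of the teacher's predictions and the original labels, and it treats a spectral shrinkage estimator as one that rescales each singular direction of the design matrix by a function of its singular value. The function names, the mixing weight `xi`, and the penalty `lam` are hypothetical, and the paper's own definitions may differ.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def one_step_self_distilled_ridge(X, y, lam, xi):
    """Assumed formulation: refit Ridge on a mix of teacher predictions and labels."""
    teacher = ridge(X, y, lam)
    mixed_targets = xi * (X @ teacher) + (1.0 - xi) * y
    return ridge(X, mixed_targets, lam)

def spectral_shrinkage(X, y, shrink):
    """Estimator V diag(shrink(d)) U^T y, where X = U diag(d) V^T (thin SVD)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ (shrink(d) * (U.T @ y))

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, p, lam, xi = 50, 10, 2.0, 0.5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Shrinkage functions implied by the two estimators above.
ridge_shrink = lambda d: d / (d**2 + lam)
sd_shrink = lambda d: d / (d**2 + lam) * (1.0 - xi * lam / (d**2 + lam))

beta_ridge = ridge(X, y, lam)
beta_sd = one_step_self_distilled_ridge(X, y, lam, xi)
beta_ridge_spec = spectral_shrinkage(X, y, ridge_shrink)
beta_sd_spec = spectral_shrinkage(X, y, sd_shrink)

print(np.allclose(beta_ridge, beta_ridge_spec))
print(np.allclose(beta_sd, beta_sd_spec))
```

Under these assumptions, one distillation step changes only the shrinkage function applied to the singular values, multiplying the Ridge factor by a second eigenvalue-dependent term. Each further step adds another tunable mixing weight, which gives some intuition for why the number of steps might need to match the number of spikes.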
