Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks
Matthew Loftus

TL;DR
This paper shows that, under controlled label-noise variation, the eigenvalue tail index of the weight matrix at a network's bottleneck layer tracks label noise and data quality, outperforming conventional weight-based metrics.
Contribution
It introduces the eigenvalue tail index as a diagnostic for data quality and label noise in neural networks, linking the spectral properties of trained weight matrices to data degradation.
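As a rough illustration of what "eigenvalue tail index" refers to, the sketch below computes the empirical spectral density (eigenvalues of W^T W / N) of a weight matrix and fits a power law p(lambda) ~ lambda^(-alpha) to its largest eigenvalues. The summary does not say which estimator the authors use, so this is an assumption: it applies the standard continuous power-law maximum-likelihood estimator with x_min fixed at a quantile (rather than selected by goodness of fit). The names `esd`, `tail_alpha`, and `tail_fraction` are hypothetical.

```python
import numpy as np


def esd(weight: np.ndarray) -> np.ndarray:
    """Eigenvalues of the correlation matrix W^T W / N, where N = number of rows."""
    n_rows = weight.shape[0]
    # eigvalsh suits W^T W because it is symmetric positive semidefinite.
    return np.linalg.eigvalsh(weight.T @ weight / n_rows)


def tail_alpha(eigs: np.ndarray, tail_fraction: float = 0.1) -> float:
    """Power-law exponent alpha for p(lambda) ~ lambda^-alpha (hypothetical helper).

    Uses the continuous power-law MLE  alpha = 1 + k / sum(log(x_i / x_min))
    over the k largest eigenvalues; x_min is the smallest of those k values.
    Assumes the tail eigenvalues are strictly positive and not all equal.
    """
    eigs = np.sort(np.asarray(eigs, dtype=float))[::-1]  # descending order
    k = max(2, int(len(eigs) * tail_fraction))
    tail = eigs[:k]
    x_min = tail[-1]
    return 1.0 + k / float(np.sum(np.log(tail / x_min)))


# Usage: a Gaussian random matrix stands in for a trained bottleneck layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((200, 50))
print(tail_alpha(esd(w)))
```

Whether a smaller or larger alpha accompanies label noise in the paper's setting is not stated in this summary, so the sketch only shows how the quantity can be computed, not how to interpret it.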
Findings
The eigenvalue tail index predicts test accuracy under label noise with leave-one-out R^2 = 0.984.
Under hyperparameter variation at fixed data quality, spectral measures are weak predictors, and simple baselines perform slightly better.
The tail index detects real human annotation errors in CIFAR-10N effectively.
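The R^2 figures above are leave-one-out estimates. The summary does not spell out the exact procedure, so the sketch below assumes a common definition, predictive R^2 = 1 - PRESS / TSS, where PRESS = sum_i (y_i - yhat_(-i))^2 uses a simple linear fit of test accuracy on a single predictor (such as tail alpha) refit with point i held out, and TSS = sum_i (y_i - ybar)^2. The helpers `fit_line` and `loo_r2` are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_r2(xs: list[float], ys: list[float]) -> float:
    """Leave-one-out R^2 = 1 - PRESS / TSS for a one-predictor linear model."""
    n = len(xs)
    my = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without point i, then predict the held-out point.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    tss = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / tss
```

Unlike in-sample R^2, this quantity can go negative when held-out predictions are worse than predicting the mean, which makes it a stricter test of whether a measure genuinely predicts accuracy.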
Abstract
We investigate whether spectral properties of neural network weight matrices can predict test accuracy. Under controlled label noise variation, the tail index alpha of the eigenvalue distribution at the network's bottleneck layer predicts test accuracy with leave-one-out R^2 = 0.984 (21 noise levels, 3 seeds per level), far exceeding all baselines: the best conventional metric (Frobenius norm of the optimal layer) achieves LOO R^2 = 0.149. This relationship holds across three architectures (MLP, CNN, ResNet-18) and two datasets (MNIST, CIFAR-10). However, under hyperparameter variation at fixed data quality (180 configurations varying width, depth, learning rate, and weight decay), all spectral and conventional measures are weak predictors (R^2 < 0.25), with simple baselines (global L2 norm, LOO R^2 = 0.219) slightly outperforming spectral measures (tail alpha, LOO R^2 = 0.167). We…
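For comparison, the two conventional baselines named in the abstract can be sketched as follows. The abstract does not define them precisely, so this assumes the Frobenius norm is taken per weight matrix (with the "optimal layer" being whichever layer's norm predicts best) and that the global L2 norm is the Euclidean norm of all parameters concatenated, biases included. Both helper names are hypothetical.

```python
import numpy as np


def frobenius_norm(weight: np.ndarray) -> float:
    """Frobenius norm of a single 2-D weight matrix: sqrt(sum of squared entries)."""
    return float(np.linalg.norm(weight, ord="fro"))


def global_l2_norm(params: list[np.ndarray]) -> float:
    """Euclidean norm of every parameter tensor in the model, concatenated."""
    return float(np.sqrt(sum(np.sum(p ** 2) for p in params)))
```

Each baseline yields one scalar per trained model, so it can be scored with the same leave-one-out regression as tail alpha.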
