Task-Driven Kernel Flows: Label Rank Compression and Laplacian Spectral Filtering
Hongxi Li, Chunlin Huang

TL;DR
This paper develops a theoretical framework for feature learning in wide L2-regularized networks, showing that supervised learning inherently compresses features and operates within a low-rank, task-relevant subspace.
Contribution
It introduces a kernel ODE model that predicts a "water-filling" spectral evolution and proves that, at any stable steady state, the kernel rank is bounded by the number of classes ($C$), unifying the deterministic and stochastic views of alignment.
Findings
At any stable steady state, kernel rank is bounded by the number of classes ($C$).
SGD noise is similarly low-rank ($O(C)$), which confines the dynamics to the task-relevant subspace.
Supervised learning produces low-rank, task-specific representations.
Abstract
We present a theory of feature learning in wide L2-regularized networks showing that supervised learning is inherently compressive. We derive a kernel ODE that predicts a "water-filling" spectral evolution and prove that for any stable steady state, the kernel rank is bounded by the number of classes ($C$). We further demonstrate that SGD noise is similarly low-rank ($O(C)$), confining dynamics to the task-relevant subspace. This framework unifies the deterministic and stochastic views of alignment and contrasts the low-rank nature of supervised learning with the high-rank, expansive representations of self-supervision.
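A rough illustration of why $C$ is a natural ceiling: the Gram matrix of one-hot labels, $YY^\top$ with $Y \in \mathbb{R}^{n \times C}$, can never have rank above $C$. The sketch below checks this numerically. It is not the paper's derivation. It assumes, purely for illustration, that the learned kernel aligns with this label Gram matrix, and the function name `label_kernel_rank` and the sample sizes are hypothetical choices.

```python
import numpy as np


def label_kernel_rank(labels, num_classes):
    """Numerical rank of the Gram matrix Y Y^T built from one-hot labels.

    Illustrative assumption: the label Gram matrix stands in for the
    task-aligned kernel; the paper's kernel ODE is not simulated here.
    """
    one_hot = np.eye(num_classes)[labels]   # shape (n, C)
    gram = one_hot @ one_hot.T              # shape (n, n)
    return int(np.linalg.matrix_rank(gram))


# Hypothetical setup: 200 samples drawn from 5 classes.
rng = np.random.default_rng(0)
num_classes = 5
labels = rng.integers(0, num_classes, size=200)

rank = label_kernel_rank(labels, num_classes)
print(f"n = {labels.size}, C = {num_classes}, rank(Y Y^T) = {rank}")
```

The rank equals the number of distinct classes present in the sample, so it stays at or below $C$ no matter how large $n$ grows.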
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning
