Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, and Ard A. Louis

TL;DR
This paper introduces a new, efficient metric for measuring the richness of neural dynamics independently of performance, revealing insights into training factors and providing interpretability tools.
Contribution
It proposes a stability-based richness metric grounded in low-rank bias, independent of accuracy, and introduces visualization methods for analyzing training dynamics.
Findings
The metric is empirically more stable than existing richness measures.
Training factors such as batch normalization influence dynamical richness.
The approach uncovers relationships between training parameters, dynamics, and representations.
Abstract
Dynamic feature transformation (the rich regime) does not always align with predictive performance (better representation), yet accuracy is often used as a proxy for richness, limiting analysis of their relationship. We propose a computationally efficient, performance-independent metric of richness grounded in the low-rank bias of rich dynamics, which recovers neural collapse as a special case. The metric is empirically more stable than existing alternatives and captures known lazy-to-rich transitions (e.g., grokking) without relying on accuracy. We further use it to examine how training factors (e.g., learning rate) relate to richness, confirming recognized assumptions and highlighting new observations (e.g., batch normalization promotes rich dynamics). An eigendecomposition-based visualization is also introduced to support interpretability, together providing a diagnostic tool for…
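To make the idea of a low-rank bias concrete, here is a minimal sketch of a simple spectral proxy. It is not the paper's metric, whose definition is not given in this summary. The function name top_c_energy_fraction, the synthetic data, and the choice of the feature second-moment matrix are all assumptions made for illustration. The proxy measures how much of the last-layer feature energy lies in the top C eigen-directions, where C is the number of classes.

```python
import numpy as np


def top_c_energy_fraction(features, num_classes):
    """Illustrative proxy (hypothetical, not the paper's metric).

    Returns the share of second-moment energy of `features` (shape n x p)
    captured by the `num_classes` largest eigenvalues.
    """
    features = np.asarray(features, dtype=float)
    n, _ = features.shape
    second_moment = features.T @ features / n      # p x p, symmetric PSD
    eigvals = np.linalg.eigvalsh(second_moment)    # ascending order
    eigvals = np.clip(eigvals, 0.0, None)          # remove tiny negative round-off
    total = eigvals.sum()
    if total == 0.0:
        raise ValueError("features are all zero")
    return float(eigvals[-num_classes:].sum() / total)


# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
n, p, C = 600, 32, 3
labels = rng.integers(0, C, size=n)
class_means = rng.normal(size=(C, p))

# Fully collapsed features: every sample equals its class mean (rank <= C).
collapsed = class_means[labels]
# Diffuse features: isotropic noise with no low-rank structure.
diffuse = rng.normal(size=(n, p))

low_rank_score = top_c_energy_fraction(collapsed, C)
diffuse_score = top_c_energy_fraction(diffuse, C)
print(f"collapsed: {low_rank_score:.3f}  diffuse: {diffuse_score:.3f}")
```

In this toy setting the collapsed features score close to 1 and the isotropic features score far lower. This mirrors the intuition that rich dynamics concentrate features in a few directions, and like the paper's metric the proxy uses no accuracy information. It does not reproduce the paper's normalization or guarantees.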
Peer Reviews
Decision·ICLR 2026 Poster
I think that this paper is a strong paper, which proposes a new metric for dynamic richness and demonstrates how it works better than other examples from recent literature. S1: The *low-rank metric* is simple to understand and inexpensive to compute. The authors give good intuition on how it works, why it works, and bounds on computation complexity. S2: Theoretical foundations for the *low-rank metric* look solid. Although the authors make some assumptions to make calculations tractable, formu
I have not found major weaknesses in this paper. The following points are either minor or nitpicks. W1: I feel like the authors should find a clearer name and/or an acronym to designate the *low-rank metric*, which is often designated as “the metric” or as its mathematical notation $\mathcal{D}_{LR}$ throughout the paper. Such a name/acronym would make it easier to designate and reference in text, discussions and future work that will rely on it.
- Important problem: Decoupling dynamical richness from representation quality addresses a fundamental issue in understanding neural network training. The observation that rich dynamics ≠ better generalization (Figure 1) is compelling. - Computationally efficient: The O(p²C) complexity is a significant improvement over NTK-based methods, making the metric practical for modern architectures. - Strong theoretical grounding: The connection to neural collapse (Propositions 1 and 2) provides solid th
- Limited scope: The current formulation only applies to orthogonal and isotropic target functions. While this covers many classification tasks, the restriction is significant. The authors acknowledge this but don't provide a clear path to generalization. - Empirical validation: While the authors demonstrate DLR's utility across diverse settings (grokking, learning rate variations, weight decay, batch norm) the experiments conducted are of small scale and somewhat artificial. It would be interes
1. Originality. Recasts richness as low-rank alignment between features and the learned function via a principled MP-operator; this is a fresh angle relative to NTK-deviation or label-based collapse metrics. The reduction to neural collapse provides a clean conceptual bridge. 2. Quality. The metric is simple, normalized, and computationally cheap, enabling use on standard vision models. Comparative experiments are thoughtful and reveal expected patterns. 3. Clarity. The paper is well structured
1. DLR inspects only the final-layer features; rich dynamics might manifest earlier and be attenuated by a constrained head. It would be helpful to consider a hierarchical variant (layer-wise DLR or block-DLR) and show whether the conclusions persist. 2. The technical background is not sufficient for readers to grasp the essence. A more straightforward interpretation and introduction would help. 3. The MP-operator and some guarantees rely on orthogonal/isotropic targets and supervised, one-hot settin
Taxonomy
Topics·Neural Networks and Applications
Methods·Linear Layer
