Unveiling the Dynamics of Information Interplay in Supervised Learning
Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

TL;DR
This paper introduces matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) as new analytical tools to understand and optimize the information dynamics in supervised learning, revealing insights into neural network behavior and training phenomena.
Contribution
The paper proposes MIR and HDR metrics based on matrix information theory, providing a novel framework to analyze and improve supervised learning processes and phenomena like Neural Collapse and grokking.
Findings
MIR and HDR explain standard supervised training dynamics, linear mode connectivity, and the effects of label smoothing and pruning.
Using MIR and HDR as loss terms improves training outcomes.
MIR and HDR offer insight into phenomena such as Neural Collapse and grokking.
Abstract
In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of data representation and class classification heads in supervised learning, and we determine the theoretical optimal values for MIR and HDR when Neural Collapse happens. Our experiments show that MIR and HDR can effectively explain many phenomena occurring in neural networks, for example, the standard supervised training dynamics, linear mode connectivity, and the performance of label smoothing and pruning. Additionally, we use MIR and HDR to gain insights into the dynamics of grokking, which is…
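The abstract names MIR and HDR but this excerpt does not state their formulas. The sketch below shows one way such quantities could be computed. It rests on assumptions not given in this excerpt: matrix entropy is taken as H(K) = -tr((K/d) log(K/d)) for a d x d unit-diagonal Gram matrix K, matrix mutual information as H(K1) + H(K2) - H(K1 ⊙ K2) with ⊙ the Hadamard product, MIR as that mutual information divided by min(H(K1), H(K2)), and HDR as |H(K1) - H(K2)| divided by max(H(K1), H(K2)). The function names and the choice of cosine-similarity Gram matrices are hypothetical.

```python
import numpy as np


def gram(vectors):
    """Cosine-similarity Gram matrix of row vectors (unit diagonal).

    Hypothetical helper: rows could be class-mean features or
    classification head weights, one row per class.
    """
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return v @ v.T


def matrix_entropy(k):
    """Assumed definition: H(K) = -tr((K/d) log(K/d)), via eigenvalues."""
    d = k.shape[0]
    eig = np.linalg.eigvalsh(k / d)
    eig = eig[eig > 1e-12]  # treat 0 * log(0) as 0 and drop round-off noise
    return float(-(eig * np.log(eig)).sum())


def mir(k1, k2):
    """Assumed definition: matrix mutual information ratio."""
    h1, h2 = matrix_entropy(k1), matrix_entropy(k2)
    h12 = matrix_entropy(k1 * k2)  # elementwise (Hadamard) product
    return (h1 + h2 - h12) / min(h1, h2)


def hdr(k1, k2):
    """Assumed definition: matrix entropy difference ratio."""
    h1, h2 = matrix_entropy(k1), matrix_entropy(k2)
    return abs(h1 - h2) / max(h1, h2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, dim = 10, 64
    class_means = rng.normal(size=(num_classes, dim))   # stand-in features
    head_weights = rng.normal(size=(num_classes, dim))  # stand-in classifier
    k_feat, k_head = gram(class_means), gram(head_weights)
    print("MIR:", mir(k_feat, k_head))
    print("HDR:", hdr(k_feat, k_head))
```

Under these assumed definitions, two identical Gram matrices give an HDR of 0 by construction, which fits the abstract's picture of representations and classification heads aligning when Neural Collapse happens. The random inputs in the demo are placeholders, not data from the paper.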
Taxonomy
Topics: Online Learning and Analytics
Methods: Label Smoothing
