The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units
Oswaldo Ludwig

TL;DR
This paper introduces the condition number as a scale-invariant measure of information encoding in neural units, linking it to entropy and geometric properties, and demonstrates its practical application in fine-tuning large language models to prevent catastrophic forgetting.
Contribution
It formalizes the relationship between the condition number and information encoding in neural networks, providing a theoretical basis and a practical method for selective fine-tuning.
Findings
For a fixed weight norm, a high condition number corresponds to reduced overall information transfer.
The proposed KappaTune method mitigates catastrophic forgetting effectively.
The approach does not require access to pre-training statistics.
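A minimal sketch of how condition-number-based selection for fine-tuning might look, using only NumPy. The listing does not spell out the KappaTune selection rule, so the choice here (keep the `k` tensors with the lowest condition number trainable and freeze the rest) is an assumption, and the names `condition_number`, `select_trainable`, and the toy `weights` dictionary are hypothetical. Only the weights themselves are inspected, which matches the claim that no pre-training statistics are required.

```python
import numpy as np

def condition_number(W):
    """Ratio of largest to smallest singular value of W (flattened to 2-D)."""
    W2d = W.reshape(W.shape[0], -1)
    s = np.linalg.svd(W2d, compute_uv=False)  # singular values, descending
    # Guard against division by zero for rank-deficient tensors.
    return s[0] / max(s[-1], np.finfo(s.dtype).tiny)

def select_trainable(weights, k):
    """ASSUMED rule: return the k tensor names with the lowest condition number."""
    kappas = {name: condition_number(W) for name, W in weights.items()}
    ranked = sorted(kappas, key=kappas.get)
    return ranked[:k], kappas

# Toy stand-ins for a model's weight tensors (hypothetical names and values).
weights = {
    "layer_a": np.diag([1.0, 1.0, 1.0]),
    "layer_b": np.diag([10.0, 1.0, 0.1]),
    "layer_c": np.diag([2.0, 1.0, 1.0]),
}

trainable, kappas = select_trainable(weights, k=2)

# Scale invariance: rescaling a tensor leaves its condition number unchanged.
kappa_scaled = condition_number(3.0 * weights["layer_b"])
```

In a real model the same ranking would be computed over the framework's parameter tensors, with gradients disabled for every tensor outside the selected set.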
Abstract
This paper explores the relationship between the condition number of a neural network's weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. It argues that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. This intuition is formalized for linear units with Gaussian inputs, linking the condition number and the transformation's log-volume scaling factor to the characteristics of the output entropy and the geometric properties of the learned transformation. The analysis demonstrates that for a fixed weight norm, a concentrated distribution of singular values (high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy.…
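The fixed-norm claim can be illustrated numerically for the simplest case. The sketch below assumes a square, full-rank linear unit y = Wx with standard Gaussian input x ~ N(0, I), for which the differential entropy of the output is (n/2) log(2πe) plus the log-volume scaling factor log|det W|, the sum of the log singular values. It is not the paper's derivation, and the helper names are hypothetical. Two matrices with the same Frobenius norm but different singular-value spreads are compared.

```python
import numpy as np

def gaussian_output_entropy(W):
    """Differential entropy (nats) of y = W x, x ~ N(0, I), W square and full rank."""
    n = W.shape[0]
    s = np.linalg.svd(W, compute_uv=False)
    log_volume = np.sum(np.log(s))  # log |det W|
    return 0.5 * n * np.log(2 * np.pi * np.e) + log_volume

def with_singular_values(s, seed=0):
    """Build a matrix with prescribed singular values and random orthogonal factors."""
    rng = np.random.default_rng(seed)
    n = len(s)
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return U @ np.diag(s) @ V.T

n = 4
flat = np.ones(n)                        # equal singular values, kappa = 1
spread = np.array([8.0, 4.0, 2.0, 1.0])  # unequal singular values, kappa = 8
# Rescale so both matrices share the same Frobenius norm.
spread = spread * np.linalg.norm(flat) / np.linalg.norm(spread)

W_flat = with_singular_values(flat)
W_spread = with_singular_values(spread)

kappa_flat = np.linalg.cond(W_flat)      # 2-norm condition number
kappa_spread = np.linalg.cond(W_spread)
h_flat = gaussian_output_entropy(W_flat)
h_spread = gaussian_output_entropy(W_spread)
```

Under these assumptions the higher-condition-number matrix has the lower output entropy, because for a fixed sum of squared singular values the sum of their logarithms is largest when they are all equal.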
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
