Unveiling the Mystery of Weight in Large Foundation Models: Gaussian Distribution Never Fades
Chongjie Si, Jingjing Jiang, Wei Shen

TL;DR
This paper investigates the weight distributions of large foundation models (LFMs). It shows that the weights follow Gaussian patterns, and that transformation weights aid adaptation to downstream tasks by increasing weight variability.
Contribution
It uncovers the Gaussian nature of LFM weights and their relationship with Gaussian noise. It also shows how transformation weights facilitate model adaptation, providing foundational insights into how LFM weights work.
Findings
Weights predominantly follow a Gaussian distribution regardless of initialization
Transformation weights mainly increase the standard deviation of pre-trained weights, and their own standard deviation grows with layer depth
Effective in LFM adaptation and editing tasks
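The following is a minimal sketch of one way to probe the first finding on a single layer. It flattens the layer's weights and compares their sample moments with those of a Gaussian, whose skewness and excess kurtosis are both 0. This moment check is an illustrative choice of ours, not the authors' analysis procedure. The weights below are synthetic stand-ins for a real layer, and the function name `moment_summary` and the chosen standard deviations are hypothetical.

```python
import random
import statistics


def moment_summary(weights):
    """Return mean, std, skewness and excess kurtosis of a flat list of weights.

    For a Gaussian, skewness and excess kurtosis are both close to 0.
    (Hypothetical helper, for illustration only.)
    """
    n = len(weights)
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights, mu=mean)
    # Population third and fourth central moments.
    m3 = sum((w - mean) ** 3 for w in weights) / n
    m4 = sum((w - mean) ** 4 for w in weights) / n
    skew = m3 / std ** 3
    excess_kurtosis = m4 / std ** 4 - 3.0
    return mean, std, skew, excess_kurtosis


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a flattened weight matrix from one LFM layer.
    gaussian_like = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
    # Uniform weights for contrast: excess kurtosis is about -1.2.
    uniform_like = [rng.uniform(-0.035, 0.035) for _ in range(50_000)]
    for name, w in [("gaussian", gaussian_like), ("uniform", uniform_like)]:
        mean, std, skew, kurt = moment_summary(w)
        print(f"{name}: mean={mean:.4f} std={std:.4f} "
              f"skew={skew:.3f} excess_kurtosis={kurt:.3f}")
```

A sharply peaked, heavy-tailed distribution typically shows positive excess kurtosis. A flat, uniform-like one gives roughly -1.2. The statistic therefore separates the Gaussian case from some of the non-Gaussian patterns mentioned in the abstract.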
Abstract
This paper presents a pioneering exploration of the mechanisms underlying large foundation models' (LFMs) weights, aiming to simplify AI research. Through extensive observation and analysis on prevailing LFMs, we find that regardless of initialization strategies, their weights predominantly follow a Gaussian distribution, with occasional sharp, inverted T-shaped, or linear patterns. We further discover that the weights share the i.i.d. properties of Gaussian noise, and explore their direct relationship. We find that transformation weights can be derived from Gaussian noise, and they primarily serve to increase the standard deviation of pre-trained weights, with their standard deviation growing with layer depth. In other words, transformation weights broaden the acceptable deviation from the optimal weights, facilitating adaptation to downstream tasks. Building upon the above…
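The abstract claims that transformation weights mainly increase the standard deviation of the pre-trained weights. That claim has a simple arithmetic reading, sketched here under assumptions of our own:

- The transformation weights are an additive update ΔW applied to the pre-trained weights W.
- W and ΔW are both zero-mean and mutually independent.

Under these assumptions, variances add, so std(W + ΔW) = sqrt(σ_W² + σ_ΔW²), which is always at least σ_W. The helper names and the depth-growing schedule of σ_ΔW values are hypothetical, chosen only to mirror the reported trend. This is not the paper's derivation.

```python
import math
import random
import statistics


def combined_std(sigma_pre, sigma_delta):
    """Std of W + dW when W and dW are independent and zero-mean: variances add."""
    return math.sqrt(sigma_pre ** 2 + sigma_delta ** 2)


def simulate_layers(sigma_pre, delta_stds, n=20_000, seed=0):
    """Add Gaussian-noise transformation weights to Gaussian pre-trained weights.

    One simulated layer per entry of delta_stds (hypothetical setup).
    Returns a list of (empirical_std, predicted_std) pairs.
    """
    rng = random.Random(seed)
    results = []
    for sigma_delta in delta_stds:
        pre = [rng.gauss(0.0, sigma_pre) for _ in range(n)]
        delta = [rng.gauss(0.0, sigma_delta) for _ in range(n)]
        adapted = [w + d for w, d in zip(pre, delta)]
        results.append((statistics.pstdev(adapted),
                        combined_std(sigma_pre, sigma_delta)))
    return results


if __name__ == "__main__":
    # Hypothetical schedule: transformation-weight std grows with layer depth.
    depth_stds = [0.002, 0.004, 0.008, 0.016]
    for layer, (emp, pred) in enumerate(simulate_layers(0.02, depth_stds)):
        print(f"layer {layer}: empirical std={emp:.5f}, predicted={pred:.5f}")
```

Under these assumptions, a larger σ_ΔW in deeper layers widens the spread around the pre-trained weights more. This matches the abstract's description of transformation weights broadening the acceptable deviation from the optimal weights.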
