Stochastic Gradient Descent and Anomaly of Variance-flatness Relation in Artificial Neural Networks
Xia Xiong, Yong-Cong Chen, Chunxiao Shi, Ping Ao

TL;DR
This paper investigates the anomaly in the variance-flatness relation in neural networks trained with SGD, revealing that the true energy landscape differs from the cost function and resolving the paradox through a statistical physics approach.
Contribution
It introduces a dynamic decomposition method to analyze SGD near fixed points, uncovering the true energy function that explains the anomaly and bridging statistical physics with AI.
Findings
The true energy function differs from the cost function.
The anomaly is resolved by identifying the energy function under which the Boltzmann distribution holds.
The approach bridges statistical physics principles with neural network training.
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing study of the theoretical principles behind its success. A recent work reports an anomalous (inverse) relation between the variance of neural weights and the landscape flatness of the loss function driven under SGD [Feng & Tu, PNAS 118, 0027 (2021)]. To investigate this seeming violation of a statistical physics principle, the properties of SGD near fixed points are analysed via a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. It differs from the cost function in general and resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with potential for better algorithms to…
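To make the inverse variance-flatness relation concrete, here is a minimal toy sketch and not the paper's dynamic decomposition method. It assumes a linearized SGD update near a fixed point, w <- w - lr * (h * w + noise), with diagonal curvature h and a gradient-noise variance c that grows with curvature (c = h**2). That noise model, the learning rate, and all function names are illustrative assumptions, not taken from the paper. If the loss itself were the energy of a Boltzmann distribution, the weight variance would scale as 1/h, so flatter directions would fluctuate more. With curvature-dependent noise the simulation shows the opposite ordering.

```python
import numpy as np

def stationary_variance(h, c, lr):
    """Exact stationary variance of w <- w - lr*(h*w + noise) per direction,
    where the noise has variance c. Follows from
    var = (1 - lr*h)**2 * var + lr**2 * c."""
    return lr * c / (h * (2.0 - lr * h))

def simulate_sgd_variance(h, c, lr, n_chains=20000, n_steps=500, seed=0):
    """Run many independent noisy-gradient chains and return the
    empirical weight variance along each direction."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    c = np.asarray(c, dtype=float)
    w = np.zeros((n_chains, h.size))
    for _ in range(n_steps):
        noise = rng.normal(size=w.shape) * np.sqrt(c)
        w = w - lr * (h * w + noise)
    return w.var(axis=0)

h = np.array([1.0, 4.0])  # curvatures: direction 0 is flat, direction 1 is sharp
c = h ** 2                # ASSUMED noise variance, chosen only for illustration
lr = 0.1                  # ASSUMED learning rate

sim = simulate_sgd_variance(h, c, lr)
exact = stationary_variance(h, c, lr)

print("simulated variance:", sim)
print("exact variance:    ", exact)
# Ratio predicted if the loss were the Boltzmann energy (variance ~ 1/h).
print("loss-as-energy ratio sharp/flat:", h[0] / h[1])
print("simulated ratio sharp/flat:     ", sim[1] / sim[0])
```

In this toy setting the stationary distribution is still Gaussian, so it can be written in Boltzmann form with an effective quadratic energy whose stiffness along each direction is the inverse of the stationary variance rather than the loss curvature h. This is one simple way to see how an energy function can differ from the cost function.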
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Dense Connections · Feedforward Network · Stochastic Gradient Descent · Progressive Neural Architecture Search
