Roughness Index for Loss Landscapes of Neural Network Models of Partial Differential Equations
Keke Wu, Xiangru Jian, Rui Du, Jingrun Chen, Xiang Zhou

TL;DR
This paper introduces a novel roughness index (RI) to analyze the loss landscapes of neural networks solving PDEs, revealing landscape characteristics and optimization challenges.
Contribution
The paper proposes a new, easy-to-compute roughness index for high-dimensional loss landscapes, applied to neural networks solving PDEs, providing insights into landscape complexity.
Findings
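Landscapes of the deep Galerkin method are less rough than those of the deep Ritz method.
RI follows an increasing-then-decreasing pattern along gradient descent paths.
A large RI at a local minimizer indicates an oscillatory landscape and a severe challenge for first-order optimization methods.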
Abstract
The loss landscape is a useful tool to characterize and compare neural network models. The main challenge in analyzing the loss landscapes of deep neural networks is that they are generally highly non-convex in a very high-dimensional space. In this paper, we develop the "roughness" concept for understanding such landscapes in high dimensions and apply this technique to study two neural network models arising from solving differential equations. Our main innovation is the proposal of a well-defined and easy-to-compute roughness index (RI), which is based on the mean and variance of the (normalized) total variation for one-dimensional functions projected on randomly sampled directions. A large RI at the local minimizer hints at an oscillatory landscape profile and indicates a severe challenge for first-order optimization methods. Particularly, we observe the increasing-then-decreasing pattern…
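The ingredients named in the abstract (random directions, one-dimensional projections, normalized total variation, mean and variance) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact formula for RI, so the sketch only reports the mean and standard deviation of the normalized total variation and does not combine them into an index. The function names, the normalization by the range of the profile, the unit-norm Gaussian directions, and the parameters n_directions, radius, and n_points are all assumptions made for illustration.

```python
import numpy as np


def normalized_total_variation(values):
    """Discrete total variation of a 1D profile divided by its range.

    Dividing by (max - min) is an assumed normalization, chosen so that
    any monotone profile scores exactly 1.
    """
    values = np.asarray(values, dtype=float)
    total_variation = np.abs(np.diff(values)).sum()
    span = values.max() - values.min()
    return total_variation / span if span > 0 else 0.0


def roughness_statistics(loss, theta, n_directions=20, radius=1.0,
                         n_points=101, seed=0):
    """Mean and standard deviation of the normalized total variation of
    loss(theta + t * d), t in [-radius, radius], over random unit directions d.

    Hypothetical helper: it only exposes the two statistics the abstract
    mentions and does not reproduce the paper's definition of RI.
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(-radius, radius, n_points)
    scores = []
    for _ in range(n_directions):
        direction = rng.standard_normal(theta.shape)
        direction /= np.linalg.norm(direction)  # unit-length random direction
        profile = [loss(theta + t * direction) for t in ts]
        scores.append(normalized_total_variation(profile))
    scores = np.array(scores)
    return scores.mean(), scores.std()


if __name__ == "__main__":
    theta0 = np.zeros(10)

    def smooth(x):
        return float(np.sum(x ** 2))

    def rough(x):
        # Same bowl with a superimposed oscillation.
        return float(np.sum(x ** 2) + 0.5 * np.sin(20 * np.sum(x)))

    print("smooth:", roughness_statistics(smooth, theta0))
    print("rough: ", roughness_statistics(rough, theta0))
```

Under this assumed normalization, a quadratic bowl centered at theta scores 2 along every direction (the profile descends once and ascends once), so its spread across directions is zero, while an oscillatory profile scores higher on average.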
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning in Materials Science · Stochastic Gradient Optimization Techniques
