Complex fractal trainability boundary can arise from trivial non-convexity
Yizhou Liu

TL;DR
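This paper shows that simple non-convex perturbations of quadratic functions can make the trainability boundary of gradient descent fractal. The trainability boundary separates hyperparameters (such as the learning rate) that keep training bounded from those that make it diverge. Its structure depends on the roughness of the perturbation, among other factors, which complicates hyperparameter tuning.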
Contribution
It reveals that trivial non-convex modifications can produce fractal trainability boundaries, providing insights into the loss landscape's complexity during neural network training.
Findings
Fractal boundaries can emerge from simple cosine perturbations.
The roughness of the perturbation controls the fractal dimension of the trainability boundary.
A transition from non-fractal to fractal boundaries occurs as roughness increases.
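The findings link perturbation roughness to the fractal dimension of the trainability boundary. This summary does not say how the paper measures that dimension. The sketch below is therefore a generic box-counting estimate, offered as an assumption rather than the authors' procedure. It applies to a boundary sampled on a uniform 1D learning-rate grid.

The function takes a boolean mask (True = training stayed bounded). It counts how many boxes of `s` grid cells contain a flip between trainable and divergent. The slope of log N(s) against log(1/s) is the estimate. The helper name `box_counting_dimension` and the default box sizes are hypothetical.

```python
import numpy as np


def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the box-counting dimension of a 1D trainability boundary.

    `mask[i]` is True if training at the i-th grid value of the
    hyperparameter (e.g. the learning rate) stayed bounded.
    Hypothetical helper for illustration, not taken from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    # Index i marks a boundary crossing between grid points i and i + 1.
    flips = np.flatnonzero(mask[1:] != mask[:-1])
    if flips.size == 0:
        return 0.0  # no boundary resolved on this grid
    sizes = np.asarray(sizes, dtype=int)
    # N(s): number of distinct boxes of s grid cells that contain a crossing.
    counts = np.array([np.unique(flips // s).size for s in sizes])
    # Dimension estimate: slope of log N(s) versus log(1/s).
    slope, _intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return float(slope)
```

A single clean transition gives an estimate of 0. A boundary that keeps splitting as the grid is refined pushes the estimate toward 1. On a finite grid, the value is only meaningful over the range of box sizes where the counts actually follow a power law.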
Abstract
Training neural networks involves optimizing parameters to minimize a loss function, where the nature of the loss function and the optimization strategy are crucial for effective training. Hyperparameter choices, such as the learning rate in gradient descent (GD), significantly affect the success and speed of convergence. Recent studies indicate that the boundary between bounded and divergent hyperparameters can be fractal, complicating reliable hyperparameter selection. However, the nature of this fractal boundary and methods to avoid it remain unclear. In this study, we focus on GD to investigate the loss-landscape properties that might lead to fractal trainability boundaries. We discovered that fractal boundaries can emerge from simple non-convex perturbations, i.e., adding or multiplying cosine-type perturbations to quadratic functions. The observed fractal dimensions are influenced…
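The sketch below roughly illustrates the setup the abstract describes. It runs 1D gradient descent on a quadratic with either an additive or a multiplicative cosine perturbation, then classifies each learning rate as bounded or divergent.

Several pieces are assumptions for illustration, not the paper's exact settings:

* the functional forms f(x) = x²/2 + λ cos(kx) and f(x) = (x²/2)(1 + λ cos(kx));
* the starting point, step budget, and divergence threshold;
* the hypothetical names `grad_additive`, `grad_multiplicative`, and `trainable`.

The sketch does not compute the paper's roughness measure.

```python
import numpy as np

# Assumed 1D test functions (illustrative forms, not the paper's exact ones):
#   additive:        f(x) = x**2 / 2 + lam * cos(k * x)
#   multiplicative:  f(x) = (x**2 / 2) * (1 + lam * cos(k * x))


def grad_additive(x, lam, k):
    # f'(x) for the additive form
    return x - lam * k * np.sin(k * x)


def grad_multiplicative(x, lam, k):
    # f'(x) for the multiplicative form (product rule)
    return x * (1 + lam * np.cos(k * x)) - 0.5 * x**2 * lam * k * np.sin(k * x)


def trainable(grad, etas, x0=1.0, steps=1000, threshold=1e6, **params):
    """Run GD x <- x - eta * f'(x) for every learning rate in `etas` at once.

    Returns a boolean array: True if |x| stayed at or below `threshold` for
    all `steps` iterations (treated as "bounded"), False otherwise.
    A finite step budget only approximates true boundedness.
    """
    etas = np.asarray(etas, dtype=float)
    x = np.full_like(etas, x0)
    alive = np.ones(etas.shape, dtype=bool)
    for _ in range(steps):
        x_new = x - etas * grad(x, **params)
        # Only advance runs that have not diverged yet; diverged runs are frozen.
        x = np.where(alive, x_new, x)
        alive &= np.abs(x) <= threshold
    return alive


if __name__ == "__main__":
    etas = np.linspace(1.5, 4.0, 2001)
    for name, g in [("additive", grad_additive), ("multiplicative", grad_multiplicative)]:
        mask = trainable(g, etas, lam=0.3, k=5.0)
        print(f"{name}: fraction of trainable learning rates = {mask.mean():.3f}")
```

For the additive form, no learning rate η in (0, 2) can diverge. The quadratic part contracts |x| by a factor |1 − η| < 1, and the perturbation's gradient is bounded by λk. The scan therefore starts near η = 2, where the outcome can depend on the perturbation. The resulting mask has the form the box-counting estimate expects.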
Taxonomy
Topics: Neural Networks and Applications · Theoretical and Computational Physics · Statistical Mechanics and Entropy
