Topology and Geometry of Half-Rectified Network Optimization
C. Daniel Freeman, Joan Bruna

TL;DR
This paper investigates the loss surface topology of deep half-rectified networks, revealing how data distribution and over-parameterization influence local minima and the connectedness of level sets, with empirical evidence suggesting near convexity during training.
Contribution
The work provides a theoretical analysis of the loss landscape of deep half-rectified networks without strongly simplifying the model's nonlinearity, as spin glass and mean-field approaches do, and highlights how the smoothness of the data distribution and the model size shape the landscape topology.
Findings
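The loss landscape of deep linear networks has a radically different topology from that of deep half-rectified networks.
Empirically, level sets of the loss landscape remain connected throughout training, suggesting near-convex behavior.
The curvature of level sets increases as the energy (loss value) decreases, matching practical observations.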
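To make the notions of connected but curved level sets concrete, here is a minimal toy sketch. It is not taken from the paper: the one-hidden-unit network, the dataset, and all function names are assumptions chosen for illustration. It relies only on the positive homogeneity of the rectifier: scaling the input weight by s > 0 and the output weight by 1/s leaves the network function unchanged, so the two parameter points below lie on the same zero-loss level set. The straight segment between them leaves that level set, while a curved rescaling path stays on it.

```python
def relu(z):
    # Half-rectification: keep the positive part, zero otherwise.
    return z if z > 0.0 else 0.0

def predict(w, v, x):
    # Toy one-hidden-unit network (an assumption, not the paper's model):
    # f(x) = v * relu(w * x)
    return v * relu(w * x)

def loss(w, v, data):
    # Mean squared error over (x, y) pairs.
    return sum((predict(w, v, x) - y) ** 2 for x, y in data) / len(data)

def straight_path(a, b, steps):
    # Linear interpolation between parameter points a = (w, v) and b.
    return [((1 - i / steps) * a[0] + (i / steps) * b[0],
             (1 - i / steps) * a[1] + (i / steps) * b[1])
            for i in range(steps + 1)]

def rescaling_path(a, c, steps):
    # Curved path (s * w, v / s) with s moving geometrically from 1 to c.
    # For s > 0 the network function, and hence the loss, is unchanged.
    return [(a[0] * c ** (i / steps), a[1] / c ** (i / steps))
            for i in range(steps + 1)]

def max_loss_on_path(path, data):
    return max(loss(w, v, data) for w, v in path)

# Hypothetical dataset generated by the target function y = 2 * relu(x).
data = [(x, 2.0 * relu(x)) for x in (-1.0, 0.5, 1.0, 2.0)]

a = (1.0, 2.0)   # fits the data exactly
b = (4.0, 0.5)   # same function as a, rescaled with c = 4

straight_max = max_loss_on_path(straight_path(a, b, 50), data)
curved_max = max_loss_on_path(rescaling_path(a, 4.0, 50), data)

print("max loss on straight segment:", straight_max)
print("max loss on rescaling path:  ", curved_max)
```

In this toy case the two minima are connected within the zero-loss set, but only along a curved path, so the level set is connected without being convex. This says nothing about the general conditions studied in the paper; it only illustrates the vocabulary.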
Abstract
The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. Our theoretical work quantifies and formalizes two important folklore facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Topological and Geometric Data Analysis · Markov Chains and Monte Carlo Methods
