The loss surface of deep linear networks viewed through the algebraic geometry lens

Dhagash Mehta; Tianran Chen; Tingting Tang; Jonathan D. Hauenstein

arXiv:1810.07716 · stat.ML · October 19, 2018

TL;DR

This paper uses computational algebraic geometry to analyze the optimization landscapes of deep linear networks. It shows that a generalized $L_{2}$ regularization removes geometrically flat minima, bounds the number of isolated stationary points, and computes all stationary points for networks of modest depth and matrix size.

Contribution

It clarifies the various definitions of "flat" minima from a geometric perspective, establishes upper bounds on the number of isolated stationary points, and uses a numerical algebraic geometry method to find all stationary points of deep linear networks of modest depth and matrix size.

Findings

01

A generalized $L_{2}$ regularization removes geometrically flat minima, which are artifacts of residual continuous symmetries.

02

With non-zero regularization, deep linear networks have local minima that are not global minima.

03

All stationary points can be computed for networks of modest depth and matrix size.

Abstract

Using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of deep linear neural network models. After clarifying the various definitions of "flat" minima, we show that the geometrically flat minima, which are merely artifacts of residual continuous symmetries of the deep linear networks, can be straightforwardly removed by a generalized $L_{2}$ regularization. Then, we establish upper bounds on the number of isolated stationary points of these networks with the help of algebraic geometry. Using these upper bounds and a numerical algebraic geometry method, we find all stationary points of networks of modest depth and matrix size. We show that in the presence of non-zero regularization, deep linear networks indeed possess local minima which are not global minima. Our computational results clarify certain aspects of…
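
The symmetry argument in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a two-layer linear network with squared loss, random data, and an entrywise-weighted penalty as one possible form of a "generalized" $L_{2}$ regularizer. All names (`loss`, `reg_loss`, `Lam1`, `Lam2`) and sizes are hypothetical. It checks that the unregularized loss is unchanged under the map W1 -> G W1, W2 -> W2 G^{-1}, which is why minima come in continuous families, and that the weighted penalty breaks this invariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and random data, chosen only for illustration.
d_in, d_hidden, d_out, n = 3, 2, 2, 5
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))


def loss(W1, W2):
    """Unregularized squared loss of a two-layer linear network."""
    return 0.5 * np.sum((W2 @ W1 @ X - Y) ** 2)


def reg_loss(W1, W2, Lam1, Lam2):
    """Loss plus an entrywise-weighted L2 penalty (assumed form)."""
    penalty = np.sum((Lam1 * W1) ** 2) + np.sum((Lam2 * W2) ** 2)
    return loss(W1, W2) + 0.5 * penalty


# A rotation of the hidden layer: one element of the continuous symmetry group.
theta = 0.7
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
W1_t = G @ W1
W2_t = W2 @ np.linalg.inv(G)

# The product W2 W1 is unchanged, so the unregularized loss is too.
unreg_gap = abs(loss(W1_t, W2_t) - loss(W1, W2))

# A uniform penalty (all weights equal to 1) is still invariant under rotations.
ones1, ones2 = np.ones_like(W1), np.ones_like(W2)
uniform_gap = abs(reg_loss(W1_t, W2_t, ones1, ones2)
                  - reg_loss(W1, W2, ones1, ones2))

# Distinct entrywise weights break the rotational invariance as well.
Lam1 = rng.uniform(0.5, 1.5, size=W1.shape)
Lam2 = rng.uniform(0.5, 1.5, size=W2.shape)
reg_gap = abs(reg_loss(W1_t, W2_t, Lam1, Lam2)
              - reg_loss(W1, W2, Lam1, Lam2))

print(f"unregularized gap: {unreg_gap:.2e}")
print(f"uniform-penalty gap: {uniform_gap:.2e}")
print(f"weighted-penalty gap: {reg_gap:.2e}")
```

In this sketch the rotation is used because it leaves even a uniform penalty unchanged, which suggests why a plain $L_{2}$ term alone may not remove every flat direction. The entrywise weights are only one way to break the remaining symmetry.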
