Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics

Justin Sahs; Ryan Pyle; Aneel Damaraju; Josue Ortega Caro; Onur Tavaslioglu; Andy Lu; Ankit Patel

arXiv:2008.01772 · cs.LG · August 6, 2020 · 1 citation


Open Access

TL;DR

This paper reinterprets shallow univariate ReLU neural networks as continuous piecewise linear splines and uses this view to analyze their learning dynamics, loss surface, and implicit regularization, giving a transparent framework for understanding how these networks behave.

Contribution

It introduces a spline-based reparametrization of shallow univariate ReLU networks that yields a simple, intuitive analysis of the loss surface (critical points, Hessian, and Hessian spectrum), the effects of initialization, and the mechanisms of implicit regularization.
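To make the reparametrization concrete, here is a minimal sketch in Python, assuming a shallow univariate network f(x) = c + sum_i v_i * relu(w_i * x + b_i) in which every w_i is nonzero. Each hidden unit contributes one knot at beta_i = -b_i / w_i, where the slope of f jumps by delta_i = v_i * |w_i|. The function names (`network`, `to_spline`, `spline`) are hypothetical and this is not the authors' code; it only checks that both parametrizations describe the same function.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def network(x, w, b, v, c):
    # Standard parametrization: f(x) = c + sum_i v_i * relu(w_i * x + b_i)
    return c + sum(vi * relu(wi * x + bi) for wi, bi, vi in zip(w, b, v))


def to_spline(w, b, v, c):
    # Spline parametrization (assumes every w_i != 0):
    #   f(x) = a + s0 * x + sum_i delta_i * relu(x - beta_i)
    # beta_i  = -b_i / w_i  : knot (breakpoint) contributed by unit i
    # delta_i = v_i * |w_i| : jump in the slope of f as x crosses beta_i
    # s0, a                 : slope and intercept of f far to the left,
    #                         where only units with w_i < 0 are active
    beta = [-bi / wi for wi, bi in zip(w, b)]
    delta = [vi * abs(wi) for wi, vi in zip(w, v)]
    s0 = sum(vi * wi for wi, vi in zip(w, v) if wi < 0)
    a = c + sum(vi * bi for wi, bi, vi in zip(w, b, v) if wi < 0)
    return a, s0, beta, delta


def spline(x, a, s0, beta, delta):
    # Evaluate the spline form of the same function
    return a + s0 * x + sum(d * relu(x - k) for k, d in zip(beta, delta))


if __name__ == "__main__":
    rng = random.Random(0)
    H = 8  # hidden width, arbitrary for this demo
    w = [rng.gauss(0.0, 1.0) for _ in range(H)]
    b = [rng.gauss(0.0, 1.0) for _ in range(H)]
    v = [rng.gauss(0.0, 1.0) for _ in range(H)]
    c = 0.5
    a, s0, beta, delta = to_spline(w, b, v, c)
    for x in (-3.0, -0.5, 0.0, 1.2, 4.0):
        # The two columns should agree up to floating-point rounding
        print(x, network(x, w, b, v, c), spline(x, a, s0, beta, delta))
```

The spline form carries the same information as (w_i, b_i, v_i) but makes knot locations and slope changes explicit, which is what lets questions about flatness and curvature be read off directly.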

Findings

1. Flat functions result from standard initializations and overparametrization.
2. Initialization scale influences implicit regularization via the loss landscape.
3. The spline perspective reproduces and clarifies recent kernel-based regularization results.
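The finding on initialization scale can be connected to the spline parameters with an elementary calculation, sketched here in Python under stated assumptions: the same shallow univariate network, a hypothetical Gaussian base initialization (w_i, b_i ~ N(0, 1), v_i ~ N(0, 1/H)), and every parameter multiplied by a scale alpha. Because each knot is the ratio -b_i / w_i, scaling both layers by alpha leaves the knots in place, while each slope jump v_i * |w_i| scales by alpha squared. The helper names and the curvature proxy (sum of |slope jumps|) are illustrative choices for this sketch; it does not reproduce the paper's analysis of implicit regularization.

```python
import random


def spline_params(w, b, v):
    # Knots beta_i = -b_i / w_i and slope jumps delta_i = v_i * |w_i|
    # of f(x) = c + sum_i v_i * relu(w_i * x + b_i)  (assumes every w_i != 0)
    knots = [-bi / wi for wi, bi in zip(w, b)]
    deltas = [vi * abs(wi) for wi, vi in zip(w, v)]
    return knots, deltas


def total_curvature(deltas):
    # Sum of |slope jumps|: total variation of f' when the knots are distinct.
    # Illustrative flatness proxy chosen for this sketch, not taken from the paper.
    return sum(abs(d) for d in deltas)


def scaled(params, alpha):
    # Multiply every entry of a parameter list by the init scale alpha
    return [alpha * p for p in params]


if __name__ == "__main__":
    rng = random.Random(0)
    H = 16
    # Hypothetical base init: w_i, b_i ~ N(0, 1), v_i ~ N(0, 1/H)
    w0 = [rng.gauss(0.0, 1.0) for _ in range(H)]
    b0 = [rng.gauss(0.0, 1.0) for _ in range(H)]
    v0 = [rng.gauss(0.0, 1.0) / H ** 0.5 for _ in range(H)]
    _, base_deltas = spline_params(w0, b0, v0)
    for alpha in (1.0, 0.1, 0.01):
        knots, deltas = spline_params(scaled(w0, alpha), scaled(b0, alpha), scaled(v0, alpha))
        # Knots stay put; the curvature proxy shrinks by a factor of alpha**2
        ratio = total_curvature(deltas) / total_curvature(base_deltas)
        print(alpha, ratio)
```

In this sketch a smaller initialization scale gives a function with the same knot locations but proportionally much smaller slope changes, that is, a flatter initial function.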

Abstract

Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. We propose reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with recent work arXiv:1906.05827. Our implicit regularization results…

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis