On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations

Arthur Castello B. de Oliveira; Milad Siami; Eduardo D. Sontag

arXiv:2305.09904 · cs.LG · May 18, 2023 · 2 cites

TL;DR

This paper analyzes how overparameterization affects the robustness of gradient flow in single hidden-layer linear neural networks, revealing conditions for stability and the presence of spurious equilibria.

Contribution

It provides the first analysis of the ISS property for gradient flow in overparameterized linear neural networks with one-dimensional input and output.

Findings

1. Derived sufficient conditions for robustness based on convergence criteria.

2. Identified spurious equilibria outside the loss minimization set.

3. Discussed potential extensions to more general neural network architectures.

Abstract

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models, contrary to classical statistical belief. This phenomenon, sometimes known as "benign overfitting", raises the question of how else overparameterization might affect the properties of a learning problem. In this work, we investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation. This uncertainty arises naturally if the gradient is estimated from noisy data or directly measured. Our object of study is a linear neural network with a single, arbitrarily wide, hidden layer and an arbitrary number of inputs and outputs. In this paper we solve the problem for the case…
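
To make the setting concrete, here is a minimal numerical sketch of a gradient flow with an additive disturbance for the scalar-input, scalar-output case named under Contribution. Everything specific in it is an assumption for illustration and is not taken from the paper: the squared-error loss L(a, b) = 0.5 (b·a − c)², the forward-Euler discretization, the hidden width of 4, the target slope c, the initial weights, and the sinusoidal disturbance `small_sine`.

```python
import numpy as np

# Assumed setup (not taken from the paper): scalar input and output, hidden
# width k, target slope c, and loss L(a, b) = 0.5 * (b.a - c)^2, where a holds
# the input-to-hidden weights and b the hidden-to-output weights.

def loss(a, b, c):
    return 0.5 * (np.dot(b, a) - c) ** 2

def grad(a, b, c):
    err = np.dot(b, a) - c
    return err * b, err * a  # (dL/da, dL/db)

def disturbed_gradient_flow(a0, b0, c, disturbance, dt=1e-3, steps=20000):
    """Forward-Euler integration of a' = -dL/da + u_a(t), b' = -dL/db + u_b(t)."""
    a = np.array(a0, dtype=float)
    b = np.array(b0, dtype=float)
    for i in range(steps):
        ga, gb = grad(a, b, c)
        ua, ub = disturbance(i * dt, a.size)
        a, b = a + dt * (-ga + ua), b + dt * (-gb + ub)
    return a, b

def no_disturbance(t, k):
    return np.zeros(k), np.zeros(k)

def small_sine(t, k, amplitude=0.05):
    # Hypothetical bounded disturbance standing in for gradient estimation error.
    u = amplitude * np.sin(t) * np.ones(k)
    return u, -u

c = 2.0
a0 = [0.1, -0.1, 0.6, 0.1]
b0 = [-0.5, 0.4, 1.3, 0.9]

a_clean, b_clean = disturbed_gradient_flow(a0, b0, c, no_disturbance)
a_noisy, b_noisy = disturbed_gradient_flow(a0, b0, c, small_sine)
clean_loss = loss(a_clean, b_clean, c)
noisy_loss = loss(a_noisy, b_noisy, c)

# At a = b = 0 the gradient vanishes although the loss is not minimal, so the
# origin is an equilibrium of the undisturbed flow outside the minimizing set.
ga_origin, gb_origin = grad(np.zeros(4), np.zeros(4), c)
origin_loss = loss(np.zeros(4), np.zeros(4), c)
```

Comparing `clean_loss` with `noisy_loss` for different disturbance amplitudes gives a rough, empirical feel for the robustness question the abstract raises. It is not a substitute for the paper's analysis, and the discretized trajectory only approximates the continuous-time flow.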

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms