Approximation and Gradient Descent Training with Neural Networks

G. Welper

arXiv:2405.11696·cs.LG·May 21, 2024

Open Access

TL;DR

This paper narrows the gap between neural network approximation theory and practical training: approximation bounds previously shown for networks trained by gradient flow in an under-parametrized regime are established analogously for networks trained by gradient descent.

Contribution

The paper establishes approximation guarantees for neural networks trained by gradient descent, extending earlier results that held only for the idealized gradient flow.

Findings

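01. Networks trained by gradient descent satisfy approximation bounds analogous to those previously shown for gradient flow.

02. Over-parametrization ensures zero training error, which makes the theory of over-parametrized training not immediately compatible with approximation theory.

03. The guarantees cover gradient descent, a practical training method, rather than only the idealized gradient flow.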

Abstract

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent.
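The distinction between gradient flow and gradient descent drawn in the abstract can be illustrated with a minimal sketch. This is not the paper's setting or method: it assumes a linear least-squares toy problem (rather than a neural network) so that gradient flow has a closed-form solution, and all names (`gradient_descent`, `gradient_flow`, `gaps`) are hypothetical. Gradient descent with step size h is the explicit Euler discretization of the flow dθ/dt = -∇L(θ), so after k steps it should approximate the flow at time t = k·h.

```python
import numpy as np


def gradient_descent(A, y, theta0, step, num_steps):
    """Iterate theta <- theta - step * grad L(theta) for L = 0.5 * ||A theta - y||^2."""
    theta = theta0.copy()
    for _ in range(num_steps):
        theta = theta - step * (A.T @ (A @ theta - y))
    return theta


def gradient_flow(A, y, theta0, t):
    """Exact solution at time t of d(theta)/dt = -grad L(theta) for the same loss.

    For a quadratic loss the flow is linear, so it can be written with the
    matrix exponential of -t * A^T A, computed here by eigendecomposition.
    Assumes A^T A is invertible (toy-problem assumption).
    """
    H = A.T @ A
    theta_star = np.linalg.solve(H, A.T @ y)  # least-squares minimizer
    eigvals, eigvecs = np.linalg.eigh(H)
    decay = eigvecs @ np.diag(np.exp(-eigvals * t)) @ eigvecs.T
    return theta_star + decay @ (theta0 - theta_star)


rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3)) / np.sqrt(50)  # scaled so A^T A is close to the identity
y = rng.normal(size=50)
theta0 = np.zeros(3)
t_final = 2.0

theta_flow = gradient_flow(A, y, theta0, t_final)

# Compare gradient descent to the flow at the same total time t_final,
# using progressively smaller step sizes.
gaps = {}
for num_steps in (10, 100, 1000):
    step = t_final / num_steps
    theta_gd = gradient_descent(A, y, theta0, step, num_steps)
    gaps[num_steps] = float(np.linalg.norm(theta_gd - theta_flow))
    print(f"steps={num_steps:5d}  step size={step:.4f}  gap to flow={gaps[num_steps]:.2e}")
```

In this toy problem the gap to the flow shrinks as the step size decreases, which is the sense in which gradient flow idealizes gradient descent. For neural networks the loss is not quadratic and no closed form is available, which is why carrying bounds over from the flow to the discrete iteration requires a separate argument.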

Taxonomy

Topics: Neural Networks and Applications