A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Asaf Noy; Yi Xu; Yonathan Aflalo; Lihi Zelnik-Manor; Rong Jin

arXiv:2101.04243 · cs.LG · February 9, 2021 · 1 citation

TL;DR

This paper advances the theoretical understanding of over-parameterized deep neural networks by establishing convergence guarantees with network widths quadratic in sample size and linear in depth, using a novel surrogate network approach.
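The phenomenon the paper explains, gradient descent driving a sufficiently wide ReLU network to zero training loss, can be seen in a small toy experiment. This is a minimal sketch under stated assumptions, not the paper's deep-network setting or proof technique: a two-layer ReLU network whose hidden width m is much larger than the number of training points n, fixed random ±1 output weights scaled by 1/sqrt(m), unit-norm inputs, and full-batch gradient descent on the hidden weights with squared loss. The function name `train_two_layer` and every hyperparameter are hypothetical choices for illustration.

```python
import math
import random


def train_two_layer(n=4, d=3, m=128, lr=0.2, steps=200, seed=0):
    """Toy over-parameterized regression: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).

    Hypothetical illustration: only the hidden weights w_r are trained, the output
    signs a_r stay fixed. Returns the training loss at initialization and after
    every gradient step.
    """
    rng = random.Random(seed)

    # n random unit-norm inputs in R^d with arbitrary targets in [-1, 1].
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        xs.append([t / norm for t in v])
    ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # Gaussian hidden weights and fixed random output signs.
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
    scale = 1.0 / math.sqrt(m)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def predict(x):
        # Reads the current W from the enclosing scope on every call.
        return scale * sum(a_r * max(0.0, dot(w_r, x)) for a_r, w_r in zip(a, W))

    def loss():
        return 0.5 * sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

    history = [loss()]
    for _ in range(steps):
        residuals = [predict(x) - y for x, y in zip(xs, ys)]
        # Full-batch gradient of the squared loss with respect to each hidden weight vector.
        new_W = []
        for a_r, w_r in zip(a, W):
            grad = [0.0] * d
            for x, res in zip(xs, residuals):
                if dot(w_r, x) > 0.0:  # ReLU derivative: unit r is active on x
                    for k in range(d):
                        grad[k] += scale * a_r * res * x[k]
            new_W.append([w - lr * g for w, g in zip(w_r, grad)])
        W = new_W
        history.append(loss())
    return history
```

Calling `history = train_two_layer()` and comparing `history[0]` with `history[-1]` shows the training loss shrinking. Varying `m` and `n` is a cheap way to probe, informally, how width relative to sample size affects training; it proves nothing about the deep-network bounds stated in the paper.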

Contribution

The paper substantially improves the known theoretical bounds on both network width and convergence time, narrowing the gap between theory and the network sizes used in practice.

Findings

1. Convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth.
2. Introduces a surrogate network with fixed activation patterns for analyzing training dynamics.
3. Convergence time is logarithmic in both the sample size and the depth.
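Read schematically, the width and time findings describe how the width m and the convergence time T scale with the sample size n and the depth L. The display below drops all constants and any other factors (for example, dependence on the target accuracy or on properties of the data) that the paper's precise theorem statements may contain, so treat it as a reading aid rather than the actual bound:

$$
m \;\propto\; n^{2} L, \qquad T \;\propto\; \log n + \log L = \log(nL).
$$

For example, doubling the sample size multiplies the required width by $2^{2} = 4$ and doubling the depth doubles it, while each doubling changes the time term only by an additive constant proportional to $\log 2$.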

Abstract

Deep neural networks' remarkable ability to correctly fit training data when optimized by gradient-based algorithms is yet to be fully understood. Recent theoretical results explain the convergence for ReLU networks that are wider than those used in practice by orders of magnitude. In this work, we take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time. We show that convergence to a global minimum is guaranteed for networks with widths quadratic in the sample size and linear in their depth at a time logarithmic in both. Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size. This construction can be viewed as a novel…
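The abstract says the analysis relies on a surrogate network with fixed activation patterns, but it does not spell out the construction. The sketch below only illustrates the general idea of freezing activation patterns; it is not the paper's surrogate. It records which ReLU units are active on an input, then evaluates a "gated" copy of the network that multiplies each hidden unit by that frozen 0/1 mask instead of applying ReLU. The function names (`activation_patterns`, `gated_forward`, `random_weights`, and so on) and the bias-free, He-initialized architecture are hypothetical choices.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * t for w, t in zip(row, x)) for row in W]


def relu_forward(weights, x):
    """Bias-free deep ReLU network: ReLU on every hidden layer, linear output layer."""
    h = x
    for W in weights[:-1]:
        h = [max(0.0, p) for p in matvec(W, h)]
    return matvec(weights[-1], h)


def activation_patterns(weights, x):
    """Record, per hidden layer, which units are active (pre-activation > 0) on input x."""
    patterns, h = [], x
    for W in weights[:-1]:
        pre = matvec(W, h)
        mask = [1.0 if p > 0.0 else 0.0 for p in pre]
        patterns.append(mask)
        h = [g * p for g, p in zip(mask, pre)]
    return patterns


def gated_forward(weights, patterns, x):
    """Same layers, but each hidden unit is gated by a frozen 0/1 mask instead of ReLU."""
    h = x
    for W, mask in zip(weights[:-1], patterns):
        h = [g * p for g, p in zip(mask, matvec(W, h))]
    return matvec(weights[-1], h)


def random_weights(widths, seed=0):
    """He-style Gaussian init for layer sizes widths[0] -> ... -> widths[-1]."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, math.sqrt(2.0 / n_in)) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]
```

With `w = random_weights([4, 16, 16, 1])` and `masks = activation_patterns(w, x)`, `gated_forward(w, masks, x)` reproduces `relu_forward(w, x)`. Because the masks no longer depend on the input, the gated network is a product of linear maps, so its output on `x + y` equals the sum of its outputs on `x` and `y` (up to rounding), which a ReLU network generally does not satisfy. That linearity is what makes fixed-pattern networks easier to analyze; how the paper transforms its surrogate back into an equivalent ReLU network of reasonable size is not reproduced here.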

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
