Towards optimal hierarchical training of neural networks

Michael Feischl; Alexander Rieder; Fabian Zehetgruber

arXiv:2407.02242 · math.NA · October 31, 2024

Open Access

TL;DR

This paper introduces a hierarchical training algorithm for neural networks that adaptively expands the architecture to escape local minima, achieving optimal convergence rates and providing new tools for assessing training and generalization.

Contribution

The paper presents a novel hierarchical training method that adaptively extends neural networks during training, with theoretical guarantees on convergence and new metrics for training optimality.

Findings

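01  The extended network provably escapes any local minimum or stationary point.

02  Achieves an optimal convergence rate s: the loss is bounded by the number of parameters raised to the power -s.

03  Yields computable indicators for the optimality of the training state and a new notion of generalization error.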

Abstract

We propose a hierarchical training algorithm for standard feed-forward neural networks that adaptively extends the network architecture as soon as the optimization reaches a stationary point. By solving small (low-dimensional) optimization problems, the extended network provably escapes any local minimum or stationary point. Under some assumptions on the approximability of the data with stable neural networks, we show that the algorithm achieves an optimal convergence rate s, in the sense that the loss is bounded by the number of parameters raised to the power -s. As a byproduct, we obtain computable indicators which judge the optimality of the training state of a given network, and we derive a new notion of generalization error.
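The rate statement in the abstract can be read as loss(N) <= C * N^(-s), where N is the number of parameters; the constant C and the symbol N are notation assumed here, not taken from the paper. Under that reading, one rough way to check an observed rate is to fit a straight line to log(loss) against log(N) and negate the slope. The following is a minimal sketch of that generic log-log fit, not the authors' algorithm or their optimality indicators; the function name `estimate_rate` and the data are hypothetical.

```python
import math


def estimate_rate(num_params, losses):
    """Estimate s in loss ~ C * N^(-s) by a least-squares fit in log-log space.

    Hypothetical helper for illustration only; it is not part of the paper.
    """
    xs = [math.log(n) for n in num_params]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line of log(loss) on log(N).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The rate s is the negated slope.
    return -sxy / sxx


# Synthetic, made-up data that follows loss = 3 * N^(-1.5) exactly.
params = [10, 20, 40, 80, 160]
synthetic_losses = [3.0 * n ** -1.5 for n in params]
rate = estimate_rate(params, synthetic_losses)
```

On real training runs the fitted value is only a rough empirical estimate: it depends on which network sizes are included and says nothing about whether the rate is optimal in the paper's sense.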

Taxonomy

Topics: Neural Networks and Applications