Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples

Massimo Fornasier; Jan Vybíral; Ingrid Daubechies

arXiv:1804.01592·stat.ML·May 7, 2021

TL;DR

This paper presents a method for identifying the structure of shallow neural networks from a small number of samples by leveraging higher-order derivatives and tensor decompositions.

Contribution

The paper combines active and passive sampling schemes with tensor methods to identify neural network structure from fewer samples.

Findings

01

Effective identification of neural network weights using second order derivatives.

02

Active and passive sampling schemes enable accurate approximation with fewer samples.

03

Stable algorithms recover the weights when they are close to orthonormal vectors.

Abstract

We address the structure identification and the uniform approximation of sums of ridge functions $f(x) = \sum_{i=1}^{m} g_{i}(a_{i} \cdot x)$ on $\mathbb{R}^{d}$, representing a general form of a shallow feed-forward neural network, from a small number of query samples. Higher order differentiation, as used in our constructive approximations, of sums of ridge functions or of their compositions, as in deeper neural networks, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second order differentiation and tensors of order two (i.e., matrices) suffice, as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal, actively, and randomly designed, and passive sampling, where…
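
The link between second order differentiation and matrices can be illustrated numerically. For $f(x) = \sum_{i=1}^{m} g_{i}(a_{i} \cdot x)$ the Hessian is $\nabla^{2} f(x) = \sum_{i=1}^{m} g_{i}''(a_{i} \cdot x)\, a_{i} a_{i}^{T}$, so every Hessian lies in the span of the $m$ rank-one matrices $a_{i} a_{i}^{T}$. The following is a minimal sketch, not the paper's algorithm: it assumes $g_{i} = \tanh$ for all $i$, exactly orthonormal weights, and plain central finite differences at random points. All function names are hypothetical.

```python
import numpy as np

def make_network(d, m, seed=0):
    """Build f(x) = sum_i tanh(a_i . x) with orthonormal weights (assumption)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, m)))  # q has orthonormal columns
    A = q.T                                           # rows of A are a_1..a_m
    def f(x):
        return np.sum(np.tanh(A @ x))
    return A, f

def fd_hessian(f, x, eps=1e-3):
    """Approximate the Hessian of f at x by central finite differences."""
    d = x.size
    I = np.eye(d)
    H = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            H[j, k] = (f(x + eps * I[j] + eps * I[k])
                       - f(x + eps * I[j] - eps * I[k])
                       - f(x - eps * I[j] + eps * I[k])
                       + f(x - eps * I[j] - eps * I[k])) / (4 * eps ** 2)
    return H

def exact_hessian(A, x):
    """Closed form: sum_i tanh''(a_i . x) a_i a_i^T."""
    t = np.tanh(A @ x)
    g2 = -2.0 * t * (1.0 - t ** 2)   # second derivative of tanh
    return (A.T * g2) @ A            # scales column i of A.T by g2[i]

def hessian_span_rank(f, d, n_points=20, seed=1, rel_tol=1e-3):
    """Numerical rank of the stacked, vectorized Hessians at random points."""
    rng = np.random.default_rng(seed)
    rows = [fd_hessian(f, rng.standard_normal(d)).ravel() for _ in range(n_points)]
    s = np.linalg.svd(np.array(rows), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

if __name__ == "__main__":
    d, m = 5, 3
    A, f = make_network(d, m)
    x = np.random.default_rng(2).standard_normal(d)
    err = np.max(np.abs(fd_hessian(f, x) - exact_hessian(A, x)))
    print("max finite-difference error:", err)
    print("rank of Hessian span:", hessian_span_rank(f, d), "with m =", m)
```

In this sketch the stacked Hessians span a space of dimension $m$ even though each Hessian has $d^{2}$ entries, which is the structure that makes identification from order-two tensors plausible. The tolerance `rel_tol` is an arbitrary choice for separating signal from finite-difference noise.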
