Smaller generalization error derived for a deep residual neural network compared to shallow networks

Aku Kammonen; Jonas Kiessling; Petr Plecháč; Mattias Sandberg; Anders Szepessy; Raúl Tempone

arXiv:2010.01887·math.NA·April 15, 2021

Open Access

TL;DR

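This paper derives a generalization error bound for deep residual neural networks with random Fourier feature layers that is smaller than the corresponding bound for shallow (one-hidden-layer) networks with the same total number of nodes. The bound rests on an optimal distribution for the feature frequencies, which also motivates a new training method with promising experimental results.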

Contribution

The paper derives an optimal frequency distribution for the random Fourier features of a deep residual network, shows that the resulting generalization error is smaller than for shallow networks, and uses this distribution to inform a new training algorithm.

Findings

01

Smaller generalization error bound for deep residual networks.

02

Optimal frequency distribution derived for random Fourier features.

03

New training method demonstrated promising performance.

Abstract

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar{z}_{\ell+1} = \bar{z}_{\ell} + \mathrm{Re} \sum_{k=1}^{K} \bar{b}_{\ell k} e^{\mathrm{i} \omega_{\ell k} \bar{z}_{\ell}} + \mathrm{Re} \sum_{k=1}^{K} \bar{c}_{\ell k} e^{\mathrm{i} \omega'_{\ell k} \cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i} \omega_{\ell k} \bar{z}_{\ell}}$ and $e^{\mathrm{i} \omega'_{\ell k} \cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat{f}\|_{L^{1}(\mathbb{R}^{d})}^{2} / (KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case the…
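The layer recursion in the abstract can be sketched as a short NumPy forward pass. This is only an illustrative sketch, not the authors' implementation: the function name `residual_rff_forward`, the initial state $\bar{z}_0 = 0$, and the standard normal sampling of frequencies and amplitudes in the demo are all assumptions made here. In particular, the demo does not use the optimal frequency distribution derived in the paper.

```python
import numpy as np

def residual_rff_forward(x, omega, omega_prime, b, c, z0=0.0):
    """Evaluate z_{l+1} = z_l + Re sum_k b_lk exp(i w_lk z_l) + Re sum_k c_lk exp(i w'_lk . x).

    x:           (N, d) real inputs
    omega:       (L, K) real frequencies acting on the scalar state z_l
    omega_prime: (L, K, d) real frequencies acting on the input x
    b, c:        (L, K) complex amplitudes
    z0:          initial state (assumed 0 here; not specified in the abstract)
    """
    z = np.full(x.shape[0], z0, dtype=float)
    for layer in range(omega.shape[0]):
        # Term that depends on the current state z_l, shape (N,)
        state_term = np.real(np.exp(1j * np.outer(z, omega[layer])) @ b[layer])
        # Term that depends directly on the input x, shape (N,)
        input_term = np.real(np.exp(1j * (x @ omega_prime[layer].T)) @ c[layer])
        z = z + state_term + input_term
    return z

# Demo with placeholder (assumed) Gaussian frequencies and small amplitudes.
rng = np.random.default_rng(0)
L, K, d, N = 4, 16, 2, 5
x = rng.normal(size=(N, d))
omega = rng.normal(size=(L, K))
omega_prime = rng.normal(size=(L, K, d))
b = (rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))) / K
c = (rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))) / K
z_final = residual_rff_forward(x, omega, omega_prime, b, c)
print(z_final.shape)
```

With $L = 1$ and $\bar{b} = 0$ the recursion reduces to a one-hidden-layer random Fourier feature model in $x$, which is the shallow baseline the abstract compares against.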

Taxonomy

Topics: Geophysical Methods and Applications · Numerical methods in engineering · Non-Destructive Testing Techniques