# Best k-layer neural network approximations

**Authors:** Lek-Heng Lim, Mateusz Michalek, Yang Qi

arXiv: 1907.01507 · 2019-12-13

## TL;DR

This paper investigates the existence of optimal solutions in the empirical risk minimization problem for k-layer neural networks, revealing nonexistence in general and characterizing specific cases where solutions do or do not exist.

## Contribution

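It proves that the ERM infimum is not attained in general for two-layer neural networks with common activations (ReLU, tanh, sigmoid), and it gives a geometric description of the space of two-layer networks with $d$ hidden neurons as the join locus of a line and the $d$-secant locus of a cone.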

## Key findings

- ERM solutions do not exist in general for two-layer networks with ReLU, tanh, or sigmoid activations.
- For the smooth activations (sigmoid and tanh), the infimum can fail to be attained on a set of responses of positive measure.
- For the ReLU activation, the paper gives a complete classification of the cases where the ERM for a two-layer network attains its infimum.

## Abstract

We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1, \dots, s_n \in \mathbb{R}^p$ with corresponding responses $t_1,\dots,t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $\nu_\theta : \mathbb{R}^p \to \mathbb{R}^q$ involves estimation of the weights $\theta \in \mathbb{R}^m$ via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - \nu_\theta(s_i) \rVert_2^2. \] We show that even for $k = 2$, this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. A high-level explanation is like that for the nonexistence of best rank-$r$ approximations of higher-order tensors --- the set of parameters is not a closed set --- but the geometry involved for best $k$-layer neural network approximations is more subtle. In addition, we show that for smooth activations $\sigma(x)= 1/\bigl(1 + \exp(-x)\bigr)$ and $\sigma(x)=\tanh(x)$, such failure to attain an infimum can happen on a positive-measured subset of responses. For the ReLU activation $\sigma(x)=\max(0,x)$, we completely classify the cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with $d$ neurons in the hidden layer: it is the join locus of a line and the $d$-secant locus of a cone.
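
The non-closedness mechanism mentioned in the abstract can be illustrated numerically. The following is a minimal sketch and is not taken from the paper: it assumes a scalar-input network with two tanh hidden neurons and no output bias, hypothetical training inputs `s`, and responses `t` sampled from $\operatorname{sech}^2$, the derivative of $\tanh$. The difference quotient $n\bigl(\tanh(x + 1/n) - \tanh(x)\bigr)$ is a two-neuron network for every $n$, and its empirical risk shrinks toward zero only as the output weights $\pm n$ diverge. Whether the infimum is actually unattained for this particular data is not verified here.

```python
import numpy as np


def two_neuron_net(x, a, w, b):
    """Assumed network form: a[0]*tanh(w[0]*x + b[0]) + a[1]*tanh(w[1]*x + b[1])."""
    return a[0] * np.tanh(w[0] * x + b[0]) + a[1] * np.tanh(w[1] * x + b[1])


def difference_quotient_net(x, n):
    """Two-neuron network n*(tanh(x + 1/n) - tanh(x)); output weights are +n and -n."""
    return two_neuron_net(x, a=(n, -n), w=(1.0, 1.0), b=(1.0 / n, 0.0))


# Hypothetical training data (an assumption for illustration only):
# inputs on a grid, responses equal to d/dx tanh(x) = sech(x)^2.
s = np.linspace(-2.0, 2.0, 9)
t = 1.0 / np.cosh(s) ** 2


def empirical_risk(n):
    """Sum of squared residuals, matching the ERM objective with q = 1."""
    return float(np.sum((t - difference_quotient_net(s, n)) ** 2))


scales = (1, 10, 100, 1000)
risks = [empirical_risk(n) for n in scales]

for n, r in zip(scales, risks):
    print(f"output weight magnitude {n:5d}: empirical risk {r:.3e}")
```

In this sketch the risk decreases as the weight magnitude grows, which mirrors the tensor-rank analogy in the abstract: the approximating sequence stays inside the set of two-neuron networks while its parameters leave every bounded region.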

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01507/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1907.01507/full.md

---
Source: https://tomesphere.com/paper/1907.01507