# Depth Separations in Neural Networks: What is Actually Being Separated?

**Authors:** Itay Safran, Ronen Eldan, Ohad Shamir

arXiv: 1904.06984 · 2021-06-03

## TL;DR

This paper investigates depth separation in neural networks for Lipschitz radial functions, showing that depth 2 networks can approximate such functions efficiently, challenging previous assumptions about the necessity of greater depth.

## Contribution

The paper demonstrates that depth 2 networks can approximate -Lipschitz radial functions with polynomial size in dimension and inverse accuracy, contradicting prior depth separation results.

## Key findings

- Depth 2 networks can approximate -Lipschitz radial functions with polynomial size in dimension.
- Approximation is also possible with size polynomial in 1/psilon for fixed dimension.
- Simultaneous polynomial dependence on both dimension and inverse accuracy is impossible.

## Abstract

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$. However, the functions used to demonstrate this are rapidly oscillating, with a Lipschitz parameter scaling polynomially with the dimension $d$ (or equivalently, by scaling the function, the hardness result applies to $\mathcal{O}(1)$-Lipschitz functions only when the target accuracy $\epsilon$ is at most $\text{poly}(1/d)$). In this paper, we study whether such depth separations might still hold in the natural setting of $\mathcal{O}(1)$-Lipschitz radial functions, when $\epsilon$ does not scale with $d$. Perhaps surprisingly, we show that the answer is negative: In contrast to the intuition suggested by previous work, it \emph{is} possible to approximate $\mathcal{O}(1)$-Lipschitz radial functions with depth $2$, size $\text{poly}(d)$ networks, for every constant $\epsilon$. We complement it by showing that approximating such functions is also possible with depth $2$, size $\text{poly}(1/\epsilon)$ networks, for every constant $d$. Finally, we show that it is not possible to have polynomial dependence in both $d,1/\epsilon$ simultaneously. Overall, our results indicate that in order to show depth separations for expressing $\mathcal{O}(1)$-Lipschitz functions with constant accuracy -- if at all possible -- one would need fundamentally different techniques than existing ones in the literature.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.06984/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.06984/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1904.06984/full.md

---
Source: https://tomesphere.com/paper/1904.06984