# Better Approximations of High Dimensional Smooth Functions by Deep Neural Networks with Rectified Power Units

**Authors:** Bo Li, Shanshan Tang, Haijun Yu

arXiv: 1903.05858 · 2020-02-28

## TL;DR

This paper demonstrates that deep neural networks with rectified power units (RePU) can approximate smooth functions more efficiently than ReLU networks, requiring smaller network sizes and offering better stability and approximation properties.

## Contribution

The paper uses RePU activations to approximate smooth functions, combining classical polynomial approximation theory with constructive algorithms that convert polynomials into deep RePU networks with no approximation error. The resulting theoretical analysis shows improved efficiency over ReLU networks.

## Key findings

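- For the same error tolerance $\varepsilon$, the constructed RePU networks are in general $\mathcal{O}(\log(1/\varepsilon))$ times smaller than the corresponding ReLU networks constructed in most of the existing literature.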
- Compared with the classical constructions of Mhaskar, the RePU constructions use fewer activation functions and are numerically more stable.
- Functions represented by RePU networks are smooth, so they fit naturally where derivatives appear in the loss function.
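
To make the findings above concrete, here is a minimal sketch of a rectified power unit and of two identities it satisfies exactly for power $s = 2$. The function names `repu`, `square`, and `product` are illustrative choices, not names from the paper, and the sketch assumes the usual definition $\sigma_s(x) = \max(0, x)^s$.

```python
import numpy as np

def repu(x, s=2):
    """Rectified power unit: max(0, x)**s (s=1 recovers ReLU)."""
    return np.maximum(0.0, x) ** s

def square(x):
    # For s = 2: x**2 == repu(x) + repu(-x), with no approximation error.
    return repu(x) + repu(-x)

def product(x, y):
    # Polarization identity: x*y == ((x + y)**2 - (x - y)**2) / 4.
    return (square(x + y) - square(x - y)) / 4.0

x = np.linspace(-3.0, 3.0, 13)
y = np.linspace(2.0, -1.0, 13)
assert np.allclose(square(x), x ** 2)
assert np.allclose(product(x, y), x * y)
```

Because squares and products are represented exactly by a fixed number of units, no extra network size is spent on approximating multiplication, which is one intuition for why a polynomial-based construction can be smaller than a ReLU one.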

## Abstract

Deep neural networks with rectified linear units (ReLU) are increasingly popular due to their universal representation power and successful applications. Some theoretical progress regarding the approximation power of deep ReLU networks for functions in Sobolev space and Korobov space has recently been made by [D. Yarotsky, Neural Networks, 94:103-114, 2017] and [H. Montanelli and Q. Du, SIAM J Math. Data Sci., 1:78-92, 2019], etc. In this paper, we show that deep networks with rectified power units (RePU) can give better approximations for smooth functions than deep ReLU networks. Our analysis is based on classical polynomial approximation theory and some efficient algorithms proposed in this paper to convert polynomials into deep RePU networks of optimal size with no approximation error. Compared with the results on ReLU networks, the sizes of RePU networks required to approximate functions in Sobolev space and Korobov space with an error tolerance $\varepsilon$, by our constructive proofs, are in general $\mathcal{O}(\log\frac{1}{\varepsilon})$ times smaller than the sizes of corresponding ReLU networks constructed in most of the existing literature. Compared with the classical results of Mhaskar [Mhaskar, Adv. Comput. Math. 1:61-80, 1993], our constructions use fewer activation functions and are numerically more stable; they can serve as good initializations for deep RePU networks and be further trained to break the limit of linear approximation theory. The functions represented by RePU networks are smooth functions, so they naturally fit in the places where derivatives are involved in the loss function.
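
As an illustration of converting a polynomial into RePU operations with no approximation error, the sketch below evaluates a polynomial by Horner's scheme, with every multiplication carried out by four RePU units of power 2. This is a simple stand-in chosen for clarity, not the optimal-size algorithm proposed in the paper; the helper names `repu2`, `mul`, and `horner_repu` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def repu2(x):
    """RePU of power 2: max(0, x)**2."""
    return np.maximum(0.0, x) ** 2

def mul(x, y):
    # Exact product from four RePU units:
    # x*y = [s(x+y) + s(-x-y) - s(x-y) - s(y-x)] / 4, with s = repu2.
    return (repu2(x + y) + repu2(-x - y) - repu2(x - y) - repu2(y - x)) / 4.0

def horner_repu(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k (coeffs ordered low to high degree)."""
    x = np.asarray(x, dtype=float)
    result = np.full_like(x, coeffs[-1])
    for c in reversed(coeffs[:-1]):
        # One Horner step: multiply by x via RePU units, then add the coefficient.
        result = mul(result, x) + c
    return result

coeffs = [1.0, -2.0, 0.0, 3.0]  # 1 - 2x + 3x^3
x = np.linspace(-1.0, 1.0, 9)
assert np.allclose(horner_repu(coeffs, x), P.polyval(x, coeffs))
```

Horner's scheme applies the multiplications one after another, so the depth of this sketch grows with the polynomial degree; it is meant only to show that the representation is exact up to floating-point rounding.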

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.05858/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1903.05858/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/1903.05858/full.md

---
Source: https://tomesphere.com/paper/1903.05858