A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions

Yulong Lu; Jianfeng Lu

arXiv:2004.08867·cs.LG·November 17, 2020·34 cites

TL;DR

This paper proves that the gradient of a deep ReLU network can push a source distribution forward to an arbitrarily close approximation of a target distribution, with closeness measured by three integral probability metrics and network size bounds that depend on the metric.

Contribution

It establishes a universal approximation theorem for deep neural networks representing probability distributions, with size bounds depending on the chosen discrepancy measure.

Findings

01. Neural networks can approximate distributions arbitrarily closely under Wasserstein, MMD, and KSD.

02. Network size grows exponentially with dimension for Wasserstein distance.

03. Network size depends polynomially on dimension for MMD and KSD.
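
For reference, the discrepancies named in these findings share the general form of an integral probability metric. The block below states the standard textbook definition; the symbols $\mathcal{F}$, $f$, $p$ and $\mathcal{H}_k$ are notation introduced here for illustration and do not come from the summary. The 1-Wasserstein distance and MMD correspond to the two function classes shown in the comments, while KSD is usually written as a supremum over a unit ball of a reproducing kernel Hilbert space after applying a Stein operator that depends on the target.

```latex
d_{\mathcal{F}}(p, \pi)
  = \sup_{f \in \mathcal{F}}
    \left| \mathbb{E}_{X \sim p}[f(X)] - \mathbb{E}_{Y \sim \pi}[f(Y)] \right|
% 1-Wasserstein: \mathcal{F} = \{ f : |f(x) - f(y)| \le \|x - y\| \text{ for all } x, y \}
% MMD:           \mathcal{F} = \{ f \in \mathcal{H}_k : \|f\|_{\mathcal{H}_k} \le 1 \}
```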

Abstract

This paper studies the universal approximation property of deep neural networks for representing probability distributions. Given a target distribution $\pi$ and a source distribution $p_{z}$, both defined on $\mathbb{R}^{d}$, we prove under some assumptions that there exists a deep neural network $g : \mathbb{R}^{d} \to \mathbb{R}$ with ReLU activation such that the push-forward measure $(\nabla g)_{\#} p_{z}$ of $p_{z}$ under the map $\nabla g$ is arbitrarily close to the target measure $\pi$. The closeness is measured by three classes of integral probability metrics between probability distributions: the $1$-Wasserstein distance, maximum mean discrepancy (MMD), and kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width and depth) of the deep neural network in terms of the dimension $d$ and the approximation error $\varepsilon$ with respect to the three discrepancies. In…
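
To make the push-forward under $\nabla g$ concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the construction from the paper. It assumes a standard normal source, a uniform stand-in target, and a potential $g(z) = \max_j (y_j z + c_j)$, which is piecewise linear and therefore expressible with ReLU units. Its gradient is piecewise constant, so the pushed-forward samples land on finitely many atoms $y_j$. The helper names (`max_affine_potential`, `grad_g`, `gaussian_mmd2`), the Gaussian kernel, and the bandwidth are all hypothetical choices made for this example.

```python
import numpy as np
from statistics import NormalDist


def max_affine_potential(atoms):
    """Slopes and offsets of g(z) = max_j (slopes[j] * z + offsets[j]).

    Assumption for this sketch: the source is a standard normal. Offsets are
    chosen so consecutive lines cross at the normal quantiles j/m, which gives
    each atom source mass 1/m.
    """
    slopes = np.sort(np.asarray(atoms, dtype=float))
    m = len(slopes)
    quantiles = np.array([NormalDist().inv_cdf(j / m) for j in range(1, m)])
    offsets = np.zeros(m)
    # Lines j and j+1 intersect at quantiles[j]:
    # offsets[j+1] = offsets[j] - (slopes[j+1] - slopes[j]) * quantiles[j]
    offsets[1:] = -np.cumsum(np.diff(slopes) * quantiles)
    return slopes, offsets


def grad_g(z, slopes, offsets):
    """Gradient of g at each sample: the slope of the active (maximal) line."""
    active = np.argmax(np.outer(z, slopes) + offsets, axis=1)
    return slopes[active]


def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD for 1-D samples."""
    def mean_kernel(a, b):
        sq = (a[:, None] - b[None, :]) ** 2
        return np.exp(-sq / (2.0 * bandwidth ** 2)).mean()
    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


rng = np.random.default_rng(0)
z = rng.standard_normal(2000)                # samples from the source p_z
target = rng.uniform(-1.0, 1.0, size=2000)   # stand-in target samples
atoms = np.quantile(target, (np.arange(5) + 0.5) / 5)  # 5 atoms from the target

slopes, offsets = max_affine_potential(atoms)
pushed = grad_g(z, slopes, offsets)          # samples of (grad g)_# p_z

print("MMD^2(pushed, target):", gaussian_mmd2(pushed, target))
print("MMD^2(source, target):", gaussian_mmd2(z, target))
```

In this toy setting the pushed samples take only five distinct values, each with roughly equal frequency. Using more atoms refines the discrete approximation at the cost of a larger max-affine potential, which loosely mirrors the trade-off between approximation error and network size described in the abstract.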

Videos

A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques