A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions
Yulong Lu, Jianfeng Lu

TL;DR
This paper proves that deep neural networks with ReLU activations can transform a source distribution into an arbitrarily close approximation of a target probability distribution, with closeness measured in several integral probability metrics and network size bounds that depend on the metric.
Contribution
It establishes a universal approximation theorem for deep neural networks representing probability distributions, with size bounds depending on the chosen discrepancy measure.
Findings
Neural networks can approximate distributions arbitrarily closely under Wasserstein, MMD, and KSD.
For the Wasserstein distance, the upper bound on network size can grow exponentially with the dimension.
For MMD and KSD, the network size depends at most polynomially on the dimension.
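For reference, the two notions behind these findings can be written out as follows. This is a sketch using standard textbook definitions; the symbols $\mu$, $\nu$, $T$ and $\mathcal{F}$ are notation chosen here for illustration and are not taken from this page.

```latex
% Push-forward of a measure \mu under a measurable map T
(T_{\#}\mu)(A) = \mu\bigl(T^{-1}(A)\bigr) \quad \text{for every measurable set } A.

% Integral probability metric indexed by a function class \mathcal{F}
d_{\mathcal{F}}(\mu,\nu)
  = \sup_{f \in \mathcal{F}}
    \bigl|\, \mathbb{E}_{X \sim \mu} f(X) - \mathbb{E}_{Y \sim \nu} f(Y) \,\bigr|.
```

Taking $\mathcal{F}$ to be the 1-Lipschitz functions gives the 1-Wasserstein distance (Kantorovich-Rubinstein duality), and taking $\mathcal{F}$ to be the unit ball of a reproducing kernel Hilbert space gives MMD.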
Abstract
This paper studies the universal approximation property of deep neural networks for representing probability distributions. Given a target distribution $\pi$ and a source distribution $p_z$ both defined on $\mathbb{R}^d$, we prove under some assumptions that there exists a deep neural network $g:\mathbb{R}^d\to\mathbb{R}$ with ReLU activation such that the push-forward measure $(\nabla g)_\# p_z$ of $p_z$ under the map $\nabla g$ is arbitrarily close to the target measure $\pi$. The closeness is measured by three classes of integral probability metrics between probability distributions: $1$-Wasserstein distance, maximum mean distance (MMD) and kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width and depth) of the deep neural network in terms of the dimension $d$ and the approximation error $\varepsilon$ with respect to the three discrepancies. In…
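To make the objects in the abstract concrete, here is a minimal NumPy sketch of a push-forward under the gradient of a piecewise-linear potential, together with a sample-based MMD estimate. It is an illustration under stated assumptions, not the paper's construction: the max-affine potential, the names `grad_max_affine`, `mmd2_biased`, `atoms` and `heights`, the Gaussian kernel, and the toy distributions are all hypothetical choices made here.

```python
import numpy as np

def grad_max_affine(x, atoms, heights):
    """Gradient (almost everywhere) of g(x) = max_j (<x, atoms[j]> - heights[j]).

    g is convex and piecewise linear, so it can be expressed with ReLU units.
    Its gradient at x is the atom that attains the maximum.
    """
    scores = x @ atoms.T - heights            # shape (n, m)
    return atoms[np.argmax(scores, axis=1)]   # shape (n, d)

def gaussian_gram(u, v, bandwidth):
    """Gaussian kernel matrix k(u_i, v_j) = exp(-|u_i - v_j|^2 / (2 * bandwidth^2))."""
    sq = (u**2).sum(1)[:, None] + (v**2).sum(1)[None, :] - 2.0 * u @ v.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
d = 2
source = rng.normal(size=(2000, d))            # samples from a toy source
target = rng.normal(loc=1.5, size=(2000, d))   # samples from a toy target

# Assumption: atoms are a subsample of the target and the heights are
# |atom|^2 / 2, which makes the gradient map send x to its nearest atom.
# These parameters are NOT tuned to match the target.
atoms = target[:50]
heights = 0.5 * (atoms**2).sum(axis=1)

pushed = grad_max_affine(source, atoms, heights)  # samples of the push-forward
print("MMD^2(source, target):", mmd2_biased(source, target))
print("MMD^2(pushed, target):", mmd2_biased(pushed, target))
```

The pushed samples always land on one of the atoms, so the push-forward here is a discrete measure. The theorem concerns the existence of network parameters that make such a discrepancy small; this sketch only shows how the quantities are computed and makes no attempt to find those parameters.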
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
