TL;DR
This paper introduces the probabilistic transformer (PT), a deep learning model that universally approximates regular conditional distributions and identifies two ways in which the model avoids the curse of dimensionality.
Contribution
The paper presents a novel neural network architecture that approximates any continuous function from Euclidean space to Wasserstein space, with strategies to mitigate the curse of dimensionality.
Findings
PT can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets.
The model avoids the curse of dimensionality through two different approximation strategies.
The approach extends attention mechanisms to probabilistic settings for universal approximation.
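Read as a formula, the first finding can be restated as follows. This is a hedged restatement rather than the paper's exact theorem: the symbols $W_1$ (the 1-Wasserstein distance), $K$, $\varepsilon$, and $\hat{T}$ are notation assumed here. For every continuous $f : \mathbb{R}^d \to \mathcal{P}_1(\mathbb{R}^D)$, every compact $K \subset \mathbb{R}^d$, and every $\varepsilon > 0$, there is a PT $\hat{T}$ such that
$$\sup_{x \in K} W_1\big(\hat{T}(x), f(x)\big) \le \varepsilon.$$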
Abstract
We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space to $\mathbb{R}^d$ via a feature map; next, a deep feedforward neural network processes these linearized features; finally, the network's outputs are transformed to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism of Bahdanau et al. (2014). Our model, called the probabilistic transformer (PT), can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, quantitatively. We identify two ways in which the PT avoids the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy builds…
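The three phases described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes the feature map is the identity (inputs already lie in $\mathbb{R}^d$), that the feedforward network is a small ReLU MLP, and that the probabilistic attention step turns the network's $N$ output scores into softmax weights on $N$ point masses ("atoms") in $\mathbb{R}^D$. All names (`feature_map`, `feedforward`, `probabilistic_transformer`, `atoms`, `init_params`) are hypothetical.

```python
import numpy as np

def feature_map(x):
    # Assumption: inputs are already vectors in R^d, so the feature map is the identity.
    return np.asarray(x, dtype=float)

def feedforward(z, params):
    # Plain ReLU MLP; the final (linear) layer returns N unnormalized scores.
    h = z
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b

def softmax(s):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(s - np.max(s))
    return e / e.sum()

def probabilistic_transformer(x, params, atoms):
    # Assumed output form: the discrete measure sum_n weights[n] * delta_{atoms[n]},
    # represented here as the pair (weights, atoms).
    weights = softmax(feedforward(feature_map(x), params))
    return weights, atoms

def init_params(sizes, rng):
    # Hypothetical random initialization: one (W, b) pair per layer.
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
d, D, N = 3, 2, 5                      # input dim, output dim, number of atoms
params = init_params([d, 16, N], rng)
atoms = rng.normal(size=(N, D))        # atom locations in R^D (trainable in practice)

weights, support = probabilistic_transformer(np.ones(d), params, atoms)
mean = weights @ support               # mean of the output measure, a point in R^D
```

For each input `x`, the model returns a probability vector over the atoms, so its output is a finitely supported measure on $\mathbb{R}^D$ rather than a single point. In this sketch the atoms are fixed at random; in a trained model they would be learned alongside the network weights.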
