TL;DR
This paper proves that transformers can universally approximate continuous functions while guaranteeing that every output lies in a prescribed constraint set, and it extends classical approximation theorems to constrained and manifold-valued functions.
Contribution
It introduces a constrained universal approximation theorem for transformers and a deep neural version of Berge's Maximum Theorem, enabling constrained optimization.
Findings
Transformers can exactly encode constraints while approximating functions.
The universal approximation theorem is extended to both convex and non-convex constraint sets.
The results also cover Riemannian manifold-valued functions subject to geodesically convex constraints.
Abstract
Many practical problems need the output of a machine learning model to satisfy a set of constraints, $K$. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set $K$ and any continuous function $f:\mathbb{R}^n\rightarrow K$, there is a probabilistic transformer $\hat{F}$ whose randomized outputs all lie in $K$ and whose expected output uniformly approximates $f$. Our second main result is a "deep neural version" of Berge's Maximum Theorem (1963). The result guarantees that given an objective function $L$, a constraint set $K$, and a family of soft constraint sets, there is a probabilistic transformer $\hat{F}$ that approximately minimizes $L$ and whose outputs…
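The idea of "randomized outputs that all lie in the constraint set" can be made concrete with a small toy sketch. It is an illustration only and not the paper's construction: the constraint set is assumed to be the unit circle in the plane (compact and non-convex), the target function `target` is a made-up example, and `attention_weights` is a hypothetical stand-in for a trained network that outputs softmax weights over a fixed set of anchor points chosen inside the constraint set. Because each random output is one of those anchors, the constraint holds exactly by design, while the average of many outputs lands close to the target value.

```python
import numpy as np

def make_anchors(m):
    """Return m points on the unit circle, a non-convex compact set K in R^2."""
    angles = 2 * np.pi * np.arange(m) / m
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def target(x):
    """Toy continuous function f: R -> K (hypothetical, for illustration)."""
    return np.array([np.cos(x), np.sin(x)])

def attention_weights(x, anchors, temperature=0.01):
    """Hypothetical stand-in for a trained network: softmax weights that
    concentrate on the anchors closest to f(x)."""
    sq_dist = ((anchors - target(x)) ** 2).sum(axis=1)
    return softmax(-sq_dist / temperature)

def sample_outputs(x, anchors, rng, n_samples=1000):
    """Draw random outputs; each one is an anchor, so each one lies in K."""
    weights = attention_weights(x, anchors)
    idx = rng.choice(len(anchors), size=n_samples, p=weights)
    return anchors[idx]

rng = np.random.default_rng(0)
anchors = make_anchors(256)
samples = sample_outputs(0.7, anchors, rng)

# Every sample has norm 1, so the constraint is satisfied exactly.
radii = np.linalg.norm(samples, axis=1)
# The sample mean is close to f(0.7), although the mean itself need not lie in K.
mean_error = np.linalg.norm(samples.mean(axis=0) - target(0.7))
```

In this sketch the constraint is enforced by restricting the support of the output distribution rather than by penalizing violations, which is why it holds for every sample and not only on average. The sample mean has norm slightly below 1, a reminder that for a non-convex set the expectation can fall outside the set even when every individual output is inside it.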