Approximation Theory for Lipschitz Continuous Transformers

Takashi Furuya; Davide Murari; Carola-Bibiane Schönlieb

arXiv:2602.15503·cs.LG·February 18, 2026

Open Access

TL;DR

This paper introduces a class of Transformers that are Lipschitz-continuous by construction and come with provable approximation guarantees. The analysis models Transformers as operators on probability measures, and the motivation is stability and robustness in safety-sensitive deployments.

Contribution

The paper develops a theoretical framework for Lipschitz-constrained Transformers, comprising a universal approximation theorem and a measure-theoretic analysis.

Findings

01

The proposed Lipschitz-continuous Transformers are universal approximators within a Lipschitz-constrained function space.

02

The measure-theoretic approach yields approximation guarantees that are independent of the number of tokens.

03

Realizing the MLP and attention blocks as explicit Euler steps of negative gradient flows makes the architecture stable by construction without sacrificing expressivity.
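
The measure-theoretic view behind the second finding can be illustrated with a small sketch. It uses standard softmax attention, not the paper's Lipschitz-constrained attention block, which this summary does not specify. The function name `attention_on_measure` and the matrices `Q`, `K`, `V` are hypothetical. The context is treated as a discrete probability measure (atoms plus weights), so duplicating every token leaves the measure, and therefore the output, unchanged.

```python
import numpy as np

def attention_on_measure(x, support, weights, Q, K, V):
    """Softmax attention of query x against the discrete measure
    sum_j weights[j] * delta_{support[j]} (illustrative, not the paper's block)."""
    scores = (support @ K.T) @ (Q @ x)                 # <Q x, K y_j> for each atom y_j
    kernel = weights * np.exp(scores - scores.max())   # shift by max for numerical stability
    return (kernel / kernel.sum()) @ (support @ V.T)   # weighted average of V y_j

rng = np.random.default_rng(0)
d = 4
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))
tokens = rng.normal(size=(5, d))
x = tokens[0]

def uniform(n):
    """Uniform weights of an empirical measure with n atoms."""
    return np.full(n, 1.0 / n)

# Same empirical measure, represented with 5 tokens and with 10 (each token duplicated).
out_once = attention_on_measure(x, tokens, uniform(5), Q, K, V)
out_twice = attention_on_measure(x, np.vstack([tokens, tokens]), uniform(10), Q, K, V)
```

Both calls describe the same probability measure, so `out_once` and `out_twice` agree up to floating-point error. This is the sense in which a statement about the measure does not depend on how many tokens represent it.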

Abstract

Stability and robustness are critical for deploying Transformers in safety-sensitive settings. A principled way to enforce such behavior is to constrain the model's Lipschitz constant. However, approximation-theoretic guarantees for architectures that explicitly preserve Lipschitz continuity have yet to be established. In this work, we bridge this gap by introducing a class of gradient-descent-type in-context Transformers that are Lipschitz-continuous by construction. We realize both MLP and attention blocks as explicit Euler steps of negative gradient flows, ensuring inherent stability without sacrificing expressivity. We prove a universal approximation theorem for this class within a Lipschitz-constrained function space. Crucially, our analysis adopts a measure-theoretic formalism, interpreting Transformers as operators on probability measures, to yield approximation guarantees…
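
The idea of a block built as an explicit Euler step of a negative gradient flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture from the paper: the potential f(x) = 0.5 * ||relu(W x + b)||^2, the names `gradient_step_block` and `max_nonexpansive_step`, and the step-size rule are all choices made here. Because f is convex and its gradient is Lipschitz with constant at most ||W||_2^2, the map x -> x - h * grad f(x) is 1-Lipschitz for any step size h in [0, 2 / ||W||_2^2].

```python
import numpy as np

def gradient_step_block(x, W, b, h):
    """One explicit Euler step of x' = -grad f(x),
    with the assumed potential f(x) = 0.5 * ||relu(W x + b)||^2."""
    return x - h * W.T @ np.maximum(W @ x + b, 0.0)

def max_nonexpansive_step(W):
    """Step size 2 / ||W||_2^2, the largest for which the step is provably 1-Lipschitz."""
    return 2.0 / np.linalg.norm(W, 2) ** 2

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
b = rng.normal(size=16)
h = max_nonexpansive_step(W)

# Empirical check: output distances never exceed input distances.
worst_ratio = 0.0
for _ in range(1000):
    x, y = rng.normal(size=8), rng.normal(size=8)
    out_gap = np.linalg.norm(gradient_step_block(x, W, b, h) - gradient_step_block(y, W, b, h))
    worst_ratio = max(worst_ratio, out_gap / np.linalg.norm(x - y))
```

In this sketch `worst_ratio` stays at or below 1 up to rounding. Composing such steps keeps the overall Lipschitz constant at most 1, which is one way to read "Lipschitz-continuous by construction".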

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Neural Networks and Reservoir Computing