Transformer Approximations from ReLUs

Jerry Yao-Chieh Hu; Mingcheng Lu; Yi-Chen Lee; Han Liu

arXiv:2604.24878·cs.LG·April 29, 2026

TL;DR

This paper introduces a systematic method for translating ReLU approximation results into resource bounds for the softmax attention mechanism, giving new analytical tools for softmax transformer models.

Contribution

It provides a recipe for deriving resource-efficient approximations with softmax attention from existing ReLU approximation results, demonstrated on multiplication, reciprocal, and min/max primitives.

Findings

01

Provides a systematic recipe for translating ReLU approximation results to softmax attention.

02

Yields target-specific resource bounds beyond universal approximation.

03

Demonstrates the approach on multiplication, reciprocal, and min/max primitives.

Abstract

We provide a systematic recipe for translating ReLU approximation results to the softmax attention mechanism. This recipe covers many common approximation targets. Importantly, it yields target-specific, economical resource bounds beyond universal approximation statements. We showcase the recipe on multiplication, reciprocal computation, and min/max primitives. These results provide new analytical tools for analyzing softmax transformer models.
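To give a feel for why softmax attention can stand in for ReLU-style primitives, here is a minimal sketch of a standard fact, not the paper's construction: a single softmax attention head whose scores and values are both the inputs x_1, ..., x_n returns a weighted average that approaches max(x) as the inverse temperature grows, with a gap of at most log(n) / beta. Applied to the pair (x, 0), the same head approximates ReLU(x). The function names and the parameter `beta` are hypothetical and chosen only for this illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_attention_max(xs, beta):
    # One query, n keys: score_i = beta * x_i, value_i = x_i.
    # The output is the softmax-weighted average of the values.
    # beta (inverse temperature) is an assumption of this sketch.
    weights = softmax([beta * x for x in xs])
    return sum(w * x for w, x in zip(weights, xs))

def softmax_relu(x, beta):
    # ReLU(x) = max(x, 0), so attend over the two values x and 0.
    return softmax_attention_max([x, 0.0], beta)

if __name__ == "__main__":
    xs = [0.3, -1.2, 0.9, 0.85]
    for beta in (1.0, 10.0, 100.0):
        approx = softmax_attention_max(xs, beta)
        gap = max(xs) - approx
        bound = math.log(len(xs)) / beta
        print(f"beta={beta}: approx={approx:.6f}, gap={gap:.6f}, bound={bound:.6f}")
```

The log(n) / beta bound follows from writing the weighted average as the log-sum-exp of the scaled inputs minus the entropy of the attention weights divided by beta. It is a generic temperature argument; the target-specific resource bounds described in the abstract are a separate matter and are not reproduced here.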
