Transformer Approximations from ReLUs
Jerry Yao-Chieh Hu, Mingcheng Lu, Yi-Chen Lee, Han Liu

TL;DR
This paper introduces a systematic method for translating ReLU approximation results into approximation bounds for the softmax attention mechanism, giving new analytical tools for studying softmax transformer models.
Contribution
It provides a recipe for deriving resource-efficient approximations of softmax attention from ReLU approximation results, applicable to various primitives.
Findings
Provides a systematic recipe for translating ReLU approximation results to softmax attention.
Yields target-specific resource bounds beyond universal approximation.
Demonstrates the approach on multiplication, reciprocal, and min/max primitives.
Abstract
We provide a systematic recipe for translating ReLU approximation results to the softmax attention mechanism. This recipe covers many common approximation targets. Importantly, it yields target-specific, economical resource bounds beyond universal approximation statements. We showcase the recipe on multiplication, reciprocal computation, and min/max primitives. These results provide new analytical tools for analyzing softmax transformer models.
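To give a feel for the max primitive mentioned above, here is a minimal Python sketch. It is not the paper's construction. It only contrasts two standard facts: the exact ReLU identity max(a, b) = a + ReLU(b - a), and the softmax-weighted average with inverse temperature beta, which underestimates the true maximum of n values by at most log(n) / beta. The names `relu_max`, `softmax_max`, and `beta` are hypothetical and chosen for illustration.

```python
import math


def relu(z):
    return max(z, 0.0)


def relu_max(a, b):
    # Exact ReLU identity: max(a, b) = a + ReLU(b - a).
    return a + relu(b - a)


def softmax(scores):
    # Numerically stable softmax: subtract the largest score first.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_max(values, beta):
    # Illustrative assumption: a single query attends over `values`
    # with scores beta * value and returns the weighted average.
    # This is a generic soft-max surrogate, not the paper's recipe.
    weights = softmax([beta * v for v in values])
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    values = [0.3, -1.2, 0.9, 0.85]
    for beta in (1.0, 10.0, 100.0):
        gap = max(values) - softmax_max(values, beta)
        bound = math.log(len(values)) / beta
        print(f"beta={beta:6.1f}  gap={gap:.6f}  bound={bound:.6f}")
```

Running the script prints, for each beta, the gap between the true maximum and the softmax surrogate next to the log(n) / beta bound. Raising beta tightens the approximation, which is the kind of accuracy-versus-resource trade-off that target-specific bounds make explicit.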
