Integrated electro-optic attention nonlinearities for transformers
Luis Mickeler, Kai Lion, Alfonso Nardi, Jost Kellner, Pierre Didier, Bhavin J. Shastri, Niao He, Rachel Grange

TL;DR
This paper uses thin-film lithium niobate (TFLN) Mach-Zehnder modulators as analog nonlinear units that replace digital Softmax in transformers, sharply reducing the latency of the nonlinear computation while keeping accuracy highly competitive.
Contribution
The paper demonstrates a hardware approach that uses electro-optic modulators to compute the nonlinear functions in transformers, improving speed and energy efficiency.
Findings
Electro-optic modulators can replace digital Softmax with minimal accuracy loss.
The system maintains highly competitive accuracy under aggressive 4-bit input-output quantization.
Noise characterization shows robustness at high encoding speeds.
Abstract
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog…
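To make the idea in the abstract concrete, here is a minimal NumPy sketch of how an MZM-style transfer curve could stand in for Softmax when computing attention weights. It is an illustration under stated assumptions, not the authors' implementation: the ideal sin² intensity response is the textbook MZM model, while the score-to-voltage mapping, the clipping range `clip`, the digital row normalization, and all function names (`mzm_transfer`, `quantize`, `eo_attention_weights`) are hypothetical choices made for this example.

```python
import numpy as np

def mzm_transfer(v, v_pi=1.0, bias=0.0):
    """Ideal (lossless) MZM intensity transmission, bounded in [0, 1]."""
    return np.sin(np.pi * v / (2.0 * v_pi) + bias) ** 2

def quantize(x, lo, hi, bits=4):
    """Uniformly quantize x to 2**bits levels on [lo, hi]."""
    steps = 2 ** bits - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * steps) / steps * (hi - lo) + lo

def eo_attention_weights(scores, v_pi=1.0, bits=4, clip=4.0):
    """Non-negative attention weights from an MZM-like nonlinearity.

    Assumed scheme (not taken from the paper): scores are clipped,
    mapped to drive voltages in [0, v_pi], quantized (DAC), passed
    through the MZM curve, quantized again (ADC), then row-normalized.
    """
    # Map scores in [-clip, clip] onto the monotonic region [0, v_pi].
    v = (np.clip(scores, -clip, clip) + clip) / (2.0 * clip) * v_pi
    v = quantize(v, 0.0, v_pi, bits)      # input quantization
    t = mzm_transfer(v, v_pi)             # analog nonlinearity
    t = quantize(t, 0.0, 1.0, bits)       # output quantization
    # Row normalization; the small constant avoids division by zero.
    return t / (t.sum(axis=-1, keepdims=True) + 1e-9)

scores = np.array([[-2.0, 0.0, 2.0]])
weights = eo_attention_weights(scores)
print(weights)
```

Restricting the drive voltage to [0, v_pi] keeps the sin² curve monotonic, so larger attention scores still receive larger weights, and the output is non-negative by construction, which is the property the abstract says attention requires.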
