Integrated electro-optic attention nonlinearities for transformers

Luis Mickeler; Kai Lion; Alfonso Nardi; Jost Kellner; Pierre Didier; Bhavin J. Shastri; Niao He; Rachel Grange

arXiv:2604.09512 · cs.LG · April 13, 2026

TL;DR

The paper uses thin-film lithium niobate (TFLN) Mach-Zehnder modulators as analog nonlinear units that replace digital Softmax in transformers, cutting the latency of the nonlinear computation while keeping accuracy competitive.

Contribution

The paper demonstrates a hardware approach in which electro-optic modulators compute the nonlinear functions in transformers, improving speed and energy efficiency.

Findings

01 Electro-optic modulators can replace digital Softmax with minimal accuracy loss.

02 The system maintains competitive accuracy even under aggressive 4-bit input-output quantization.

03 Noise characterization shows robustness at high encoding speeds.

Abstract

Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog…
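
As a rough illustration of the idea described in the abstract, the sketch below applies an ideal Mach-Zehnder intensity transfer curve as a non-negative nonlinearity on attention scores, with 4-bit quantization at its input and output. Only the textbook MZM transfer function, T = (1 + cos(πV/Vπ + φ)) / 2, is standard; the per-row scaling of scores to a drive voltage, the bias point φ = π, the final row normalization, and all function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def mzm_transmission(v, v_pi=1.0, phase_bias=np.pi):
    """Ideal MZM intensity transfer: T = 0.5 * (1 + cos(pi * v / v_pi + phase_bias)).

    With phase_bias = pi (an assumed bias point) this equals
    sin^2(pi * v / (2 * v_pi)), rising from 0 to 1 as v goes from 0 to v_pi.
    """
    return 0.5 * (1.0 + np.cos(np.pi * v / v_pi + phase_bias))

def quantize(x, lo, hi, bits=4):
    """Uniformly quantize x onto 2**bits levels spanning [lo, hi]."""
    levels = 2 ** bits - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def eo_attention_weights(scores, bits=4, v_pi=1.0):
    """Hypothetical Softmax stand-in: quantize -> MZM curve -> quantize -> normalize."""
    # Assumption: scale each row of scores linearly into the drive range [0, v_pi].
    lo = scores.min(axis=-1, keepdims=True)
    hi = scores.max(axis=-1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    v = quantize((scores - lo) / span * v_pi, 0.0, v_pi, bits)

    # Non-negative nonlinear mapping given by the modulator's transfer curve.
    t = quantize(mzm_transmission(v, v_pi), 0.0, 1.0, bits)

    # Assumption: normalize each row to sum to 1; fall back to uniform weights
    # when every entry in a row maps to zero (e.g. all scores are equal).
    s = t.sum(axis=-1, keepdims=True)
    return np.where(s > 0, t / np.maximum(s, 1e-12), 1.0 / t.shape[-1])

scores = np.array([[0.0, 1.0, 2.0], [3.0, -1.0, 0.5]])
weights = eo_attention_weights(scores)
```

Each row of `weights` is non-negative and sums to 1, so it can stand in for Softmax output in a standard attention product. A real device would add noise and a non-ideal extinction ratio, which this sketch ignores.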
