Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers

Aleksandros Sobczyk; Gioele Gottardo; Christos K. Matzoros; Mirko De Vita; Filip Skogh; Anastasios Zouzias; Jiawei Zhuang

arXiv:2605.21325 · cs.LG · May 21, 2026

TL;DR

This paper analyzes and improves the efficiency and stability of triangular matrix inversion in linear attention models, enabling faster and more accurate long-context transformers.

Contribution

It systematically evaluates direct and iterative inversion algorithms, optimizing for hardware efficiency and numerical stability in linear transformers.

Findings

01 Up to 4.3× speed-up over state-of-the-art methods

02 Maintains full model accuracy with improved performance

03 Effective in low-precision floating-point scenarios

Abstract

Linear attention has emerged as a cornerstone for efficient long-context architectures, as evidenced by its integration into state-of-the-art open-source models including Qwen3.5/3.6, Kimi Linear, and RWKV-7. Models that incorporate linear attention layers with the so-called Delta-Rule involve the inversion of triangular matrices as a core sub-routine. This operation often forms a performance bottleneck and, because it is highly sensitive to numerical errors, it can significantly degrade end-to-end model accuracy if it is not carefully implemented. This work provides a systematic analysis of both direct and iterative triangular inversion algorithms, targeting methods that are rich in matrix products and therefore have the potential to utilize modern hardware efficiently. To that end, our analysis covers a broad spectrum of mathematical and practical aspects, with a heavy focus on…
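To make "rich in matrix products" concrete, here is a minimal sketch of one textbook way to invert a triangular matrix using only matrix multiplications. It is not taken from the paper and may differ from the algorithms the authors evaluate. It assumes a unit lower-triangular input L = I + N (ones on the diagonal, N strictly lower triangular), which is an assumption made for illustration. Because N is nilpotent, the Neumann series for the inverse is finite and factors as (I - N)(I + N^2)(I + N^4)..., so roughly log2(n) squarings suffice. The helper names `matmul`, `add`, `identity`, and `invert_unit_lower` are hypothetical.

```python
def identity(n):
    """Return the n x n identity matrix as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    """Elementwise sum of two square matrices of equal size."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain O(n^3) product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def invert_unit_lower(L):
    """Invert a unit lower-triangular matrix using only matrix products.

    Assumes L = I + N with N strictly lower triangular (illustrative
    assumption). With X = -N, X^n = 0, so
        inv(L) = I + X + X^2 + ... = (I + X)(I + X^2)(I + X^4)...
    and the product can stop once the exponent reaches n.
    """
    n = len(L)
    eye = identity(n)
    # X = -N: negate the strictly lower part, ignore everything else.
    x = [[-L[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    inv = add(eye, x)      # covers series terms X^0 and X^1
    power = 1              # current exponent of x
    while 2 * power < n:   # terms up to X^(2*power - 1) are covered so far
        x = matmul(x, x)   # x now holds X^(2*power)
        power *= 2
        inv = matmul(inv, add(eye, x))
    return inv


if __name__ == "__main__":
    L = [[1.0, 0.0, 0.0],
         [0.5, 1.0, 0.0],
         [-2.0, 0.25, 1.0]]
    for row in matmul(L, invert_unit_lower(L)):
        print(row)
```

The sketch trades extra arithmetic for a short chain of dense products, which is the kind of structure that maps well to matrix-multiply hardware. It says nothing about numerical behavior in low precision, which is the concern the abstract raises.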
