Lizard: An Efficient Linearization Framework for Large Language Models

Chien Van Nguyen; Huy Nguyen; Ruiyi Zhang; Hanieh Deilamsalehy; Puneet Mathur; Viet Dac Lai; Haoliang Wang; Jayakumar Subramanian; Ryan A. Rossi; Trung Bui; Nikos Vlassis; Franck Dernoncourt; Thien Huu Nguyen

arXiv:2507.09025 · cs.CL · April 21, 2026

TL;DR

Lizard is a linearization framework that transforms pretrained large language models into efficient subquadratic architectures with adaptive memory control, while maintaining high performance on standard benchmarks.

Contribution

Lizard introduces a subquadratic attention mechanism with compact learnable modules, together with a hardware-aware algorithm, to improve the efficiency and robustness of large language models.
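To make the idea of a subquadratic replacement for softmax attention concrete, here is a minimal sketch of generic causal linear attention. It is not Lizard's mechanism: it assumes the fixed elu(x) + 1 feature map of standard linear attention in place of Lizard's learnable modules, and all function names are hypothetical. It checks that a masked parallel form (written quadratically here for clarity) matches a recurrent form whose state (S, z) has a fixed size, whereas the softmax reference has to re-read every cached key and value at each step.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1, the fixed positive map used by standard linear attention.
    # Assumption for illustration only: Lizard uses learnable modules instead.
    return np.where(x > 0, x + 1.0, np.exp(x))


def causal_softmax_attention(q, k, v):
    """Reference: step t attends over all t + 1 cached keys and values."""
    out = np.zeros_like(v)
    for t in range(q.shape[0]):
        scores = q[t] @ k[: t + 1].T / np.sqrt(q.shape[1])  # length grows with t
        w = np.exp(scores - scores.max())  # numerically stable softmax
        out[t] = (w / w.sum()) @ v[: t + 1]
    return out


def causal_linear_attention_parallel(q, k, v):
    """Masked parallel form: weights phi(q_t) @ phi(k_i) for i <= t, normalized per row."""
    scores = np.tril(feature_map(q) @ feature_map(k).T)  # causal kernel matrix
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def causal_linear_attention_recurrent(q, k, v):
    """Recurrent form: a fixed-size state replaces the growing KV cache."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    S = np.zeros((phi_k.shape[1], v.shape[1]))  # running sum of phi(k_i) v_i^T
    z = np.zeros(phi_k.shape[1])                # running sum of phi(k_i)
    out = np.zeros_like(v)
    for t in range(q.shape[0]):
        S += np.outer(phi_k[t], v[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z)
    return out


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
print(np.allclose(causal_linear_attention_parallel(q, k, v),
                  causal_linear_attention_recurrent(q, k, v)))  # True
```

In the recurrent form, memory per step is the matrix S plus the vector z no matter how long the sequence gets, while the softmax reference's per-step work and cache grow with t.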

Findings

01. Achieves near-lossless recovery of teacher-model performance.
02. Outperforms previous methods by up to 24.5 points on the MMLU benchmark.
03. Demonstrates superior associative recall capabilities.

Abstract

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks on long sequences due to the quadratic complexity of softmax attention and the growing Key-Value (KV) cache, which makes inference memory-bound by context length. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving model quality. Unlike prior linearization methods constrained by fixed, non-adaptive structures, Lizard augments the architecture with compact, learnable modules that enable adaptive memory control and robust length generalization. Moreover, we introduce a hardware-aware algorithm that solves numerical instability in gated attention to accelerate training. Extensive…
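The abstract credits a hardware-aware algorithm with solving numerical instability in gated attention. That algorithm is not reproduced here; the sketch below only illustrates one common way such instability arises and a common log-space remedy, under stated assumptions: a scalar decay gate g_t in (0, 1) per step, raw (unmapped) queries and keys, and no normalization. All function names are hypothetical. The recurrence S_t = g_t S_{t-1} + k_t v_t^T gives o_t = Σ_{i≤t} (Π_{j=i+1..t} g_j)(q_t · k_i) v_i. A naive parallel form factors the decay as B_t / B_i with B_t = Π_{j≤t} g_j, so it divides by a cumulative product that underflows on long sequences; building the decay matrix from differences of cumulative log-gates keeps every entry in [0, 1].

```python
import numpy as np


def gated_recurrent(q, k, v, g):
    """Reference recurrence: S_t = g_t * S_{t-1} + k_t v_t^T, o_t = q_t S_t."""
    S = np.zeros((k.shape[1], v.shape[1]))
    out = np.zeros((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        S = g[t] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out


def gated_parallel_naive(q, k, v, g):
    """Factors the decay as B_t / B_i with B_t = prod(g[:t+1]); B underflows."""
    B = np.cumprod(g)
    scores = (q * B[:, None]) @ (k / B[:, None]).T  # 0 * inf -> NaN once B hits 0
    return np.tril(scores) @ v


def gated_parallel_logspace(q, k, v, g):
    """Builds the decay matrix from differences of cumulative log-gates."""
    L = np.cumsum(np.log(g))                  # L_t = sum_{j<=t} log g_j
    diff = L[:, None] - L[None, :]            # log prod_{j=i+1..t} g_j when i <= t
    mask = np.tril(np.ones((len(g), len(g)), dtype=bool))
    # Mask before exp so the (positive) upper-triangle entries cannot overflow.
    decay = np.where(mask, np.exp(np.where(mask, diff, 0.0)), 0.0)
    return ((q @ k.T) * decay) @ v


T, d = 1200, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
g = np.full(T, 0.5)
ref = gated_recurrent(q, k, v, g)
print(np.isnan(gated_parallel_naive(q, k, v, g)).any())       # True
print(np.allclose(gated_parallel_logspace(q, k, v, g), ref))  # True
```

With every gate equal to 0.5, B_t = 2^-(t+1) rounds to zero in float64 around step 1074, so the naive version produces NaNs (NumPy also emits runtime warnings), while the log-space version still matches the recurrence. Chunked, hardware-aware kernels are out of scope for this sketch.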
