TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TL;DR
TransNormerLLM is a linear attention-based large language model that the authors report outperforms softmax attention-based models in both accuracy and efficiency, through modifications such as Lightning Attention and tensor normalization.
Contribution
It is the first linear attention LLM outperforming softmax attention models, with advanced techniques for acceleration, stability, and large-scale deployment.
Findings
- Outperforms state-of-the-art LLMs in speed and accuracy
- Achieves over 20% acceleration with tensor normalization
- Supports deployment of models up to 175B parameters
Abstract
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that accelerates linear attention by more than twice in runtime and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism for…
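To make the idea of linear attention with exponential decay concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses a single head and a single scalar decay rate `lam` (a hypothetical parameter name), and it leaves out LRPE, normalization, gating, and the optimizations that Lightning Attention adds. It computes the same causal output two ways, once in a quadratic masked form and once as a recurrence whose cost grows linearly with sequence length.

```python
import numpy as np


def decay_linear_attention_parallel(q, k, v, lam):
    """Quadratic reference form (illustrative only).

    out[t] = sum over s <= t of lam**(t - s) * (q[t] . k[s]) * v[s]
    q, k: (n, d_k); v: (n, d_v); lam: scalar decay in (0, 1].
    """
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[t, s] = t - s
    # Causal mask with exponential decay: lam**(t - s) for s <= t, else 0.
    mask = np.where(diff >= 0, lam ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * mask) @ v


def decay_linear_attention_recurrent(q, k, v, lam):
    """Same output via a running (d_k, d_v) state, linear in n."""
    n, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.empty((n, d_v))
    for t in range(n):
        # Decay the accumulated key-value summary, then add the new token.
        state = lam * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 4))
    k = rng.normal(size=(8, 4))
    v = rng.normal(size=(8, 3))
    a = decay_linear_attention_parallel(q, k, v, lam=0.9)
    b = decay_linear_attention_recurrent(q, k, v, lam=0.9)
    print(np.allclose(a, b))
```

Because there is no softmax, the key-value products can be accumulated in a fixed-size state, which is why the recurrent form avoids the n × n attention matrix. The decay factor down-weights distant tokens without cutting them off entirely.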
Code & Models
- 🤗 OpenNLPLab/TransNormerLLM-385M (model) · 755 dl · ♡ 10
- 🤗 OpenNLPLab/TransNormerLLM-1B (model) · 47 dl · ♡ 13
- 🤗 OpenNLPLab/TransNormerLLM-7B (model) · 10 dl · ♡ 18
- 🤗 TheBloke/TransNormerLLM-7B-GPTQ (model) · 7 dl · ♡ 5
- 🤗 OpenNLPLab/TransNormerLLM2-7B-300B (model) · 12 dl · ♡ 4
- 🤗 OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints (model) · 17 dl · ♡ 15
- 🤗 OpenNLPLab/TransNormerLLM2-3B-300B (model) · 8 dl · ♡ 3
- 🤗 OpenNLPLab/TransNormerLLM2-1B-300B (model) · 10 dl · ♡ 3
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Residual Connection · Dropout · Linear Layer · Multi-Head Attention · Byte Pair Encoding
