TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

Zhen Qin; Dong Li; Weigao Sun; Weixuan Sun; Xuyang Shen; Xiaodong Han; Yunshen Wei; Baohong Lv; Xiao Luo; Yu Qiao; Yiran Zhong

arXiv:2307.14995 · cs.CL · January 22, 2024 · 6 cites

TL;DR

TransNormerLLM is a linear attention-based large language model that the authors report outperforms conventional softmax attention-based models in both accuracy and efficiency, using modifications such as Lightning Attention, a gating mechanism, and tensor normalization.

Contribution

The authors present it as the first linear attention-based LLM to outperform softmax attention-based models, with techniques aimed at acceleration, training stability, and large-scale deployment.

Findings

01 Reported to outperform conventional softmax attention-based models in both accuracy and efficiency

02 Achieves over 20% acceleration with tensor normalization

03 Supports deployment of models up to 175B parameters

Abstract

We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that accelerates linear attention by more than twice in runtime and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism for…
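
The abstract pairs linear attention with an exponential decay. As a rough illustration of that idea only, and not of the paper's Lightning Attention kernel or its LRPE positional encoding, the NumPy sketch below assumes a single head, a scalar decay rate `lam`, and no feature map or normalization. The function names are hypothetical. It computes the same causal, decay-weighted attention in two ways: a quadratic masked form and a recurrent form that keeps a running key-value state.

```python
import numpy as np


def decay_attention_quadratic(q, k, v, lam):
    """Softmax-free causal attention: O = ((Q K^T) * M) V.

    M[i, j] = lam ** (i - j) for j <= i and 0 otherwise (assumed decay mask).
    Cost grows quadratically with sequence length.
    """
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # i - j
    mask = np.where(diff >= 0, lam ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * mask) @ v


def decay_attention_recurrent(q, k, v, lam):
    """Same output, computed with a running (d_k x d_v) state.

    state_t = lam * state_{t-1} + k_t v_t^T, and o_t = q_t state_t.
    Cost grows linearly with sequence length.
    """
    n, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.empty((n, d_v))
    for t in range(n):
        state = lam * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 4))
    k = rng.standard_normal((6, 4))
    v = rng.standard_normal((6, 3))
    a = decay_attention_quadratic(q, k, v, lam=0.9)
    b = decay_attention_recurrent(q, k, v, lam=0.9)
    print(np.allclose(a, b))
```

The two forms agree because the decay factor `lam ** (i - j)` can be folded into the running state one step at a time, which is why attention of this shape can be evaluated without materializing the full attention matrix.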

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Residual Connection · Dropout · Linear Layer · Multi-Head Attention · Byte Pair Encoding