The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shuming Ma; Hongyu Wang; Lingxiao Ma; Lei Wang; Wenhui Wang; Shaohan Huang; Li Dong; Ruiping Wang; Jilong Xue; Furu Wei

arXiv:2402.17764 · cs.CL · February 28, 2024 · 37 cites

TL;DR

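This paper introduces BitNet b1.58, a 1-bit LLM variant in which every weight is ternary {-1, 0, 1}, or about 1.58 bits per weight. It matches full-precision (FP16 or BF16) Transformer LLMs of the same model size and training tokens in perplexity and end-task performance while cutting latency, memory, and energy costs. It also defines a new scaling law for 1-bit LLMs and motivates hardware designed for them.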

Contribution

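The paper presents BitNet b1.58, a ternary LLM with 1.58 bits per weight, shows that it matches full-precision Transformer LLMs of the same size and training tokens at lower latency, memory, and energy cost, and proposes a new scaling law and hardware design directions for 1-bit LLMs.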

Findings

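01. BitNet b1.58 matches full-precision (FP16 or BF16) Transformer LLMs of the same model size and training tokens in both perplexity and end-task performance.

02. 1.58-bit models are more cost-effective in latency, memory, throughput, and energy consumption.

03. The 1.58-bit LLM defines a new scaling law and training recipe for LLMs that are both high-performance and cost-effective.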
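
The memory claim can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes an idealized weight-only footprint (no activations, no KV cache, no packing overhead), and the 3-billion-parameter model size and the helper name `weight_memory_gib` are hypothetical, chosen only for illustration. A weight with three possible values carries log2(3), or about 1.58, bits of information, which is where the model's name comes from.

```python
import math

# Information content of one ternary weight in {-1, 0, 1}: log2(3) bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Idealized weight-only storage in GiB.

    Assumption: ignores activations, KV cache, and any packing overhead,
    so real deployments will differ.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical model size, used only to make the ratio tangible.
num_params = 3_000_000_000

fp16_gib = weight_memory_gib(num_params, 16)
ternary_gib = weight_memory_gib(num_params, BITS_PER_TERNARY_WEIGHT)

print(f"bits per ternary weight: {BITS_PER_TERNARY_WEIGHT:.3f}")
print(f"FP16 weights:    {fp16_gib:.2f} GiB")
print(f"ternary weights: {ternary_gib:.2f} GiB")
print(f"ideal reduction: {fp16_gib / ternary_gib:.1f}x")
```

The ratio is independent of model size: under these assumptions it is always 16 / log2(3), roughly a tenfold reduction in weight storage. Practical encodings that pack ternary values into whole bits would give a somewhat smaller reduction.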

Abstract

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
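
The abstract does not say how full-precision weights become ternary, so the following is a minimal sketch under a stated assumption: each weight is divided by the mean absolute value of the weight matrix, then rounded and clipped to {-1, 0, 1} (an absmean-style scheme). The function names `absmean_ternarize` and `ternary_matvec` and the example values are hypothetical. The second function illustrates why ternary weights suggest a different computation paradigm: the matrix-vector product needs no per-weight multiplication, only additions and subtractions, followed by one multiplication per output by the shared scale.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Map a 2-D list of floats to ternary values {-1, 0, 1} plus one scale.

    Assumption: scale by the mean absolute weight, then round and clip.
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in weights
    ]
    return ternary, scale


def ternary_matvec(ternary, scale, x):
    """Approximate W @ x as scale * (T @ x) without per-weight multiplies."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # weight +1: add the input
            elif t == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
        out.append(acc * scale)  # one multiply per output row
    return out


# Hypothetical 2x3 weight matrix and input vector.
weights = [[0.9, -0.05, -1.2], [0.1, 0.7, -0.6]]
ternary, scale = absmean_ternarize(weights)
y = ternary_matvec(ternary, scale, [1.0, 2.0, 3.0])

print(ternary)
print(scale, y)
```

Small weights collapse to 0, which drops the corresponding input from the sum entirely; large weights keep only their sign. The single floating-point scale restores the overall magnitude of the output.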

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Layer Normalization · Multi-Head Attention