The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

TL;DR
This paper introduces BitNet b1.58, a 1.58-bit ternary LLM that matches full-precision models in performance while significantly reducing costs, and establishes new scaling laws and hardware opportunities for 1-bit LLMs.
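The "1.58" comes from the fact that a weight restricted to three values carries log2(3) ≈ 1.585 bits of information. The sketch below checks that figure and shows one simple way to approach it in storage. It packs five ternary weights into one byte, which works because 3^5 = 243 ≤ 256 and costs 1.6 bits per weight. The packing scheme, the helper names (`pack5`, `unpack5`, `weight_gigabytes`) and the 3B-parameter example are hypothetical illustrations, not the paper's storage format.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of one weight drawn from {-1, 0, 1}."""
    return math.log2(3)


def pack5(weights: list[int]) -> int:
    """Pack exactly five ternary weights into one byte (hypothetical scheme).

    Each weight w in {-1, 0, 1} becomes a base-3 digit w + 1 in {0, 1, 2};
    five digits give at most 3**5 - 1 = 242, which fits in 8 bits.
    The first weight is stored in the least significant digit.
    """
    if len(weights) != 5 or any(w not in (-1, 0, 1) for w in weights):
        raise ValueError("expected five weights from {-1, 0, 1}")
    value = 0
    for w in reversed(weights):
        value = value * 3 + (w + 1)
    return value


def unpack5(byte: int) -> list[int]:
    """Invert pack5: recover five ternary weights from one byte."""
    out = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)  # least significant digit first
        out.append(digit - 1)
    return out


def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (1e9 bytes) at a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9
```

For example, `unpack5(pack5([1, -1, 0, 0, 1]))` returns `[1, -1, 0, 0, 1]`, and for a hypothetical 3B-parameter model `weight_gigabytes(3e9, 16)` gives 6.0 (FP16) while `weight_gigabytes(3e9, 1.6)` gives about 0.6. These figures count weights only; activations, caches, and any components kept at higher precision are ignored.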
Contribution
The paper presents a novel 1.58-bit ternary LLM, BitNet b1.58, demonstrating high performance and cost efficiency, and proposes new training scaling laws and hardware design directions.
Findings
BitNet b1.58 matches full-precision LLM performance.
1.58-bit models are more cost-effective in latency, memory, and energy.
1.58-bit LLMs define new scaling laws for models that are both high-performing and low-bit.
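Part of the latency and energy advantage follows from the arithmetic that the abstract calls a new computation paradigm. When every weight is -1, 0, or 1, a matrix-vector product needs no multiplications in its inner loop, only additions and subtractions of activations, and zero weights can be skipped. The pure-Python sketch below illustrates that idea for one ternary layer with an assumed per-tensor `scale` factor. It is not the paper's kernel; real speedups depend on optimized low-level kernels or dedicated hardware, and the function names are hypothetical.

```python
def ternary_matvec(weights: list[list[int]], x: list[float], scale: float = 1.0) -> list[float]:
    """Compute scale * (W @ x) for a ternary matrix W.

    The inner loop uses only additions and subtractions; a single
    multiplication per output row applies the (assumed) per-tensor scale.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("shape mismatch between weight row and input")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            elif w != 0:
                raise ValueError("weights must be in {-1, 0, 1}")
            # w == 0 contributes nothing and is skipped
        out.append(scale * acc)
    return out


def dense_matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    """Reference multiply-accumulate version, for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

With `W = [[1, 0, -1], [-1, 1, 1]]` and `x = [0.5, 2.0, -1.5]`, `ternary_matvec(W, x)` returns `[2.0, 0.0]`, the same result as `dense_matvec(W, x)`, but without a single weight-times-activation multiplication.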
Abstract
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
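The abstract does not say how full-precision values are mapped onto {-1, 0, 1}. A minimal sketch of one absmean-style rule follows. It assumes a single per-tensor scale gamma equal to the mean absolute weight, then divides each weight by gamma (plus a small `eps`), rounds it, and clips it to [-1, 1]. Treat it as an illustration of ternarization, not as the authors' training code; `absmean_ternarize` and `eps` are hypothetical names.

```python
def absmean_ternarize(
    weights: list[list[float]], eps: float = 1e-5
) -> tuple[list[list[int]], float]:
    """Map a full-precision weight matrix to {-1, 0, 1} plus one scale.

    gamma is the mean absolute value over all entries (assumed per-tensor).
    Each weight is divided by (gamma + eps), rounded to the nearest integer
    (Python's round uses round-half-to-even), then clipped to [-1, 1].
    Returns (ternary_matrix, gamma), so W is approximated by gamma * ternary.
    """
    flat = [w for row in weights for w in row]
    if not flat:
        raise ValueError("empty weight matrix")
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma
```

For `W = [[0.9, -0.05, -1.2], [0.3, 0.0, -0.4]]`, gamma is 0.475 and the ternary matrix is `[[1, 0, -1], [1, 0, -1]]`. Rounding alone is a coarse approximation: the abstract compares models trained on the same number of tokens, so a one-off rounding of an existing FP16 checkpoint should not be expected to reproduce those results.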
Code & Models
- 🤗 1bitLLM/bitnet_b1_58-large (model) · 1.3k downloads · ♡ 120
- 🤗 1bitLLM/bitnet_b1_58-3B (model) · 1.4k downloads · ♡ 262
- 🤗 HF1BitLLM/Llama3-8B-1.58-Linear-10B-tokens (model) · 29 downloads · ♡ 11
- 🤗 HF1BitLLM/Llama3-8B-1.58-Sigmoid-k100-10B-tokens (model) · 18 downloads · ♡ 10
- 🤗 HF1BitLLM/Llama3-8B-1.58-100B-tokens (model) · 2.6k downloads · ♡ 208
- 🤗 mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq (model) · 10 downloads · ♡ 74
- 🤗 1bitLLM/bitnet_b1_58-xl (model) · 264 downloads · ♡ 39
- 🤗 NousResearch/OLMo-Bitnet-1B (model) · 171 downloads · ♡ 120
- 🤗 budecosystem/boomer-bitnet-634m (model) · 9 downloads · ♡ 1
- 🤗 abideen/Bitnet-Llama-70M (model) · 18 downloads · ♡ 28
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Layer Normalization · Multi-Head Attention
