FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Liqun Ma, Mingjie Sun, Zhiqiang Shen

TL;DR
This paper introduces FBI-LLM, a fully binarized large language model trained from scratch with an autoregressive distillation loss. It matches the performance of full-precision models, and an analysis of its training trajectory shows that pretrained weights are not needed to train binarized LLMs.
Contribution
It demonstrates the first successful training of large-scale fully binarized LLMs from scratch, matching full-precision performance without pretrained weights.
Findings
Binarized LLMs can be trained from scratch without pretrained weights.
FBI-LLM achieves competitive perplexity and task-specific performance at 130M, 1.3B, and 7B parameters.
Analysis of the training trajectory indicates that pretrained weights are not necessary for training binarized LLMs.
Abstract
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (as opposed to partially binary or ternary LLMs such as BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss while maintaining the same model dimensions (130M, 1.3B, 7B) and training data volume as regular LLM pretraining, and it delivers competitive results in terms of perplexity and task-specific effectiveness. Intriguingly, by analyzing the training trajectory, we find that pretrained weights are not necessary for training binarized LLMs from scratch. This research encourages a new computational framework and may facilitate the future design of specialized hardware tailored for fully 1-bit…
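The abstract names an autoregressive distillation (AD) loss but does not spell out its form. The sketch below is one plausible reading, not the authors' implementation: it assumes a full-precision teacher and a binarized student that each emit next-token logits at every position, and it takes the loss to be the cross-entropy between the teacher's and the student's next-token distributions, averaged over positions. The names `log_softmax`, `ad_loss`, `teacher`, `student_close`, and `student_far` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ad_loss(teacher_logits, student_logits):
    """Assumed AD loss: mean over positions of H(p_teacher, p_student).

    Both arguments are lists of per-position logit lists with the same
    shape (sequence_length x vocab_size).
    """
    assert len(teacher_logits) == len(student_logits)
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_logp = log_softmax(t_row)
        s_logp = log_softmax(s_row)
        # Cross-entropy: -sum_v p_teacher(v) * log p_student(v)
        total -= sum(math.exp(tl) * sl for tl, sl in zip(t_logp, s_logp))
    return total / len(teacher_logits)


# Toy example: 3 token positions, vocabulary of 4 tokens.
teacher = [
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 0.2, 3.0, -0.5],
    [1.0, 1.0, 1.0, 1.0],
]
student_close = [
    [1.9, 0.6, -0.9, 0.1],
    [0.0, 0.3, 2.8, -0.4],
    [1.1, 0.9, 1.0, 1.0],
]
student_far = [
    [-1.0, 2.0, 0.0, 0.5],
    [3.0, 0.2, 0.1, -0.5],
    [4.0, 0.0, 0.0, 0.0],
]

loss_close = ad_loss(teacher, student_close)
loss_far = ad_loss(teacher, student_far)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

Under this assumed form, the loss is smallest when the student reproduces the teacher's distribution at every position, so a student whose logits track the teacher's scores lower than one whose logits disagree. In a real training loop the student logits would come from a model with binarized weights, which this sketch does not model.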
Taxonomy
Topics: Digital and Cyber Forensics · Digital Rights Management and Security · Digital and Traditional Archives Management
