FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Liqun Ma; Mingjie Sun; Zhiqiang Shen

arXiv:2407.07093 · cs.CL · July 10, 2024

TL;DR

This paper introduces FBI-LLM, a fully binarized large language model trained from scratch with an autoregressive distillation loss. It achieves performance competitive with full-precision models of the same size, and its analysis of the training trajectory offers new insights into how binarized models train.

Contribution

It demonstrates the first successful training of large-scale fully binarized LLMs from scratch, matching full-precision performance without pretrained weights.

Findings

1. Binarized LLMs can be trained from scratch without pretrained weights.
2. FBI-LLM achieves perplexity and downstream task performance competitive with its full-precision counterparts.
3. Analysis of the training trajectory provides new insights into how binarized models train.

Abstract

This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not a partially binary or ternary LLM like BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss while maintaining the same model dimensions (130M, 1.3B, 7B) and training data volume as regular LLM pretraining, and it delivers competitive results in terms of perplexity and task-specific effectiveness. Intriguingly, by analyzing the training trajectory, we find that pretrained weights are not necessary for training binarized LLMs from scratch. This research encourages a new computational framework and may facilitate the future design of specialized hardware tailored for fully 1-bit…
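The abstract names two ingredients, full weight binarization and an autoregressive distillation (AD) loss, without spelling either out. The NumPy sketch below is a minimal, forward-only illustration under stated assumptions, not the official liqunma/fbi-llm code. It assumes that weights are binarized with sign() and rescaled by hypothetical per-output-channel parameters `alpha` and `beta`. It also assumes that the AD loss is the cross-entropy between a full-precision teacher's next-token distribution and the binarized student's, averaged over sequence positions. All function and variable names are hypothetical. Actual training would also need a gradient estimator for sign(), such as the straight-through estimator, which plain NumPy does not provide.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def binary_linear(x, w_latent, alpha, beta):
    """Hypothetical fully binarized linear layer (forward pass only).

    Assumption: latent full-precision weights are binarized with sign(),
    mapping 0 to +1, and the output is rescaled by per-output-channel
    full-precision parameters alpha (scale) and beta (shift).
    x: (seq_len, in_features); w_latent: (out_features, in_features).
    """
    w_bin = np.where(w_latent >= 0, 1.0, -1.0)  # every weight is +1 or -1
    return (x @ w_bin.T) * alpha + beta


def autoregressive_distillation_loss(teacher_logits, student_logits):
    """Assumed AD loss: cross-entropy between teacher and student
    next-token distributions, averaged over sequence positions.

    Both inputs have shape (seq_len, vocab_size); row j holds the logits
    for predicting the next token from the prefix ending at position j.
    """
    p_teacher = np.exp(log_softmax(teacher_logits))  # soft targets
    log_p_student = log_softmax(student_logits)
    per_position = -(p_teacher * log_p_student).sum(axis=-1)
    return per_position.mean()


# Toy example: 4 positions, hidden size 8, vocabulary of 16 tokens.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(16, 8))
alpha = np.abs(w).mean(axis=1)  # hypothetical init: mean |w| per output row
beta = np.zeros(16)

teacher_logits = x @ w.T  # stand-in for a full-precision teacher
student_logits = binary_linear(x, w, alpha, beta)
loss = autoregressive_distillation_loss(teacher_logits, student_logits)
```

Using the teacher's full next-token distribution as a soft target, rather than a one-hot ground-truth token, gives the student a dense training signal at every position. The loss is minimized, and equals the teacher's entropy, exactly when the student's distribution matches the teacher's.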

Code & Models

Repositories

liqunma/fbi-llm (official, PyTorch)
