Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, and Zhenyu Yang

TL;DR
This survey reviews binary quantization techniques for large language models, highlighting how low-bit binary weights can significantly reduce resource requirements while maintaining performance.
Contribution
It provides a comprehensive overview of binary quantization methods for LLMs, detailing their implementations, contributions, and applications.
Findings
Binary quantization reduces memory and computational costs.
Various binary quantization techniques have been developed for LLMs.
Binary methods enable efficient deployment of large models.
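To make the first finding concrete, the arithmetic below compares weight storage at 16 bits and at 1 bit per parameter. The 7-billion-parameter count is a hypothetical example. The sketch counts weights only and ignores scaling factors, layers kept in full precision (such as embeddings), activations, and the KV cache, so real end-to-end savings are smaller than the raw 16x ratio.

```python
def weight_memory_bytes(num_params: int, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Hypothetical 7B-parameter model, used only for illustration.
NUM_PARAMS = 7_000_000_000
GIB = 2 ** 30

fp16_bytes = weight_memory_bytes(NUM_PARAMS, 16)   # 16-bit baseline
binary_bytes = weight_memory_bytes(NUM_PARAMS, 1)  # 1 bit per weight

print(f"FP16 weights:  {fp16_bytes / GIB:.2f} GiB")
print(f"1-bit weights: {binary_bytes / GIB:.2f} GiB")
print(f"reduction:     {fp16_bytes / binary_bytes:.0f}x")
```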
Abstract
Large language models (LLMs) such as GPT-4 and Llama are widely used in natural language processing (NLP). However, as model parameter counts grow exponentially, LLMs incur significant resource overheads. Low-bit quantization, a key technique, reduces memory usage and computational demands by decreasing the bit-width of model parameters, activations, and gradients. Previous quantization methods for LLMs have largely relied on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ does not require any retraining of the original model, whereas QAT incorporates quantization into training in order to learn the best quantization parameters. The BitNet team proposed a radically different approach, in which quantization is applied from the start of model training, using low-precision binary weights throughout the training process. This approach…
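The following is a minimal sketch of training with binary weights, in the spirit of BitNet and earlier XNOR-Net-style methods. Weights are zero-centered and replaced by their sign, then rescaled by a single per-tensor factor (the mean absolute value). Gradients reach full-precision latent weights through a clipped straight-through estimator (STE). The function names, the per-tensor granularity, and the clip threshold of 1.0 are illustrative assumptions, not details taken from the survey. Individual methods differ in how they center, scale, and clip.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[List[float], float]:
    """Return (signs, scale) for a flat weight tensor.

    Weights are centered on their mean before taking the sign; the
    per-tensor scale is the mean absolute value of the original weights.
    """
    n = len(weights)
    mean = sum(weights) / n
    # Sign(x) = +1 if x > 0 else -1, so every weight maps to +/-1 (never 0).
    signs = [1.0 if w - mean > 0 else -1.0 for w in weights]
    scale = sum(abs(w) for w in weights) / n
    return signs, scale


def dequantize(signs: List[float], scale: float) -> List[float]:
    """Effective weights used in the forward pass: scale * sign."""
    return [scale * s for s in signs]


def ste_grad(latent: List[float], upstream: List[float], clip: float = 1.0) -> List[float]:
    """Clipped straight-through estimator for the sign function.

    sign() has zero gradient almost everywhere, so the gradient with respect
    to the sign outputs is passed through unchanged where |w| <= clip and
    set to zero elsewhere.
    """
    return [g if abs(w) <= clip else 0.0 for w, g in zip(latent, upstream)]


def sgd_step(latent: List[float], grads: List[float], lr: float) -> List[float]:
    """Update the full-precision latent weights; only their signs reach the forward pass."""
    return [w - lr * g for w, g in zip(latent, grads)]


if __name__ == "__main__":
    latent = [0.5, -0.25, 0.75, -1.0]
    signs, scale = binarize(latent)
    print("signs:", signs, "scale:", scale)
    print("forward weights:", dequantize(signs, scale))
    # Hypothetical gradient with respect to the sign outputs.
    upstream = [0.1, -0.2, 0.3, 0.4]
    grads = ste_grad(latent, upstream)
    print("updated latent:", sgd_step(latent, grads, lr=0.1))
```

Keeping latent full-precision weights is what lets small gradient updates accumulate until they flip a sign. A PTQ method, by contrast, would binarize an already-trained model once, without this update loop.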
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
